nips nips2005 nips2005-60 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Purnamrita Sarkar, Andrew W. Moore
Abstract: This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. The generalized model associates each entity with a point in p-dimensional Euclidean latent space. The points can move as time progresses but large moves in latent space are improbable. Observed links between entities are more likely if the entities are close in latent space. We show how to make such a model tractable (subquadratic in the number of entities) by the use of appropriate kernel functions for similarity in latent space; the use of low dimensional kd-trees; a new efficient dynamic adaptation of multidimensional scaling for a first pass of approximate projection of entities into latent space; and an efficient conjugate gradient update rule for non-linear local optimization in which amortized time per entity during an update is O(log n). We use both synthetic and real-world data on up to 11,000 entities, which indicate linear scaling in computation time and improved performance over four alternative approaches. We also illustrate the system operating on twelve years of NIPS co-publication data. We present a detailed version of this work in [1].
Reference: text
sentIndex sentText sentNum sentScore
1 This paper explores two aspects of social network modeling. [sent-4, score-0.108]
2 First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. [sent-5, score-0.13]
3 Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. [sent-6, score-0.422]
4 The generalized model associates each entity with a point in p-dimensional Euclidean latent space. [sent-7, score-0.412]
5 The points can move as time progresses but large moves in latent space are improbable. [sent-8, score-0.292]
6 Observed links between entities are more likely if the entities are close in latent space. [sent-9, score-1.058]
7 We use both synthetic and real-world data on up to 11,000 entities, which indicate linear scaling in computation time and improved performance over four alternative approaches. [sent-11, score-0.471]
8 1 Introduction. Social network analysis is becoming increasingly important in many fields besides sociology, including intelligence analysis [2], marketing [3], and recommender systems [4]. [sent-14, score-0.061]
9 Consider a friendship graph in which the nodes are entities and two entities are linked if and only if they have been observed to collaborate in some way. [sent-16, score-0.842]
10 In 2002, Raftery et al. [5] introduced a model similar to Multidimensional Scaling in which entities are associated with locations in p-dimensional space, and links are more likely if the entities are close in latent space. [sent-17, score-1.124]
11 In this paper we suppose that each observed link is associated with a discrete timestep, so each timestep produces its own graph of observed links, and information is preserved between timesteps by two assumptions. [sent-18, score-0.393]
12 First we assume entities can move in latent space between timesteps, but large moves are improbable. [sent-19, score-0.638]
13 Let $G_t$ be the graph of observed pairwise links at time t. [sent-21, score-0.219]
14 Assuming n entities and a p-dimensional latent space, let $X_t$ be an n × p matrix in which the ith row, called $x_i$, corresponds to the latent position of entity i at time t. [sent-22, score-0.648]
15 For most of this paper we treat the problem as a tracking problem in which we estimate $X_t$ at each timestep as a function of the current observed graph $G_t$ and the previously estimated positions $X_{t-1}$. [sent-24, score-0.285]
16 We want $X_t = \arg\max_X P(X \mid G_t, X_{t-1}) = \arg\max_X P(G_t \mid X)\, P(X \mid X_{t-1})$ (1). In Section 2 we design models of $P(G_t \mid X_t)$ and $P(X_t \mid X_{t-1})$ that meet our modeling needs and whose learning times remain tractable as n gets large. [sent-25, score-0.105]
17 The first stage generalizes linear multidimensional scaling algorithms to the dynamic case while carefully maintaining the ability to computationally exploit sparsity in the graph. [sent-27, score-0.228]
18 The second stage refines this estimate using an augmented conjugate gradient approach in which gradient updates can use kd-trees over latent space to allow O(n log n) computation per step. [sent-29, score-0.442]
19 [Figure 1: Model through time, with latent positions $X_0, X_1, \ldots, X_T$ and observed graphs $G_0, G_1, \ldots, G_T$.] 2 The DSNL (Dynamic Social Network in Latent space) Model. Let $d_{ij} = |x_i - x_j|$ be the Euclidean distance between entities i and j in latent space at time t. [sent-30, score-0.976]
20 We denote linkage at time t by $i \sim j$, and absence of a link by $i \nsim j$. [sent-32, score-0.18]
21 1 Observation Model. The likelihood score function $P(G_t \mid X_t)$ intuitively measures how well the model explains pairs of entities which are actually connected in the training graph as well as those that are not. [sent-36, score-0.565]
22 Thus it is simply $P(G_t \mid X_t) = \prod_{i \sim j} p_{ij} \prod_{i \nsim j} (1 - p_{ij})$ (2). Following [5], the link probability is a logistic function of $d_{ij}$ and is denoted as $p^L_{ij}$, i. [sent-37, score-0.85]
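The product in equation (2) is usually evaluated in log form to avoid numerical underflow. The following is a minimal sketch under stated assumptions, not the authors' code: the names `log_likelihood`, `links`, and `link_prob` are illustrative, each unordered pair is counted once, and the link probability is supplied by the caller.

```python
import math

def log_likelihood(n, links, link_prob):
    """Log of equation (2): sum of log p_ij over linked pairs
    plus sum of log(1 - p_ij) over unlinked pairs.

    n         -- number of entities (assumed indexed 0..n-1)
    links     -- set of (i, j) tuples for observed links (hypothetical format)
    link_prob -- function (i, j) -> probability strictly between 0 and 1
    """
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # visit each unordered pair once
            p = link_prob(i, j)
            if (i, j) in links or (j, i) in links:
                total += math.log(p)
            else:
                total += math.log(1.0 - p)
    return total
```

This double loop is quadratic in n, which is exactly the cost the kernelized model is designed to avoid; it is shown only to make the structure of the score explicit.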
23 To extend this model to the dynamic case, we now make two important alterations. [sent-41, score-0.058]
24 First, we allow entities to vary their sociability. [sent-42, score-0.418]
25 Some entities participate in many links while others are in few. [sent-43, score-0.481]
26 We give each entity a radius, which will be used as a sphere of interaction within latent space. [sent-44, score-0.389]
27 We introduce the term rij to replace α in equation (3). [sent-46, score-0.223]
28 Intuitively, an entity with higher degree will have a larger radius. [sent-48, score-0.188]
29 Thus we define the radius of entity i with degree $\delta_i$ as $c(\delta_i + 1)$, so that $r_{ij} = c \times (\max(\delta_i, \delta_j) + 1)$, where c is estimated from the data. [sent-49, score-0.427]
30 In practice, we estimate the constant c by a simple line-search on the score function. [sent-50, score-0.077]
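The radius definition and the search over c can be sketched as follows. This is an illustrative sketch only: the source says c is found by "a simple line-search on the score function" without giving details, so the scan over a fixed candidate list below is an assumption, and the names `pairwise_radius` and `pick_c` are hypothetical.

```python
def pairwise_radius(degree_i, degree_j, c):
    """r_ij = c * (max(degree_i, degree_j) + 1), as defined in the text."""
    return c * (max(degree_i, degree_j) + 1)

def pick_c(score, candidates):
    """Return the candidate c with the highest score.

    score      -- function c -> model score (higher is better); caller-supplied
    candidates -- iterable of trial values for c (an assumed search grid)
    """
    return max(candidates, key=score)
```

For example, an entity of degree 5 paired with one of degree 2 gets radius 6c, so the higher-degree entity determines the interaction range.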
31 The actual logistic function, and our kernelized version with ρ = 0. [sent-69, score-0.057]
32 The actual (flat, with one minimum) and the modified (steep, with two minima) constraint functions, for two dimensions, with $X_t$ varying over a 2-d grid from (−2, −2) to (2, 2), and $X_{t-1} = (1, 1)$. The second alteration is to weight the link probabilities by a kernel function. [sent-72, score-0.088]
33 We alter the simple logistic link probability $p^L_{ij}$ such that two entities have a high probability of linkage only if their latent coordinates are within distance $r_{ij}$ of one another. [sent-73, score-1.084]
34 Later we will need the kernelized function to be continuous and differentiable at $r_{ij}$. [sent-75, score-0.214]
35 $K(d_{ij}) = (1 - (d_{ij}/r_{ij})^2)^2$ when $d_{ij} \le r_{ij}$, and $K(d_{ij}) = 0$ otherwise. Using this function we redefine our link probability as $p_{ij} = p^L_{ij} K(d_{ij}) + \rho (1 - K(d_{ij}))$ (4). This is equivalent to having $p_{ij} = \frac{1}{1 + e^{(d_{ij} - r_{ij})}} K(d_{ij}) + \rho (1 - K(d_{ij}))$ when $d_{ij} \le r_{ij}$, and $p_{ij} = \rho$ otherwise (5). [sent-77, score-1.049]
36 We plot this function in Figure 2A. [sent-78, score-0.485]
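The kernelized link probability of equations (4) and (5) can be written directly. This is a minimal sketch: the function names are illustrative, and the value of ρ used in any call is an assumption, since the value quoted in the source is truncated.

```python
import math

def biquadratic_kernel(d, r):
    """K(d) = (1 - (d/r)^2)^2 for d <= r, and 0 otherwise."""
    if d > r:
        return 0.0
    return (1.0 - (d / r) ** 2) ** 2

def link_probability(d, r, rho):
    """p_ij = p^L_ij * K(d_ij) + rho * (1 - K(d_ij)), with the logistic
    p^L_ij = 1 / (1 + exp(d_ij - r_ij)) taken from equation (5)."""
    logistic = 1.0 / (1.0 + math.exp(d - r))
    k = biquadratic_kernel(d, r)
    return logistic * k + rho * (1.0 - k)
```

Because K reaches zero exactly at d = r, the probability falls to the constant ρ at the radius and stays there, so every pair farther apart than its radius shares one probability. That constant tail is what allows distant pairs to be handled in bulk rather than one by one.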
37 2 Transition Model. The second part of the score penalizes large displacements from the previous time step. [sent-80, score-0.128]
38 We use the most obvious Gaussian model: each coordinate of each latent position is independently subjected to a Gaussian perturbation with mean 0 and variance $\sigma^2$. [sent-81, score-0.201]
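Under an independent zero-mean Gaussian perturbation per coordinate, the log-probability of a move is, up to an additive constant, the negated sum of squared displacements divided by twice the variance. A minimal sketch, with the hypothetical name `transition_log_prob` and positions given as lists of coordinate lists:

```python
def transition_log_prob(X_t, X_prev, sigma):
    """Gaussian transition score, omitting the additive constant.

    X_t, X_prev -- lists of per-entity coordinate lists of equal shape
    sigma       -- standard deviation of the per-coordinate perturbation
    """
    squared = 0.0
    for row_t, row_prev in zip(X_t, X_prev):
        for a, b in zip(row_t, row_prev):
            squared += (a - b) ** 2
    return -squared / (2.0 * sigma ** 2)
```

A small sigma makes the penalty steep, so positions are held close to their previous values; a large sigma lets the current graph dominate.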
39 Thus $\log P(X_t \mid X_{t-1}) = -\sum_{i=1}^{n} |X_{i,t} - X_{i,t-1}|^2 / 2\sigma^2 + \text{const}$ (6). 3 Learning Stage One: Linear Approximation. We generalize classical multidimensional scaling (MDS) [6] to get an initial estimate of the positions in the latent space. [sent-82, score-0.427]
40 It takes as input an n × n matrix of non-negative distances D where Di,j denotes the target distance between entity i and entity j. [sent-84, score-0.475]
41 It produces an n × p matrix X where the ith row is the position of entity i in p-dimensional latent space. [sent-85, score-0.417]
42 MDS finds $\arg\min_X |\tilde{D} - XX^T|_F$, where $\tilde{D}$ is the similarity matrix obtained from D using standard operations and $|\cdot|_F$ denotes the Frobenius norm [7]. [sent-86, score-0.046]
43 Let Γ be the matrix of the eigenvectors of $\tilde{D}$, and Λ be a diagonal matrix with the corresponding eigenvalues. [sent-88, score-0.056]
44 Firstly, what should be our target distance matrix D? [sent-94, score-0.051]
45 The first answer follows from [5] and defines $D_{ij}$ as the length of the shortest path from i to j in graph G. [sent-96, score-0.047]
46 When accounting for time, we do not want the positions of entities to change drastically from one time step to another. [sent-99, score-0.473]
47 The idea is to work with the distances and not the positions themselves. [sent-105, score-0.096]
48 Since we are learning the positions from distances, we change our constraint (during this linear stage of learning) to encourage the pairwise distance between all pairs of entities to change little between each time step, instead of encouraging the individual coordinates to change little. [sent-106, score-0.73]
49 The above expression has an analytical solution: an affine combination of the current information from the graph and the coordinates at the last timestep. [sent-108, score-0.114]
50 As in MDS, eigendecomposition of the right hand side of equation 8 yields the solution Xt which minimizes the objective function in equation 7. [sent-111, score-0.137]
51 We now have a method which finds latent coordinates for time t that are consistent with $G_t$ and have pairwise distances similar to those of $X_{t-1}$. [sent-112, score-0.383]
52 But although all pairwise distances may be similar, the coordinates may be very different. [sent-113, score-0.152]
53 We solve this by applying the Procrustes transform to the solution Xt of equation 8. [sent-115, score-0.05]
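The alignment step can be illustrated in two dimensions, where the best rotation has a closed form. This is a simplified sketch, not the paper's procedure: it assumes 2-d coordinates and handles rotation only (no reflection, translation, or scaling), whereas a general Procrustes solution is normally obtained from a singular value decomposition. The function names are hypothetical.

```python
import math

def rotation_angle(X, Y):
    """Angle of the rotation that maps the 2-d rows of X as close as
    possible (in least squares) to the corresponding rows of Y."""
    a = sum(x[0] * y[0] + x[1] * y[1] for x, y in zip(X, Y))  # dot products
    b = sum(x[0] * y[1] - x[1] * y[0] for x, y in zip(X, Y))  # cross products
    return math.atan2(b, a)

def rotate(X, theta):
    """Rotate every 2-d row of X counter-clockwise by theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x0 - s * x1, s * x0 + c * x1) for x0, x1 in X]

def align_to_previous(X_t, X_prev):
    """Rotate the new coordinates so they line up with the previous ones."""
    return rotate(X_t, rotation_angle(X_t, X_prev))
```

The angle maximizes the summed inner products between rotated and reference points, which is equivalent to minimizing the summed squared distances because rotation preserves norms.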
54 Before moving on to stage two’s nonlinear optimization we must address the scalability of stage one. [sent-118, score-0.169]
55 The naive implementation (SVD of the matrix from equation 8) has a cost of $O(n^3)$ for n nodes, since both $\tilde{D}_t$ and $X_t X_t^T$ are dense n × n matrices. [sent-119, score-0.089]
56 The power method is an iterative eigendecomposition technique which only involves multiplying a matrix by a vector. [sent-121, score-0.062]
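Since the power method only needs matrix-vector products, it works naturally with a sparse representation. The sketch below is a generic illustration, not the paper's implementation: it stores each row as a dictionary of nonzero entries (an assumed format), finds only the dominant eigenpair of a symmetric matrix, and uses a fixed iteration count instead of a convergence test.

```python
import math

def sparse_matvec(rows, v):
    """Multiply a sparse matrix by v; rows[i] is a dict {column: value}."""
    return [sum(val * v[j] for j, val in row.items()) for row in rows]

def power_iteration(rows, iters=200):
    """Return (eigenvalue, eigenvector) for the dominant eigenpair.

    Starts from a uniform vector, which must not be orthogonal to the
    dominant eigenvector for the iteration to converge to it.
    """
    n = len(rows)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        w = sparse_matvec(rows, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    w = sparse_matvec(rows, v)
    eigenvalue = sum(a * b for a, b in zip(v, w))  # Rayleigh quotient
    return eigenvalue, v
```

Each iteration costs time proportional to the number of stored nonzeros, so the work per step tracks the sparsity of the graph instead of n squared.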
57 4 Stage Two: Nonlinear Search. Stage one places entities in reasonably consistent locations which fit our intuition, but it is not tied to the probabilistic model from Section 2. [sent-123, score-0.442]
58 Stage two uses these locations as initializations for applying nonlinear optimization directly to the model in equation 1. [sent-124, score-0.098]
59 We use conjugate gradient (CG) which was the most effective of several alternatives attempted. [sent-125, score-0.092]
60 The most important practical question is how to make these gradient computations tractable, especially when the model likelihood involves a double sum over all entities. [sent-126, score-0.064]
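61 Plugging this information in (10), we have $\partial p_{ij} / \partial X_{i,k,t} = \psi_{i,j,k,t}$ when $d_{ij} \le r_{ij}$, and 0 otherwise (11). [sent-134, score-0.485]
62 Equation (9) now becomes $\frac{\partial \log P(G_t \mid X_t)}{\partial X_{i,k,t}} = \sum_{j:\, i \sim j,\ d_{ij} \le r_{ij}} \frac{\psi_{i,j,k,t}}{p_{ij}} - \sum_{j:\, i \nsim j,\ d_{ij} \le r_{ij}} \frac{\psi_{i,j,k,t}}{1 - p_{ij}}$ (12); pairs with $d_{ij} > r_{ij}$ contribute zero. [sent-135, score-1.543]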
63 A slightly more sophisticated trick, omitted for space reasons, lets us compute $\log P(G_t \mid X_t)$ in O(rn + n log n) time. [sent-138, score-0.094]
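Because only pairs within their radius contribute to the gradient, a spatial index can be used to find those pairs without scanning all entities. The sketch below shows a generic kd-tree range query; it is not the omitted trick from the paper, and the tree layout and the names `build_kdtree` and `within_radius` are assumptions made for illustration.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (index, coords) pairs.

    Returns None for an empty list, otherwise a tuple
    (point, split_axis, left_subtree, right_subtree).
    """
    if not points:
        return None
    axis = depth % len(points[0][1])
    points = sorted(points, key=lambda p: p[1][axis])
    mid = len(points) // 2
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def within_radius(node, center, radius, found=None):
    """Collect indices of all points within `radius` of `center`."""
    if found is None:
        found = []
    if node is None:
        return found
    (index, coords), axis, left, right = node
    if sum((a - b) ** 2 for a, b in zip(coords, center)) <= radius ** 2:
        found.append(index)
    diff = center[axis] - coords[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    within_radius(near, center, radius, found)
    # The far side can hold matches only if the query ball crosses the split.
    if abs(diff) <= radius:
        within_radius(far, center, radius, found)
    return found
```

A query for entity i would pass its latent position as `center` and a radius large enough to cover its pairwise radii; subtrees that lie entirely outside the ball are skipped, which is where the savings over a full scan come from in low dimensions.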
64 To aid the early steps of CG, we add an additional term to the score function, which penalizes all pairs of connected entities according to the square of their separation in latent space, i. [sent-140, score-0.717]
65 We investigate three things: ability of the algorithm to reconstruct the latent space based only on link observations, anecdotal evaluation of what happens to the NIPS data, and scalability results on large datasets from Citeseer. [sent-144, score-0.367]
66 1 Comparing with ground truth. We generate synthetic data for six consecutive timesteps. [sent-146, score-0.057]
67 At each timestep the next set of two-dimensional latent coordinates is generated with the former positions as mean, and Gaussian noise of standard deviation σ = 0. [sent-147, score-0.524]
68 At each step, each entity is linked with a relatively higher probability to the ones falling within its radius, or containing it within their radii. [sent-150, score-0.231]
69 which any two entities i and j outside the maximum pairwise radii $r_{ij}$ are connected. [sent-156, score-0.689]
70 Accuracy is measured by drawing a test set from the same model, and determining the ROC curve for predicting whether a pair of entities will be linked in the test set. [sent-158, score-0.419]
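The area under the ROC curve for link prediction can be computed directly from predicted scores and test labels. This is a minimal sketch using the pairwise-ranking form of AUC (the fraction of linked/unlinked test pairs that are ordered correctly, with ties counted as half); the name `auc` and the input format are assumptions, and the double loop is meant for small test sets only.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores -- predicted link scores, one per test pair
    labels -- 1 if the pair is linked in the test set, 0 otherwise
    Requires at least one positive and one negative label.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A model that assigns the same score to every pair gets 0.5 under this definition, and a model that ranks every linked pair above every unlinked pair gets 1.0.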
71 A random model, guessing link probabilities randomly (this should have an AUC of 0. [sent-164, score-0.088]
72 This ranks the likelihood of being linked in the test set according to the frequency of linkage in the training set. [sent-168, score-0.105]
73 Time-varying MDS: The model that results from running stage one only. [sent-171, score-0.095]
74 MDS with no time: The model that results from ignoring time information and running independent MDS on each timestep. [sent-173, score-0.053]
75 Figure 3 shows the ROC curves for the third timestep on a test set of size 160. [sent-174, score-0.19]
76 Table 1 shows the AUC scores of our approach and the five alternatives for 3 different sizes of the dataset over the first, third, and last time steps. [sent-175, score-0.071]
77 AUC score on graphs of size n for six different models: (A) True, (B) Model learned by DSNL, (C) Random Model, (D) Simple Counting model (Control), (E) MDS with time, and (F) MDS without time. [sent-177, score-0.136]
78 The simple counting model rightly guesses some of the links in the test graph from the training graph. [sent-234, score-0.205]
79 When the number of links is small, MDS without time does poorly compared to our temporal version. [sent-237, score-0.135]
80 However as the number of links grows quadratically with the number of entities, regular MDS does almost as well as the temporal version: this is not a surprise because the generalization benefit from the previous timestep becomes unnecessary with sufficient data on the current timestep. [sent-238, score-0.295]
81 2 Visualizing the NIPS coauthorship data over time. For clarity we present a subset of the NIPS dataset, obtained by choosing a well-connected author, and including all authors and links within a few hops. [sent-241, score-0.19]
82 In each picture we have the links for that timestep, a few well connected people highlighted, with their radii. [sent-243, score-0.15]
83 To give some intuition of the movement of the rest of the points, we divided the area in the first timestep in 4 parts, and colored and shaped the points in each differently. [sent-247, score-0.19]
84 In this paper we limit ourselves to anecdotal examination of the latent positions. [sent-249, score-0.232]
85 For example, with BurgesC and VapnikV we see that they had very small radii in the first four years, and were further apart from one another, since there was no co-publication. [sent-250, score-0.085]
86 However in the second timestep they move closer, though there are no direct links. [sent-251, score-0.229]
87 We end the discussion with entities HintonG, GhahramaniZ, and JordanM. [sent-254, score-0.376]
88 In the first timestep they did not coauthor with one another, and were placed outside one another’s radii. [sent-255, score-0.229]
89 In the second timestep GhahramaniZ and HintonG coauthor with JordanM. [sent-256, score-0.229]
90 However since HintonG had a large radius and more links than the former, it is harder for him to meet all the constraints, and he doesn’t move very close to JordanM . [sent-257, score-0.215]
91 In the next timestep however GhahramaniZ has a link with both of the others, and they move substantially closer to one another. [sent-258, score-0.317]
92 When kd-trees are used and the graphs are sparse, scaling is clearly sub-quadratic and nearly linear in the number of entities, meeting our expectation of O(n log n) performance. [sent-261, score-0.103]
93 The results show subquadratic time complexity along with satisfactory link prediction on test sets. [sent-263, score-0.127]
94 6 Conclusions and Future Work This paper has described a method for modeling relationships that change over time. [sent-264, score-0.047]
95 We also plan to extend this to find the posterior distributions of the coordinates following the approach used by [5]. [sent-267, score-0.067]
96 [Figure 4, panels (A) to (D): NIPS coauthorship plots labeling KochC, ManwaniA, BurgesC, VapnikV, ViolaP, SejnowskiT, HintonG, JordanM, and GhahramaniZ; a timing plot with legend entries "quadratic score" and "score using kd-tree" and x-axis "Number of entities".] [sent-301, score-0.154]
97 Figure 4: NIPS coauthorship data at A. [sent-310, score-0.41]
98 Time taken for score calculation vs number of entities. [sent-317, score-0.077]
99 An algorithm for clustering relational data with applications to social network analysis and comparison with multidimensional scaling. [sent-337, score-0.188]
100 Studies in the robustness of multidimensional scaling : Perturbational analysis of classical scaling. [sent-349, score-0.121]
wordName wordTfidf (topN-words)
[('xt', 0.491), ('entities', 0.376), ('dij', 0.294), ('mds', 0.255), ('pij', 0.217), ('latent', 0.201), ('rij', 0.191), ('timestep', 0.19), ('entity', 0.188), ('gt', 0.177), ('links', 0.105), ('ghahramaniz', 0.098), ('hintong', 0.098), ('jordanm', 0.098), ('pl', 0.095), ('link', 0.088), ('social', 0.086), ('radii', 0.085), ('multidimensional', 0.08), ('score', 0.077), ('stage', 0.072), ('dt', 0.072), ('coordinates', 0.067), ('linkage', 0.062), ('burgesc', 0.059), ('dsnl', 0.059), ('auc', 0.054), ('radius', 0.048), ('distances', 0.048), ('positions', 0.048), ('graph', 0.047), ('locations', 0.043), ('linked', 0.043), ('ij', 0.042), ('gradient', 0.041), ('scaling', 0.041), ('timesteps', 0.041), ('move', 0.039), ('biquadratic', 0.039), ('coauthor', 0.039), ('kochc', 0.039), ('logp', 0.039), ('manwania', 0.039), ('pconst', 0.039), ('raftery', 0.039), ('recommender', 0.039), ('sarkar', 0.039), ('sejnowskit', 0.039), ('subquadratic', 0.039), ('vapnikv', 0.039), ('cg', 0.037), ('pairwise', 0.037), ('log', 0.036), ('dynamic', 0.035), ('logistic', 0.034), ('euclidian', 0.034), ('coauthorship', 0.034), ('eigendecomposition', 0.034), ('six', 0.033), ('roc', 0.033), ('equation', 0.032), ('anecdotal', 0.031), ('time', 0.03), ('counting', 0.03), ('conjugate', 0.029), ('svd', 0.029), ('dense', 0.029), ('matrix', 0.028), ('nips', 0.028), ('tractable', 0.028), ('relationships', 0.028), ('preserved', 0.027), ('graphs', 0.026), ('viola', 0.025), ('scalability', 0.025), ('synthetic', 0.024), ('distance', 0.023), ('model', 0.023), ('people', 0.023), ('kernelized', 0.023), ('meet', 0.023), ('nds', 0.023), ('alternatives', 0.022), ('network', 0.022), ('space', 0.022), ('connected', 0.022), ('penalizes', 0.021), ('clarity', 0.021), ('generalize', 0.021), ('objective', 0.021), ('automated', 0.02), ('conditionally', 0.02), ('pairs', 0.02), ('change', 0.019), ('sizes', 0.019), ('arg', 0.018), ('solution', 0.018), ('former', 0.018), ('try', 0.018), ('gets', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999887 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
2 0.2464903 80 nips-2005-Gaussian Process Dynamical Models
Author: Jack Wang, Aaron Hertzmann, David M. Blei
Abstract: This paper introduces Gaussian Process Dynamical Models (GPDM) for nonlinear time series analysis. A GPDM comprises a low-dimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closed-form, using Gaussian Process (GP) priors for both the dynamics and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach on human motion capture data in which each pose is 62-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces. Webpage: http://www.dgp.toronto.edu/∼ jmwang/gpdm/ 1
3 0.18709324 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
Author: Xuerui Wang, Natasha Mohanty, Andrew McCallum
Abstract: We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Blockstructures for votes, the Group-Topic model’s joint inference discovers more cohesive groups and improved topics. 1
4 0.15339397 50 nips-2005-Convex Neural Networks
Author: Yoshua Bengio, Nicolas L. Roux, Pascal Vincent, Olivier Delalleau, Patrice Marcotte
Abstract: Convexity has recently received a lot of attention in the machine learning community, and the lack of convexity has been seen as a major disadvantage of many learning algorithms, such as multi-layer artificial neural networks. We show that training multi-layer neural networks in which the number of hidden units is learned can be viewed as a convex optimization problem. This problem involves an infinite number of variables, but can be solved by incrementally inserting a hidden unit at a time, each time finding a linear classifier that minimizes a weighted sum of errors. 1
5 0.14431806 185 nips-2005-Subsequence Kernels for Relation Extraction
Author: Raymond J. Mooney, Razvan C. Bunescu
Abstract: We present a new kernel method for extracting semantic relations between entities in natural language text, based on a generalization of subsequence kernels. This kernel uses three types of subsequence patterns that are typically employed in natural language to assert relationships between two entities. Experiments on extracting protein interactions from biomedical corpora and top-level relations from newspaper corpora demonstrate the advantages of this approach. 1
6 0.12753591 115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation
7 0.12733486 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
8 0.10505723 79 nips-2005-Fusion of Similarity Data in Clustering
9 0.088792711 46 nips-2005-Consensus Propagation
10 0.084962428 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
11 0.082450889 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
12 0.074883379 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
13 0.068381988 133 nips-2005-Nested sampling for Potts models
14 0.067198612 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
15 0.06474416 141 nips-2005-Norepinephrine and Neural Interrupts
16 0.061736345 98 nips-2005-Infinite latent feature models and the Indian buffet process
17 0.059424058 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
18 0.055696294 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
19 0.055055697 45 nips-2005-Conditional Visual Tracking in Kernel Space
20 0.053795822 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
topicId topicWeight
[(0, 0.192), (1, 0.037), (2, 0.015), (3, 0.042), (4, 0.024), (5, -0.214), (6, 0.137), (7, 0.024), (8, 0.001), (9, 0.141), (10, -0.077), (11, 0.148), (12, -0.377), (13, -0.153), (14, 0.047), (15, 0.022), (16, -0.02), (17, -0.001), (18, 0.057), (19, -0.096), (20, 0.072), (21, 0.057), (22, 0.185), (23, 0.075), (24, 0.012), (25, 0.016), (26, 0.06), (27, -0.056), (28, -0.028), (29, 0.145), (30, -0.044), (31, 0.218), (32, 0.038), (33, 0.019), (34, 0.016), (35, -0.141), (36, -0.062), (37, 0.106), (38, -0.017), (39, 0.033), (40, 0.191), (41, 0.053), (42, -0.008), (43, -0.044), (44, -0.112), (45, 0.053), (46, 0.03), (47, -0.068), (48, -0.038), (49, -0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.96936303 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
2 0.62575328 89 nips-2005-Group and Topic Discovery from Relations and Their Attributes
Author: Xuerui Wang, Natasha Mohanty, Andrew McCallum
Abstract: We present a probabilistic generative model of entity relationships and their attributes that simultaneously discovers groups among the entities and topics among the corresponding textual attributes. Block-models of relationship data have been studied in social network analysis for some time. Here we simultaneously cluster in several modalities at once, incorporating the attributes (here, words) associated with certain relationships. Significantly, joint inference allows the discovery of topics to be guided by the emerging groups, and vice-versa. We present experimental results on two large data sets: sixteen years of bills put before the U.S. Senate, comprising their corresponding text and voting records, and thirteen years of similar data from the United Nations. We show that in comparison with traditional, separate latent-variable models for words, or Blockstructures for votes, the Group-Topic model’s joint inference discovers more cohesive groups and improved topics. 1
3 0.6190809 80 nips-2005-Gaussian Process Dynamical Models
Author: Jack Wang, Aaron Hertzmann, David M. Blei
Abstract: This paper introduces Gaussian Process Dynamical Models (GPDM) for nonlinear time series analysis. A GPDM comprises a low-dimensional latent space with associated dynamics, and a map from the latent space to an observation space. We marginalize out the model parameters in closed-form, using Gaussian Process (GP) priors for both the dynamics and the observation mappings. This results in a nonparametric model for dynamical systems that accounts for uncertainty in the model. We demonstrate the approach on human motion capture data in which each pose is 62-dimensional. Despite the use of small data sets, the GPDM learns an effective representation of the nonlinear dynamics in these spaces. Webpage: http://www.dgp.toronto.edu/∼ jmwang/gpdm/ 1
4 0.40568933 67 nips-2005-Extracting Dynamical Structure Embedded in Neural Activity
Author: Afsheen Afshar, Gopal Santhanam, Stephen I. Ryu, Maneesh Sahani, Byron M. Yu, Krishna V. Shenoy
Abstract: Spiking activity from neurophysiological experiments often exhibits dynamics beyond that driven by external stimulation, presumably reflecting the extensive recurrence of neural circuitry. Characterizing these dynamics may reveal important features of neural computation, particularly during internally-driven cognitive operations. For example, the activity of premotor cortex (PMd) neurons during an instructed delay period separating movement-target specification and a movementinitiation cue is believed to be involved in motor planning. We show that the dynamics underlying this activity can be captured by a lowdimensional non-linear dynamical systems model, with underlying recurrent structure and stochastic point-process output. We present and validate latent variable methods that simultaneously estimate the system parameters and the trial-by-trial dynamical trajectories. These methods are applied to characterize the dynamics in PMd data recorded from a chronically-implanted 96-electrode array while monkeys perform delayed-reach tasks. 1
5 0.39687985 115 nips-2005-Learning Shared Latent Structure for Image Synthesis and Robotic Imitation
Author: Aaron Shon, Keith Grochow, Aaron Hertzmann, Rajesh P. Rao
Abstract: We propose an algorithm that uses Gaussian process regression to learn common hidden structure shared between corresponding sets of heterogenous observations. The observation spaces are linked via a single, reduced-dimensionality latent variable space. We present results from two datasets demonstrating the algorithms’s ability to synthesize novel data from learned correspondences. We first show that the method can learn the nonlinear mapping between corresponding views of objects, filling in missing data as needed to synthesize novel views. We then show that the method can learn a mapping between human degrees of freedom and robotic degrees of freedom for a humanoid robot, allowing robotic imitation of human poses from motion capture data. 1
6 0.38884497 185 nips-2005-Subsequence Kernels for Relation Extraction
7 0.38044262 50 nips-2005-Convex Neural Networks
8 0.36749053 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
9 0.30650416 81 nips-2005-Gaussian Processes for Multiuser Detection in CDMA receivers
10 0.3055135 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation
11 0.30031571 46 nips-2005-Consensus Propagation
12 0.28830904 191 nips-2005-The Forgetron: A Kernel-Based Perceptron on a Fixed Budget
13 0.28462741 68 nips-2005-Factorial Switching Kalman Filters for Condition Monitoring in Neonatal Intensive Care
14 0.26154101 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
15 0.23850583 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours
16 0.23717606 79 nips-2005-Fusion of Similarity Data in Clustering
17 0.230459 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
18 0.22179504 52 nips-2005-Correlated Topic Models
19 0.21654253 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
20 0.20953944 45 nips-2005-Conditional Visual Tracking in Kernel Space
topicId topicWeight
[(3, 0.042), (10, 0.054), (18, 0.011), (27, 0.052), (31, 0.037), (34, 0.079), (41, 0.026), (55, 0.045), (62, 0.287), (69, 0.049), (73, 0.038), (88, 0.137), (91, 0.033)]
simIndex simValue paperId paperTitle
1 0.85601932 194 nips-2005-Top-Down Control of Visual Attention: A Rational Account
Author: Michael Shettel, Shaun Vecera, Michael C. Mozer
Abstract: Theories of visual attention commonly posit that early parallel processes extract conspicuous features such as color contrast and motion from the visual field. These features are then combined into a saliency map, and attention is directed to the most salient regions first. Top-down attentional control is achieved by modulating the contribution of different feature types to the saliency map. A key source of data concerning attentional control comes from behavioral studies in which the effect of recent experience is examined as individuals repeatedly perform a perceptual discrimination task (e.g., “what shape is the odd-colored object?”). The robust finding is that repetition of features of recent trials (e.g., target color) facilitates performance. We view this facilitation as an adaptation to the statistical structure of the environment. We propose a probabilistic model of the environment that is updated after each trial. Under the assumption that attentional control operates so as to make performance more efficient for more likely environmental states, we obtain parsimonious explanations for data from four different experiments. Further, our model provides a rational explanation for why the influence of past experience on attentional control is short lived.
1 INTRODUCTION
The brain does not have the computational capacity to fully process the massive quantity of information provided by the eyes. Selective attention operates to filter the spatiotemporal stream to a manageable quantity. Key to understanding the nature of attention is discovering the algorithm governing selection, i.e., understanding what information will be selected and what will be suppressed. Selection is influenced by attributes of the spatiotemporal stream, often referred to as bottom-up contributions to attention. For example, attention is drawn to abrupt onsets, motion, and regions of high contrast in brightness and color.
Most theories of attention posit that some visual information processing is performed preattentively and in parallel across the visual field. This processing extracts primitive visual features such as color and motion, which provide the bottom-up cues for attentional guidance. However, attention is not driven willy-nilly by these cues. The deployment of attention can be modulated by task instructions, current goals, and domain knowledge, collectively referred to as top-down contributions to attention. How do bottom-up and top-down contributions to attention interact? Most psychologically and neurobiologically motivated models propose a very similar architecture in which information from bottom-up and top-down sources combines in a saliency (or activation) map (e.g., Itti et al., 1998; Koch & Ullman, 1985; Mozer, 1991; Wolfe, 1994). The saliency map indicates, for each location in the visual field, the relative importance of that location. Attention is drawn to the most salient locations first. Figure 1 sketches the basic architecture that incorporates bottom-up and top-down contributions to the saliency map. The visual image is analyzed to extract maps of primitive features such as color and orientation. Associated with each location in a map is a scalar response or activation indicating the presence of a particular feature. Most models assume that responses are stronger at locations with high local feature contrast, consistent with neurophysiological data, e.g., the response of a red feature detector to a red object is stronger if the object is surrounded by green objects. The saliency map is obtained by taking a sum of bottom-up activations from the feature maps.
FIGURE 1. An attentional saliency map constructed from bottom-up and top-down information. (Diagram labels: visual image; primitive feature maps for horizontal, vertical, green, and red; bottom-up activations; top-down gains; saliency map.)
FIGURE 2. Sample display from Experiment 1 of Maljkovic and Nakayama (1994).
The bottom-up activations are modulated by a top-down gain that specifies the contribution of a particular map to saliency in the current task and environment. Wolfe (1994) describes a heuristic algorithm for determining appropriate gains in a visual search task, where the goal is to detect a target object among distractor objects. Wolfe proposes that maps encoding features that discriminate between target and distractors have higher gains, and to be consistent with the data, he proposes limits on the magnitude of gain modulation and the number of gains that can be modulated. More recently, Wolfe et al. (2003) have been explicit in proposing optimization as a principle for setting gains given the task definition and stimulus environment. One aspect of optimizing attentional control involves configuring the attentional system to perform a given task; for example, in a visual search task for a red vertical target among green vertical and red horizontal distractors, the task definition should result in a higher gain for red and vertical feature maps than for other feature maps. However, there is a more subtle form of gain modulation, which depends on the statistics of display environments. For example, if green vertical distractors predominate, then red is a better discriminative cue than vertical; and if red horizontal distractors predominate, then vertical is a better discriminative cue than red. In this paper, we propose a model that encodes statistics of the environment in order to allow for optimization of attentional control to the structure of the environment. Our model is designed to address a key set of behavioral data, which we describe next.
1.1 Attentional priming phenomena
Psychological studies involve a sequence of experimental trials that begin with a stimulus presentation and end with a response from the human participant. Typically, trial order is randomized, and the context preceding a trial is ignored.
However, in sequential studies, performance is examined on one trial contingent on the past history of trials. These sequential studies explore how experience influences future performance. Consider the sequential attentional task of Maljkovic and Nakayama (1994). On each trial, the stimulus display (Figure 2) consists of three notched diamonds, one a singleton in color (either green among red or red among green). The task is to report whether the singleton diamond, referred to as the target, is notched on the left or the right. The task is easy because the singleton pops out, i.e., the time to locate the singleton does not depend on the number of diamonds in the display. Nonetheless, the response time significantly depends on the sequence of trials leading up to the current trial: If the target is the same color on the current trial as on the previous trial, response time is roughly 100 ms faster than if the target is a different color on the current trial. Considering that response times are on the order of 700 ms, this effect, which we term attentional priming, is gigantic in the scheme of psychological phenomena.
2 ATTENTIONAL CONTROL AS ADAPTATION TO THE STATISTICS OF THE ENVIRONMENT
We interpret the phenomenon of attentional priming via a particular perspective on attentional control, which can be summarized in two bullets.
• The perceptual system dynamically constructs a probabilistic model of the environment based on its past experience.
• Control parameters of the attentional system are tuned so as to optimize performance under the current environmental model.
The primary focus of this paper is the environmental model, but we first discuss the nature of performance optimization. The role of attention is to make processing of some stimuli more efficient, and consequently, the processing of other stimuli less efficient.
For example, if the gain on the red feature map is turned up, processing will be efficient for red items, but competition from red items will reduce the efficiency for green items. Thus, optimal control should tune the system for the most likely states of the world by minimizing an objective function such as:
$$J(g) = \sum_{e} P(e)\, RT_g(e) \qquad (1)$$
where g is a vector of top-down gains, e is an index over environmental states, P(.) is the probability of an environmental state, and RT_g(.) is the expected response time (assuming a constant error rate) to the environmental state under gains g. Determining the optimal gains is a challenge because every gain setting will result in facilitation of responses to some environmental states but hindrance of responses to other states. The optimal control problem could be solved via direct reinforcement learning, but the rapidity of human learning makes this possibility unlikely: In a variety of experimental tasks, evidence suggests that adaptation to a new task or environment can occur in just one or two trials (e.g., Rogers & Monsell, 1995). Model-based reinforcement learning is an attractive alternative, because given a model, optimization can occur without further experience in the real world. Although the number of real-world trials necessary to achieve a given level of performance is comparable for direct and model-based reinforcement learning in stationary environments (Kearns & Singh, 1999), naturalistic environments can be viewed as highly nonstationary. In such a situation, the framework we suggest is well motivated: After each experience, the environment model is updated. The updated environmental model is then used to retune the attentional system. In this paper, we propose a particular model of the environment suitable for visual search tasks. Rather than explicitly modeling the optimization of attentional control by setting gains, we assume that the optimization process will serve to minimize Equation 1.
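As a rough illustration of the objective in Equation 1, the following toy sketch evaluates J(g) for a two-state environment (red target or green target) with a single scalar gain on the red feature map. The linear response-time function, the 700 ms baseline, and the 100 ms benefit are assumptions chosen only for illustration; the paper does not specify RT_g, and the names `expected_rt`, `base_rt`, and `benefit` are hypothetical.

```python
def expected_rt(gain_red, p_red, base_rt=700.0, benefit=100.0):
    """Toy version of J(g) = sum_e P(e) * RT_g(e) with two states.

    gain_red is a scalar in [0, 1]; raising it speeds responses to red
    targets and slows responses to green targets (hypothetical RT model).
    """
    p_state = {"red": p_red, "green": 1.0 - p_red}
    rt = {
        "red": base_rt - benefit * gain_red,
        "green": base_rt - benefit * (1.0 - gain_red),
    }
    return sum(p_state[e] * rt[e] for e in p_state)


# Grid search over candidate gains when red targets are more probable.
gains = [i / 10 for i in range(11)]
best_gain = min(gains, key=lambda g: expected_rt(g, p_red=0.8))
print(best_gain, expected_rt(best_gain, p_red=0.8))
```

Under these assumptions the minimizing gain favors the more probable state, so the expected response time is lowest when the system is tuned toward the likely environment.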
Because any gain adjustment will facilitate performance in some environmental states and hinder performance in others, an optimized control system should obtain faster reaction times for more probable environmental states. This assumption allows us to explain experimental results in a minimal, parsimonious framework.
3 MODELING THE ENVIRONMENT
Focusing on the domain of visual search, we characterize the environment in terms of a probability distribution over configurations of target and distractor features. We distinguish three classes of features: defining, reported, and irrelevant. To explain these terms, consider the task of searching a display of size-varying, colored, notched diamonds (Figure 2), with the task of detecting the singleton in color and judging the notch location. Color is the defining feature, notch location is the reported feature, and size is an irrelevant feature. To simplify the exposition, we treat all features as having discrete values, an assumption which is true of the experimental tasks we model. We begin by considering displays containing a single target and a single distractor, and shortly generalize to multidistractor displays. We use the framework of Bayesian networks to characterize the environment. Each feature of the target and distractor is a discrete random variable, e.g., Tcolor for target color and Dnotch for the location of the notch on the distractor. The Bayes net encodes the probability distribution over environmental states; in our working example, this distribution is P(Tcolor, Tsize, Tnotch, Dcolor, Dsize, Dnotch). The structure of the Bayes net specifies the relationships among the features. The simplest model one could consider would be to treat the features as independent, illustrated in Figure 3a for the singleton-color search task. The opposite extreme would be the full joint distribution, which could be represented by a lookup table indexed by the six features, or by the cascading Bayes net architecture in Figure 3b.
The architecture we propose, which we'll refer to as the dominance model (Figure 3c), has an intermediate dependency structure, and expresses the joint distribution as:
P(Tcolor) P(Dcolor | Tcolor) P(Tsize | Tcolor) P(Tnotch | Tcolor) P(Dsize | Dcolor) P(Dnotch | Dcolor).
The structured model is constructed based on three rules.
1. The defining feature of the target is at the root of the tree.
2. The defining feature of the distractor is conditionally dependent on the defining feature of the target. We refer to this rule as dominance of the target over the distractor.
3. The reported and irrelevant features of target (distractor) are conditionally dependent on the defining feature of the target (distractor). We refer to this rule as dominance of the defining feature over nondefining features.
As we will demonstrate, the dominance model produces a parsimonious account of a wide range of experimental data.
FIGURE 3. Three models of a visual-search environment with colored, notched, size-varying diamonds. (a) feature-independence model; (b) full-joint model; (c) dominance model. (Node labels: Tcolor, Tsize, Tnotch, Dcolor, Dsize, Dnotch.)
3.1 Updating the environment model
The model's parameters are the conditional distributions embodied in the links. In the example of Figure 3c with binary random variables, the model has 11 parameters. However, these parameters are determined by the environment: To be adaptive in nonstationary environments, the model must be updated following each experienced state. We propose a simple exponentially weighted averaging approach. For two variables V and W with observed values v and w on trial t, a conditional distribution, $P_t(V = u \mid W = w) = \delta_{uv}$, is defined, where δ is the Kronecker delta.
The distribution representing the environment E following trial t, denoted $P^E_t$, is then updated as follows:
$$P^E_t(V = u \mid W = w) = \alpha\, P^E_{t-1}(V = u \mid W = w) + (1 - \alpha)\, P_t(V = u \mid W = w) \qquad (2)$$
for all u, where α is a memory constant. Note that no update is performed for values of W other than w. An analogous update is performed for unconditional distributions.
How the model is initialized (i.e., specifying $P^E_0$) is irrelevant, because in all experimental tasks that we model, participants begin the experiment with many dozens of practice trials. Data is not collected during practice trials. Consequently, any transient effects of $P^E_0$ do not impact the results. In our simulations, we begin with a uniform distribution for $P^E_0$, and include practice trials as in the human studies.
Thus far, we've assumed a single target and a single distractor. The experiments that we model involve multiple distractors. The simple extension we require to handle multiple distractors is to define a frequentist probability for each distractor feature V, $P_t(V = v \mid W = w) = C_{vw} / C_w$, where $C_{vw}$ is the count of co-occurrences of feature values v and w among the distractors, and $C_w$ is the count of w.
Our model is extremely simple. Given a description of the visual search task and environment, the model has only a single degree of freedom, α. In all simulations, we fix α = 0.75; however, the choice of α does not qualitatively impact any result.
4 SIMULATIONS
In this section, we show that the model can explain a range of data from four different experiments examining attentional priming. All experiments measure response times of participants. On each trial, the model can be used to obtain a probability of the display configuration (the environmental state) on that trial, given the history of trials to that point.
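The update in Equation 2 and the dominance factorization can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes a single target and a single distractor, uniform initial distributions, and α = 0.75 as stated in the text. Following rule 3, the distractor's nondefining features are conditioned on the distractor's color. The class and method names (`Conditional`, `DominanceModel`, `log_prob`) are hypothetical.

```python
import math

ALPHA = 0.75  # memory constant reported for the simulations


class Conditional:
    """Table for P(V | W), updated by exponentially weighted averaging."""

    def __init__(self, values):
        self.values = list(values)
        self.rows = {}  # parent value w -> {child value u: probability}

    def _row(self, w):
        # Rows start uniform; w=None stands for an unconditional distribution.
        if w not in self.rows:
            self.rows[w] = {u: 1.0 / len(self.values) for u in self.values}
        return self.rows[w]

    def prob(self, v, w=None):
        return self._row(w)[v]

    def update(self, v, w=None, alpha=ALPHA):
        # Equation 2: only the row for the observed parent value w changes.
        row = self._row(w)
        for u in self.values:
            delta = 1.0 if u == v else 0.0
            row[u] = alpha * row[u] + (1.0 - alpha) * delta


class DominanceModel:
    """Dominance-model factorization for one target and one distractor."""

    def __init__(self, colors, sizes, notches):
        self.t_color = Conditional(colors)   # P(Tcolor)
        self.d_color = Conditional(colors)   # P(Dcolor | Tcolor)
        self.t_size = Conditional(sizes)     # P(Tsize | Tcolor)
        self.t_notch = Conditional(notches)  # P(Tnotch | Tcolor)
        self.d_size = Conditional(sizes)     # P(Dsize | Dcolor)
        self.d_notch = Conditional(notches)  # P(Dnotch | Dcolor)

    def _terms(self, target, distractor):
        tc, ts, tn = target
        dc, ds, dn = distractor
        return [
            (self.t_color, tc, None), (self.d_color, dc, tc),
            (self.t_size, ts, tc), (self.t_notch, tn, tc),
            (self.d_size, ds, dc), (self.d_notch, dn, dc),
        ]

    def log_prob(self, target, distractor):
        terms = self._terms(target, distractor)
        return sum(math.log(t.prob(v, w)) for t, v, w in terms)

    def update(self, target, distractor):
        for table, v, w in self._terms(target, distractor):
            table.update(v, w)


model = DominanceModel(["red", "green"], ["small", "large"], ["left", "right"])
for _ in range(5):  # five trials with a red target among green
    model.update(("red", "small", "left"), ("green", "small", "right"))

same = model.log_prob(("red", "small", "left"), ("green", "small", "right"))
diff = model.log_prob(("green", "small", "left"), ("red", "small", "right"))
print(same, diff)
```

After a run of red-target trials, the display that repeats the target color receives the higher log probability, which is the quantity the simulations relate to response time.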
Our critical assumption, as motivated earlier, is that response times monotonically decrease with increasing probability, indicating that visual information processing is better configured for more likely environmental states. The particular relationship we assume is that response times are linear in log probability. This assumption yields long response time tails, as are observed in all human studies.
4.1 Maljkovic and Nakayama (1994, Experiment 5)
In this experiment, participants were asked to search for a singleton in color in a display of three red or green diamonds. Each diamond was notched on either the left or right side, and the task was to report the side of the notch on the color singleton. The well-practiced participants made very few errors. Reaction time (RT) was examined as a function of whether the target on a given trial is the same or different color as the target on trial n steps back or ahead. Figure 4 shows the results, with the human RTs in the left panel and the simulation log probabilities in the right panel. The horizontal axis represents n. Both graphs show the same outcome: repetition of target color facilitates performance. This influence lasts only for a half dozen trials, with an exponentially decreasing influence further into the past. In the model, this decreasing influence is due to the exponential decay of recent history (Equation 2). Figure 4 also shows that, as expected, the future has no influence on the current trial.
4.2 Maljkovic and Nakayama (1994, Experiment 8)
In the previous experiment, it is impossible to determine whether facilitation is due to repetition of the target's color or the distractor's color, because the display contains only two colors, and therefore repetition of target color implies repetition of distractor color.
To unconfound these two potential factors, an experiment like the previous one was conducted using four distinct colors, allowing one to examine the effect of repeating the target color while varying the distractor color, and vice versa. The sequence of trials was composed of subsequences of up to six consecutive trials with either the target or distractor color held constant while the other color was varied trial to trial. Following each subsequence, both target and distractors were changed. Figure 5 shows that for both humans and the simulation, performance improves toward an asymptote as the number of target and distractor repetitions increases; in the model, the asymptote is due to the probability of the repeated color in the environment model approaching 1.0. The performance improvement is greater for target than distractor repetition; in the model, this difference is due to the dominance of the defining feature of the target over the defining feature of the distractor.
4.3 Huang, Holcombe, and Pashler (2004, Experiment 1)
Huang et al. (2004) and Hillstrom (2000) conducted studies to determine whether repetitions of one feature facilitate performance independently of repetitions of another feature. In the Huang et al. study, participants searched for a singleton in size in a display consisting of lines that were short and long, slanted left or right, and colored white or black. The reported feature was target slant. Slant, size, and color were uncorrelated. Huang et al. discovered that repeating an irrelevant feature (color or orientation) facilitated performance, but only when the defining feature (size) was repeated. As shown in Figure 6, the model replicates human performance, due to the dominance of the defining feature over the reported and irrelevant features.
4.4 Wolfe, Butcher, Lee, and Hyde (2003, Experiment 1)
In an empirical tour-de-force, Wolfe et al. (2003) explored singleton search over a range of environments.
The task is to detect the presence or absence of a singleton in displays consisting of colored (red or green), oriented (horizontal or vertical) lines. Target-absent trials were used primarily to ensure participants were searching the display. The experiment examined seven experimental conditions, which varied in the amount of uncertainty as to the target identity. The essential conditions, from least to most uncertainty, are: blocked (e.g., target always red vertical among green horizontals), mixed feature (e.g., target always a color singleton), mixed dimension (e.g., target either red or vertical), and fully mixed (target could be red, green, vertical, or horizontal). With this design, one can ascertain how uncertainty in the environment and in the target definition influence task difficulty.
FIGURE 4. Experiment 5 of Maljkovic and Nakayama (1994): performance on a given trial conditional on the color of the target on a previous or subsequent trial. Human data is from subject KN. (Left panel: human data, reaction time in msec; right panel: simulation, log(P(trial)); horizontal axis: relative trial number, past and future; curves: same color and different color.)
FIGURE 5. Experiment 8 of Maljkovic and Nakayama (1994). (left panel) human data, average of subjects KN and SS; (right panel) simulation. (Vertical axes: reaction time in msec and log(P(trial)); horizontal axis: order in sequence; curves: target same and distractors same.)
FIGURE 6. Experiment 1 of Huang, Holcombe, & Pashler (2004). (left panel) human data; (right panel) simulation. (Vertical axes: reaction time in msec and log(P(trial)); conditions: size repeat or alternate, color repeat or alternate.)
Because the defining feature in this experiment could be either color or orientation, we modeled the environment with two Bayes nets, one color dominant and one orientation dominant, and performed model averaging. A comparison of Figures 7a and 7b shows a correspondence between human RTs and model predictions. Less uncertainty in the environment leads to more efficient performance. One interesting result from the model is its prediction that the mixed-feature condition is easier than the fully-mixed condition; that is, search is more efficient when the dimension (i.e., color vs. orientation) of the singleton is known, even though the model has no abstract representation of feature dimensions, only feature values.
4.5 Optimal adaptation constant
In all simulations so far, we fixed the memory constant. From the human data, it is clear that memory for recent experience is relatively short lived, on the order of a half dozen trials (e.g., left panel of Figure 4). In this section we provide a rational argument for the short duration of memory in attentional control. Figure 7c shows mean negative log probability in each condition of the Wolfe et al. (2003) experiment, as a function of α. To assess these probabilities, for each experimental condition, the model was initialized so that all of the conditional distributions were uniform, and then a block of trials was run. Log probability for all trials in the block was averaged. The negative log probability (y axis of the Figure) is a measure of the model's misprediction of the next trial in the sequence. For complex environments, such as the fully-mixed condition, a small memory constant is detrimental: With rapid memory decay, the effective history of trials is a high-variance sample of the distribution of environmental states.
For simple environments, a large memory constant is detrimental: With slow memory decay, the model does not transition quickly from the initial environmental model to one that reflects the statistics of a new environment. Thus, the memory constant is constrained by being large enough that the environment model can hold on to sufficient history to represent complex environments, and by being small enough that the model adapts quickly to novel environments. If the conditions in Wolfe et al. give some indication of the range of naturalistic environments an agent encounters, we have a rational account of why attentional priming is so short lived. Whether priming lasts 2 trials or 20, the surprising empirical result is that it does not last 200 or 2000 trials. Our rational argument provides a rough insight into this finding.
FIGURE 7. (a) Human data for Wolfe et al. (2003), Experiment 1; (b) simulation; (c) misprediction of model (i.e., lower y value = better) as a function of α for five experimental conditions. (Axes in a and b: reaction time in msec and log(P(trial)) by target type, red or vert and red and vert; horizontal axis in c: memory constant; conditions: blocked, mixed feature, mixed dimension, fully mixed.)
5 DISCUSSION
The psychological literature contains two opposing accounts of attentional priming and its relation to attentional control. Huang et al. (2004) and Hillstrom (2000) propose an episodic account in which a distinct memory trace, representing the complete configuration of features in the display, is laid down for each trial, and priming depends on configural similarity of the current trial to previous trials. Alternatively, Maljkovic and Nakayama (1994) and Wolfe et al.
(2003) propose a feature-strengthening account in which detection of a feature on one trial increases its ability to attract attention on subsequent trials, and priming is proportional to the number of overlapping features from one trial to the next. The episodic account corresponds roughly to the full joint model (Figure 3b), and the feature-strengthening account corresponds roughly to the independence model (Figure 3a). Neither account is adequate to explain the range of data we presented. However, an intermediate account, the dominance model (Figure 3c), is not only sufficient, but it offers a parsimonious, rational explanation. Beyond the model’s basic assumptions, it has only one free parameter, and can explain results from diverse experimental paradigms. The model makes a further theoretical contribution. Wolfe et al. distinguish the environments in their experiment in terms of the amount of top-down control available, implying that different mechanisms might be operating in different environments. However, in our account, top-down control is not some substance distributed in different amounts depending on the nature of the environment. Our account treats all environments uniformly, relying on attentional control to adapt to the environment at hand. We conclude with two limitations of the present work. First, our account presumes a particular network architecture, instead of a more elegant Bayesian approach that specifies priors over architectures, and performs automatic model selection via the sequence of trials. We did explore such a Bayesian approach, but it was unable to explain the data. Second, at least one finding in the literature is problematic for the model. Hillstrom (2000) occasionally finds that RTs slow when an irrelevant target feature is repeated but the defining target feature is not. However, because this effect is observed only in some experiments, it is likely that any model would require elaboration to explain the variability. 
ACKNOWLEDGEMENTS
We thank Jeremy Wolfe for providing the raw data from his experiment for reanalysis. This research was funded by NSF BCS Award 0339103.
REFERENCES
Huang, L., Holcombe, A. O., & Pashler, H. (2004). Repetition priming in visual search: Episodic retrieval, not feature priming. Memory & Cognition, 32, 12–20.
Hillstrom, A. P. (2000). Repetition effects in visual search. Perception & Psychophysics, 62, 800–817.
Itti, L., Koch, C., & Niebur, E. (1998). A model of saliency-based visual attention for rapid scene analysis. IEEE Trans. Pattern Analysis & Machine Intelligence, 20, 1254–1259.
Kearns, M., & Singh, S. (1999). Finite-sample convergence rates for Q-learning and indirect algorithms. In Advances in Neural Information Processing Systems 11 (pp. 996–1002). Cambridge, MA: MIT Press.
Koch, C., & Ullman, S. (1985). Shifts in selective visual attention: towards the underlying neural circuitry. Human Neurobiology, 4, 219–227.
Maljkovic, V., & Nakayama, K. (1994). Priming of pop-out: I. Role of features. Memory & Cognition, 22, 657–672.
Mozer, M. C. (1991). The perception of multiple objects: A connectionist approach. Cambridge, MA: MIT Press.
Rogers, R. D., & Monsell, S. (1995). The cost of a predictable switch between simple cognitive tasks. Journal of Experimental Psychology: General, 124, 207–231.
Wolfe, J. M. (1994). Guided Search 2.0: A revised model of visual search. Psychonomic Bulletin & Review, 1, 202–238.
Wolfe, J. M., Butcher, S. J., Lee, C., & Hyde, M. (2003). Changing your mind: on the contributions of top-down and bottom-up guidance in visual search for feature singletons. Journal of Experimental Psychology: Human Perception & Performance, 29, 483–502.
same-paper 2 0.77593392 60 nips-2005-Dynamic Social Network Analysis using Latent Space Models
Author: Purnamrita Sarkar, Andrew W. Moore
Abstract: This paper explores two aspects of social network modeling. First, we generalize a successful static model of relationships into a dynamic model that accounts for friendships drifting over time. Second, we show how to make it tractable to learn such models from data, even as the number of entities n gets large. The generalized model associates each entity with a point in p-dimensional Euclidean latent space. The points can move as time progresses but large moves in latent space are improbable. Observed links between entities are more likely if the entities are close in latent space. We show how to make such a model tractable (subquadratic in the number of entities) by the use of appropriate kernel functions for similarity in latent space; the use of low dimensional kd-trees; a new efficient dynamic adaptation of multidimensional scaling for a first pass of approximate projection of entities into latent space; and an efficient conjugate gradient update rule for non-linear local optimization in which amortized time per entity during an update is O(log n). We use both synthetic and real-world data on up to 11,000 entities which indicate linear scaling in computation time and improved performance over four alternative approaches. We also illustrate the system operating on twelve years of NIPS co-publication data. We present a detailed version of this work in [1].
3 0.5436945 179 nips-2005-Sparse Gaussian Processes using Pseudo-inputs
Author: Edward Snelson, Zoubin Ghahramani
Abstract: We present a new Gaussian process (GP) regression model whose covariance is parameterized by the locations of M pseudo-input points, which we learn by a gradient based optimization. We take M ≪ N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M^2 N) training cost and O(M^2) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M, i.e. very sparse solutions, and it significantly outperforms other approaches in this regime.
4 0.54056114 48 nips-2005-Context as Filtering
Author: Daichi Mochihashi, Yuji Matsumoto
Abstract: Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general. However, the problem of context length has almost been neglected so far and a naïve bag-of-words history has been employed in natural language processing. In contrast, in this paper we view topic shifts within a text as a latent stochastic process to give an explicit probabilistic generative model that has partial exchangeability. We propose an online inference algorithm using particle filters to recognize topic shifts to employ the most appropriate length of context automatically. Experiments on the BNC corpus showed consistent improvement over previous methods involving no chronological order.
5 0.54015416 132 nips-2005-Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
Author: Amir Navot, Lavi Shpigelman, Naftali Tishby, Eilon Vaadia
Abstract: We present a non-linear, simple, yet effective, feature subset selection method for regression and use it in analyzing cortical neural activity. Our algorithm involves a feature-weighted version of the k-nearest-neighbor algorithm. It is able to capture complex dependency of the target function on its input and makes use of the leave-one-out error as a natural regularization. We explain the characteristics of our algorithm on synthetic problems and use it in the context of predicting hand velocity from spikes recorded in motor cortex of a behaving monkey. By applying feature selection we are able to improve prediction quality and suggest a novel way of exploring neural data.
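The idea in the abstract above, scoring a set of feature weights by the leave-one-out error of a feature-weighted k-nearest-neighbor regressor, can be illustrated with a minimal sketch. This is not the authors' estimator: the hard (unweighted-average) neighbor vote, the squared-error loss, and the names `weighted_knn_predict` and `loo_error` are all assumptions made for illustration.

```python
def weighted_knn_predict(X, y, w, query, k, skip=None):
    """Average the targets of the k nearest neighbours of `query`.

    Distances are squared Euclidean distances after scaling each feature j
    by the weight w[j]. The sample at index `skip` (if given) is excluded,
    which is what makes a leave-one-out evaluation possible.
    """
    dists = []
    for i, x in enumerate(X):
        if i == skip:
            continue
        d = sum((wj * (a - b)) ** 2 for wj, a, b in zip(w, x, query))
        dists.append((d, i))
    dists.sort()  # ties are broken by sample index
    nearest = dists[:k]
    return sum(y[i] for _, i in nearest) / len(nearest)


def loo_error(X, y, w, k):
    """Mean squared leave-one-out error for the feature weights w."""
    total = 0.0
    for i in range(len(X)):
        pred = weighted_knn_predict(X, y, w, X[i], k, skip=i)
        total += (pred - y[i]) ** 2
    return total / len(X)


# Toy data (hypothetical): the target depends only on feature 0;
# feature 1 is an irrelevant distractor.
X = [(0, 5), (1, 0), (2, 5), (3, 0), (4, 5), (5, 0)]
y = [0, 1, 2, 3, 4, 5]

err_relevant = loo_error(X, y, (1, 0), k=1)    # weight only the relevant feature
err_irrelevant = loo_error(X, y, (0, 1), k=1)  # weight only the distractor
```

On this toy data the weighting that keeps the relevant feature yields a lower leave-one-out error, which is the signal a feature selection procedure of this kind would exploit when searching over weights.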
6 0.54012942 21 nips-2005-An Alternative Infinite Mixture Of Gaussian Process Experts
7 0.53709567 45 nips-2005-Conditional Visual Tracking in Kernel Space
8 0.53692746 136 nips-2005-Noise and the two-thirds power Law
9 0.53655696 23 nips-2005-An Application of Markov Random Fields to Range Sensing
10 0.53622502 74 nips-2005-Faster Rates in Regression via Active Learning
11 0.53229392 144 nips-2005-Off-policy Learning with Options and Recognizers
12 0.53122836 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error
13 0.53046244 30 nips-2005-Assessing Approximations for Gaussian Process Classification
14 0.52809441 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression
15 0.52701718 92 nips-2005-Hyperparameter and Kernel Learning for Graph Based Semi-Supervised Classification
16 0.52683884 137 nips-2005-Non-Gaussian Component Analysis: a Semi-parametric Framework for Linear Dimension Reduction
17 0.52657419 171 nips-2005-Searching for Character Models
18 0.52534395 195 nips-2005-Transfer learning for text classification
19 0.52399695 32 nips-2005-Augmented Rescorla-Wagner and Maximum Likelihood Estimation
20 0.5217672 63 nips-2005-Efficient Unsupervised Learning for Localization and Detection in Object Categories