nips nips2003 nips2003-69 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Amit Gruber, Yair Weiss
Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Factorization with uncertainty and missing data: exploiting temporal coherence Amit Gruber and Yair Weiss School of Computer Science and Engineering The Hebrew University of Jerusalem 91904 Jerusalem, Israel {amitg,yweiss}@cs.huji.ac.il [sent-1, score-0.781]
2 Abstract The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. [sent-4, score-0.8]
3 Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. [sent-5, score-0.632]
4 Each element of the measurement matrix contains the position of a point in a particular image. [sent-6, score-0.184]
5 When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. [sent-7, score-0.535]
6 Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. [sent-8, score-0.478]
7 In this paper we use the well known EM algorithm for factor analysis to perform factorization. [sent-9, score-0.125]
8 This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). [sent-10, score-1.153]
9 We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. [sent-11, score-0.147]
10 1 Introduction Figure 1 illustrates the classical structure from motion (SFM) displays introduced by Ullman [13]. [sent-12, score-0.286]
11 A transparent cylinder with painted dots rotates around its elongated axis. [sent-13, score-0.24]
12 Even though no structure is apparent in any single frame, humans obtain a vivid percept of a cylinder (an online animation of this famous stimulus is available online). [sent-14, score-0.27]
13 Typically a small number of feature points are tracked and a measurement matrix is formed, in which each element corresponds to the image coordinates of a tracked point. [sent-16, score-0.386]
14 Figure 1: The classical structure from motion stimulus introduced by Ullman [13]. [sent-20, score-0.333]
15 Humans continue to perceive the correct structure even when each dot appears only for a small number of frames, but most existing factorization algorithms fail in this case. [sent-21, score-0.547]
16 Replotted from [1]. [sent-22, score-0.257]
17 The goal is to recover the camera motion and the 3D location of these points. [sent-23, score-0.694]
18 Under simplified camera models it can be shown that this problem reduces to a problem of matrix factorization. [sent-24, score-0.485]
19 We wish to describe the measurement matrix as a product of two low rank matrices. [sent-25, score-0.252]
20 Thus if all features are reliably tracked in all the frames, the problem can be solved trivially using SVD [11]. [sent-26, score-0.167]
21 In particular, performing an SVD on the measurement matrix of the rotating cylinder stimulus recovers the correct structure even if the measurement matrix is contaminated with significant amounts of noise and if the number of frames is relatively small. [sent-27, score-1.045]
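As a concrete illustration of the fully observed case, here is a minimal numpy sketch of the rank-4 SVD factorization; the function and variable names are illustrative and do not appear in the paper:

```python
import numpy as np

def factorize_full(W, rank=4):
    """Rank-limited factorization of a fully observed measurement matrix.

    W is 2F x P (all u rows stacked over all v rows).  Returns M (2F x rank)
    and S (rank x P) such that M @ S is the best rank-`rank` approximation
    of W in the least-squares sense.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrt_s = np.sqrt(s[:rank])
    M = U[:, :rank] * sqrt_s           # "motion" factor
    S = sqrt_s[:, None] * Vt[:rank]    # "structure" factor
    return M, S
```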
22 But in any realistic situation, the measurement matrix will have missing entries. [sent-28, score-0.504]
23 This is either because certain feature points are occluded in some of the frames and hence their positions are unknown, or due to a failure in the tracking algorithm. [sent-29, score-0.217]
24 This has led to the development of a number of algorithms for factorization with missing data [11, 6, 9, 2]. [sent-30, score-0.644]
25 Factorization with missing data turns out to be much more difficult than the full data case. [sent-31, score-0.316]
26 To illustrate the difficulty, consider the cylinder stimulus in figure 1. [sent-32, score-0.26]
27 Humans still obtain a vivid percept of a cylinder even when each dot has a short “dot life”. [sent-33, score-0.404]
28 That is, each dot appears at a random starting frame, continues to appear for a small number of frames, and then disappears [12]. [sent-34, score-0.113]
29 We applied the algorithms in [11, 6, 9, 2] to a sequence of 20 frames of a rotating cylinder in which the dot life was 10 frames. [sent-35, score-0.643]
30 Surprisingly, none of the algorithms could recover the cylinder structure. [sent-37, score-0.287]
31 They either failed to find any structure or they gave a structure that was drastically different from a cylinder. [sent-38, score-0.218]
32 Presumably, humans are using additional prior knowledge that the algorithms are not. [sent-39, score-0.173]
33 In this paper we point out a source of information in image sequences that is usually neglected by factorization algorithms: temporal coherence. [sent-40, score-0.534]
34 In a video sequence, the camera location at time t + 1 will probably be similar to its location at time t. [sent-41, score-0.56]
35 In other words, if we randomly permute the temporal order of the frames, we will get a very unlikely image sequence. [sent-42, score-0.22]
36 Yet nearly all existing factorization algorithms will be invariant to this random permutation of the frames: they only seek a low rank approximation to a matrix and permuting the rows of the matrix will not change the approximation. [sent-43, score-0.677]
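This permutation invariance is easy to verify numerically; the small sketch below (illustrative code, not from the paper) checks that the best rank-4 approximation of a row-permuted measurement matrix is just the permuted approximation of the original:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(20, 4)) @ rng.normal(size=(4, 100)) \
    + 0.1 * rng.normal(size=(20, 100))          # noisy rank-4 "measurement" matrix
perm = rng.permutation(W.shape[0])              # shuffle the frame (row) order

def rank4(W):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :4] * s[:4]) @ Vt[:4]

# The rank-4 approximation of the permuted matrix is just the permuted
# approximation of the original: frame order carries no information here.
assert np.allclose(rank4(W[perm]), rank4(W)[perm])
```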
37 In order to enable the use of temporal coherence, we formulate factorization in terms of maximum likelihood for a factor analysis model, where the latent variable corresponds to camera position. [sent-44, score-1.068]
38 We use the familiar EM algorithm for factor analysis to perform factorization with missing data and uncertainty. [sent-45, score-0.727]
39 We show how to add a temporal coherence prior to the model and derive the EM updates. [sent-46, score-0.521]
40 We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. [sent-47, score-0.147]
41 2 Model A set of P feature points in F images are tracked along an image sequence. [sent-48, score-0.223]
42 Let (u_fp, v_fp) denote the image coordinates of feature point p in frame f. [sent-49, score-0.25]
43 In the orthographic camera model, points in the 3D world are projected in parallel onto the image plane. [sent-53, score-0.522]
44 In this model, a camera can undergo rotation, translation, or a combination of the two. [sent-58, score-0.412]
45 [U; V]_{2F×P} = [M_1; …; M_F]_{2F×4} · S_{4×P} + η (1), where the rows of S are (X_1 … X_P), (Y_1 … Y_P), (Z_1 … Z_P) and (1 … 1), and each block [M_i]_{2×4} = [m_i^T d_i; n_i^T e_i] describes the camera motion (rotation and translation). [sent-63, score-0.176]
46 m_i and n_i are 3 × 1 vectors that describe the rotation of the camera; d_i and e_i are scalars describing the camera translation, and S describes the locations of the points in 3D. [sent-64, score-0.778]
47 W = M S + η (2). If the elements of the noise matrix η are uncorrelated and of equal variance, then we seek a factorization that minimizes the mean squared error between W and M S. [sent-66, score-0.564]
48 Missing data can be modeled using equation 2 by assuming some elements of the noise matrix η have infinite variance. [sent-68, score-0.253]
49 2.1 Factorization as factor analysis It is well known that the SVD calculation can be formulated as a limiting case of maximum likelihood factor analysis [8]. [sent-71, score-0.25]
50 In standard factor analysis we have a set of observations {y(t)} that are linear combinations of a latent variable x(t). (We do not subtract the mean of each row, since in the case of missing data the centroids of the points do not coincide.) [sent-72, score-0.455]
51 y(t) = A x(t) + η(t) (3), with x(t) ∼ N(0, σ_x² I) and η(t) ∼ N(0, Ψ_t). [sent-73, score-0.1]
52 If Ψ_t is a diagonal matrix with constant elements, Ψ_t = σ² I, then in the limit σ/σ_x → 0 the ML estimate for A will give the same answer as the SVD. [sent-74, score-0.117]
53 In equation 1 the horizontal and vertical coordinates of the same point appear in different rows. [sent-76, score-0.107]
54 It can be rewritten as [U V]_{F×2P} = [M N]_{F×8} [S 0; 0 S]_{8×2P} + [η]_{F×2P} (4). Let y(t) be the vector of noisy observations (noisy image locations) at time t. [sent-77, score-0.154]
55 Let x(t) be a vector of length 8 that denotes the camera position at time t, x(t) = [m(t)^T d(t) n(t)^T e(t)]^T, and let A = [S^T 0; 0 S^T]. [sent-80, score-0.412]
56 Identifying y(t) with the t-th row of the matrix [U V] and x(t) with the t-th row of [M N], equation 4 is equivalent to equation 3. [sent-81, score-0.233]
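To make this identification concrete, a small sketch (notation and function names are illustrative) of how y(t) and the loading matrix A are assembled from [U V] and S:

```python
import numpy as np

def build_observation(U, V, t):
    """y(t): the 2P-vector of u and v coordinates observed in frame t."""
    return np.concatenate([U[t], V[t]])

def build_loading(S):
    """A = [S^T 0; 0 S^T]: maps the 8-vector x(t) = [m(t), d(t), n(t), e(t)] to y(t)."""
    P = S.shape[1]
    A = np.zeros((2 * P, 8))
    A[:P, :4] = S.T        # u coordinates depend on (m(t), d(t))
    A[P:, 4:] = S.T        # v coordinates depend on (n(t), e(t))
    return A
```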
57 We can now use the standard EM algorithm for factor analysis to find the ML estimate for S. [sent-82, score-0.125]
58 E step: E(x(t)|y(t)) = (σ_x^{-2} I + A^T Ψ_t^{-1} A)^{-1} A^T Ψ_t^{-1} y(t) (5); V(x(t)|y(t)) = (σ_x^{-2} I + A^T Ψ_t^{-1} A)^{-1} (6); ⟨x(t)⟩ = E(x(t)|y(t)) (7); ⟨x(t)x(t)^T⟩ = V(x(t)|y(t)) + ⟨x(t)⟩⟨x(t)⟩^T (8). M step: In the M step we solve the normal equations for the structure S. [sent-83, score-0.107]
59 If we set Ψ_t^{-1}(p, p) = Ψ_t^{-1}(p + P, p + P) = 0 whenever point p is missing in frame t, then we obtain an EM algorithm for factorization with missing data. [sent-86, score-0.943]
60 Note that the form of the updates means we can put any value we wish in the missing elements of y and they will be ignored by the algorithm. [sent-87, score-0.364]
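A minimal sketch of one EM sweep under these assumptions. The code, names and defaults are illustrative (it assumes a point is either fully observed or fully missing in a frame, isotropic noise with standard deviation sigma on observed coordinates, and a large sigma_x approximating the SVD limit); the per-point normal equations below are one way to realize the M step described in the text, not a copy of the paper's equation 9:

```python
import numpy as np

def em_sweep(Y, observed, S, sigma=1.0, sigma_x=1e3):
    """One EM iteration for factor-analysis SFM with missing data.

    Y        : F x 2P array; frame t holds [u_t1..u_tP, v_t1..v_tP]
               (missing entries may hold any value, including NaN).
    observed : F x P boolean mask, True where point p is seen in frame t.
    S        : current 4 x P structure estimate (last row fixed to ones).
    """
    F, twoP = Y.shape
    P = twoP // 2
    A = np.zeros((2 * P, 8))
    A[:P, :4] = S.T                       # u rows depend on (m(t), d(t))
    A[P:, 4:] = S.T                       # v rows depend on (n(t), e(t))

    Ex = np.zeros((F, 8))                 # <x(t)>
    Exx = np.zeros((F, 8, 8))             # <x(t) x(t)^T>
    for t in range(F):                    # E step, equations 5-8
        w = np.concatenate([observed[t], observed[t]]) / sigma ** 2   # diag of Psi_t^{-1}
        V = np.linalg.inv(np.eye(8) / sigma_x ** 2 + (A.T * w) @ A)
        Ex[t] = V @ ((A.T * w) @ np.nan_to_num(Y[t]))
        Exx[t] = V + np.outer(Ex[t], Ex[t])

    for p in range(P):                    # M step: per-point normal equations
        lhs, rhs = np.zeros((3, 3)), np.zeros(3)
        for t in np.flatnonzero(observed[:, p]):
            lhs += Exx[t, :3, :3] + Exx[t, 4:7, 4:7]
            rhs += Y[t, p] * Ex[t, :3] - Exx[t, :3, 3]
            rhs += Y[t, P + p] * Ex[t, 4:7] - Exx[t, 4:7, 7]
        S[:3, p] = np.linalg.solve(lhs, rhs)
    return S, Ex
```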
61 Figure 2a: The graphical model assumed by most factorization algorithms for SFM. [sent-89, score-0.356]
62 The camera location x(t) is assumed to be independent of the camera location at any other time step. [sent-90, score-0.972]
63 We model temporal coherence by assuming a Markovian structure on the camera location. [sent-93, score-0.931]
64 A more realistic noise model for real images is that Ψ_t is not diagonal, but rather that the noise in the horizontal and vertical coordinates of the same point is correlated with an arbitrary 2 × 2 inverse covariance matrix. [sent-94, score-0.318]
65 This problem is usually called factorization with uncertainty [5, 7]. [sent-95, score-0.357]
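A sketch of how such per-point directional uncertainty can be folded into the same E step: each tracked point contributes a 2×2 inverse covariance block to Ψ_t^{-1}. The function name and input layout are illustrative assumptions, not the paper's API:

```python
import numpy as np

def build_psi_inv(point_inv_covs, observed_t):
    """Assemble Psi_t^{-1} (2P x 2P) from per-point 2x2 inverse covariances.

    point_inv_covs : P x 2 x 2 array of directional uncertainties, e.g. as
                     reported by the tracker for frame t.
    observed_t     : length-P boolean mask; missing points keep zero precision.
    With y(t) = [u_1..u_P, v_1..v_P], point p occupies indices p and p + P.
    """
    P = len(observed_t)
    Psi_inv = np.zeros((2 * P, 2 * P))
    for p in np.flatnonzero(observed_t):
        C = point_inv_covs[p]
        Psi_inv[p, p] = C[0, 0]
        Psi_inv[p, p + P] = Psi_inv[p + P, p] = C[0, 1]
        Psi_inv[p + P, p + P] = C[1, 1]
    return Psi_inv
```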
66 2.2 Adding temporal coherence The factor analysis algorithm for factorization assumes that the latent variables x(t) are independent. [sent-100, score-0.954]
67 In SFM this assumption means that the camera location in different frames is independent and hence permuting the order of the frames makes no difference for the factorization. [sent-101, score-0.897]
68 Typically camera location varies smoothly as a function of time. [sent-103, score-0.486]
69 Figure 2a shows the graphical model corresponding to most factorization algorithms: the independence of the camera location is represented by the fact that every time step is isolated from the other time steps in the graph. [sent-104, score-0.838]
70 These algorithms all fail when there is noise and missing data, while factor analysis with temporal coherence succeeds. [sent-108, score-1.005]
71 Rather, we assume the 3D trajectory of the camera is smooth. [sent-111, score-0.45]
72 The M step is unchanged from the classical factor analysis and is given by equation 9. [sent-113, score-0.235]
73 Note that the computation of the E step is still linear in the number of frames and datapoints. [sent-116, score-0.213]
74 Within the factorization framework, we can use the classical Kalman filter and obtain a simple algorithm that provably increases the likelihood at every iteration. [sent-122, score-0.355]
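The extracted text does not spell out the Kalman recursions, so the following is only a sketch of one standard way to implement the temporally coherent E step: a random-walk prior x(t+1) = x(t) + w(t) with a Kalman filter followed by an RTS smoother. The values of q and sigma0 and all names are illustrative choices, not the authors':

```python
import numpy as np

def smooth_camera_path(Y, observed, S, sigma=1.0, q=0.01, sigma0=1e3):
    """E step with a temporal-coherence prior x(t+1) = x(t) + w(t), w ~ N(0, q*I).

    Y, observed and S have the same layout as in the EM sketch above.
    Runs a Kalman filter (information-form update) followed by an RTS
    smoother over the 8-dimensional camera vectors x(t); missing points
    simply contribute zero precision.  Returns the smoothed means and
    covariances; the M step is the same as without temporal coherence.
    """
    F, twoP = Y.shape
    P = twoP // 2
    A = np.zeros((2 * P, 8))
    A[:P, :4] = S.T
    A[P:, 4:] = S.T
    Q = q * np.eye(8)

    mu_f, V_f = [], []                       # filtered estimates
    mu, V = np.zeros(8), sigma0 ** 2 * np.eye(8)
    for t in range(F):
        w = np.concatenate([observed[t], observed[t]]) / sigma ** 2
        Vt = np.linalg.inv(np.linalg.inv(V) + (A.T * w) @ A)
        mu = Vt @ (np.linalg.inv(V) @ mu + (A.T * w) @ np.nan_to_num(Y[t]))
        mu_f.append(mu.copy()); V_f.append(Vt.copy())
        V = Vt + Q                           # predict: random-walk dynamics

    mu_s, V_s = [None] * F, [None] * F       # RTS backward pass
    mu_s[-1], V_s[-1] = mu_f[-1], V_f[-1]
    for t in range(F - 2, -1, -1):
        J = V_f[t] @ np.linalg.inv(V_f[t] + Q)
        mu_s[t] = mu_f[t] + J @ (mu_s[t + 1] - mu_f[t])
        V_s[t] = V_f[t] + J @ (V_s[t + 1] - V_f[t] - Q) @ J.T
    return np.array(mu_s), np.array(V_s)
```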
75 3 Experiments In this section we describe the experimental performance of EM with time coherence compared to ground truth and to previous algorithms for structure from motion with missing data [11, 6, 9, 2]. [sent-123, score-0.908]
76 The first input sequence is the sequence of the cylinder shown in figure 1. [sent-126, score-0.271]
77 100 points uniformly drawn from the cylinder surface are tracked over 20 frames. [sent-127, score-0.405]
78 Gaussian noise with standard deviation σ = 0.5 was added to the observed image locations. [sent-129, score-0.203]
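For reference, a sketch of how such a synthetic rotating-cylinder sequence with a finite dot life can be generated; only the point count, frame count, dot life and noise level are taken from the text, while the cylinder radius, rotation rate and random seed are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)
P, F, dot_life, sigma = 100, 20, 10, 0.5            # values quoted in the text
radius = 10.0                                        # scene scale: arbitrary choice

theta = rng.uniform(0, 2 * np.pi, P)
height = rng.uniform(-radius, radius, P)
X3d = np.stack([radius * np.cos(theta), height, radius * np.sin(theta)])   # 3 x P

U, V = np.zeros((F, P)), np.zeros((F, P))
observed = np.zeros((F, P), dtype=bool)
start = rng.integers(0, F - dot_life + 1, size=P)    # random starting frame per dot
for t in range(F):
    a = 2 * np.pi * t / F                            # one revolution over the sequence
    R = np.array([[ np.cos(a), 0.0, np.sin(a)],
                  [ 0.0,       1.0, 0.0      ],
                  [-np.sin(a), 0.0, np.cos(a)]])     # rotation about the elongated axis
    proj = R @ X3d                                   # orthographic projection: keep rows 0 and 1
    U[t], V[t] = proj[0], proj[1]
    observed[t] = (start <= t) & (t < start + dot_life)

U += sigma * rng.normal(size=U.shape)                # measurement noise
V += sigma * rng.normal(size=V.shape)
Y = np.hstack([U, V])                                # F x 2P measurement matrix [u | v]
```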
79 Figure 4: The graphs depict the influence of the noise level (sigma) and of the percentage of missing data on the reconstruction results of factor analysis and [6]. [sent-146, score-0.971]
80 Figure 5: Results of scene reconstruction from a real sequence: a binder is placed on a rotating surface and filmed with a static camera. [sent-147, score-0.165]
81 Our algorithm succeeded in (approximately) obtaining the right structure, while all the other algorithms failed. [sent-148, score-0.111]
82 Four conditions were compared: (1) a fully observed noiseless sequence, (2) fully observed noisy observations, (3) noiseless observations with missing data, and (4) noisy observations with missing data. [sent-150, score-0.662]
83 All algorithms performed well and gave similar results for the full matrix noiseless sequence. [sent-151, score-0.259]
84 In the fully observed noisy case, factor analysis without temporal coherence gave performance comparable to Tomasi-Kanade, which minimizes ||M S − W||_F². [sent-152, score-0.669]
85 When temporal coherence was added, the reconstruction results improved. [sent-153, score-0.51]
86 The algorithms of Jacobs and Brand turned out to be noise sensitive. [sent-155, score-0.147]
87 In the case of noiseless missing data (figure 3 top), our algorithm and Jacobs’ algorithm reconstruct the correct motion and structure. [sent-156, score-0.537]
88 Tomasi-Kanade’s algorithm and Shum’s algorithm could not handle this pattern of missing data and failed to give any structure. [sent-157, score-0.325]
89 Once we add even very mild amounts of noise (figure 3 middle) all existing algorithms fail. [sent-158, score-0.224]
90 Factor analysis with temporal coherence, in contrast, continues to extract the correct structure even for significant noise values. [sent-159, score-0.785]
91 4 Discussion Despite progress in algorithms for factorization with uncertainty, the best existing algorithms still fall far short of human performance, even for seemingly simple stimuli. [sent-161, score-0.491]
92 In this paper we have focused on one particular prior: the temporal smoothness of the camera motion. [sent-163, score-0.564]
93 We showed how to formulate SFM as a factor analysis problem and how to add temporal coherence to the EM algorithm. [sent-164, score-0.602]
94 Temporal coherence is just one of many possible priors. [sent-166, score-0.298]
95 It has been suggested that humans also use a smoothness prior on the 3D surface they are perceiving [12]. [sent-167, score-0.195]
96 Incremental singular value decomposition of uncertain data with missing values. [sent-181, score-0.329]
97 Linear fitting with missing data: Applications to structure-from-motion and to characterizing intensity images. [sent-202, score-0.288]
98 A unified factorization algorithm for points, line segments and planes with uncertain models. [sent-208, score-0.355]
99 Principal component analysis with missing data and its application to polyhedral object modeling. [sent-219, score-0.33]
100 Shape and motion from image streams under orthography: A factorization method. [sent-230, score-0.558]
wordName wordTfidf (topN-words)
[('camera', 0.412), ('factorization', 0.314), ('coherence', 0.298), ('missing', 0.288), ('cylinder', 0.213), ('motion', 0.176), ('frames', 0.175), ('em', 0.161), ('sfm', 0.16), ('temporal', 0.152), ('jacobs', 0.136), ('svd', 0.128), ('tracked', 0.113), ('measurement', 0.111), ('cp', 0.107), ('di', 0.105), ('noise', 0.105), ('humans', 0.087), ('erent', 0.085), ('factor', 0.083), ('dot', 0.077), ('coordinates', 0.076), ('kalman', 0.074), ('location', 0.074), ('noiseless', 0.073), ('shum', 0.073), ('matrix', 0.073), ('structure', 0.069), ('rotating', 0.068), ('image', 0.068), ('latent', 0.065), ('bp', 0.063), ('permuting', 0.061), ('utp', 0.061), ('vivid', 0.061), ('vtp', 0.061), ('reconstruction', 0.06), ('trivially', 0.054), ('sp', 0.054), ('orthography', 0.053), ('percept', 0.053), ('uf', 0.053), ('ullman', 0.053), ('vf', 0.053), ('frame', 0.053), ('january', 0.052), ('noisy', 0.051), ('existing', 0.05), ('tth', 0.049), ('stimulus', 0.047), ('ml', 0.045), ('prior', 0.044), ('rotation', 0.044), ('elements', 0.044), ('uncertainty', 0.043), ('eccv', 0.043), ('iccv', 0.043), ('gave', 0.043), ('analysis', 0.042), ('points', 0.042), ('algorithms', 0.042), ('classical', 0.041), ('uncertain', 0.041), ('ax', 0.039), ('life', 0.039), ('step', 0.038), ('trajectory', 0.038), ('surface', 0.037), ('fail', 0.037), ('pages', 0.037), ('failed', 0.037), ('translation', 0.037), ('rank', 0.036), ('continues', 0.036), ('perception', 0.035), ('observations', 0.035), ('gure', 0.035), ('ei', 0.035), ('truth', 0.035), ('challenging', 0.035), ('presumably', 0.033), ('jerusalem', 0.033), ('vision', 0.033), ('extensively', 0.032), ('wish', 0.032), ('recover', 0.032), ('realistic', 0.032), ('equation', 0.031), ('mi', 0.031), ('locations', 0.03), ('cant', 0.03), ('sequence', 0.029), ('axes', 0.028), ('seek', 0.028), ('full', 0.028), ('add', 0.027), ('transparent', 0.027), ('andersen', 0.027), ('perceiving', 0.027), ('submatrices', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence
Author: Amit Gruber, Yair Weiss
Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1
2 0.19427334 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
3 0.16107982 37 nips-2003-Automatic Annotation of Everyday Movements
Author: Deva Ramanan, David A. Forsyth
Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.
4 0.14142174 119 nips-2003-Local Phase Coherence and the Perception of Blur
Author: Zhou Wang, Eero P. Simoncelli
Abstract: unkown-abstract
5 0.1032171 7 nips-2003-A Functional Architecture for Motion Pattern Processing in MSTd
Author: Scott A. Beardsley, Lucia M. Vaina
Abstract: Psychophysical studies suggest the existence of specialized detectors for component motion patterns (radial, circular, and spiral), that are consistent with the visual motion properties of cells in the dorsal medial superior temporal area (MSTd) of non-human primates. Here we use a biologically constrained model of visual motion processing in MSTd, in conjunction with psychophysical performance on two motion pattern tasks, to elucidate the computational mechanisms associated with the processing of widefield motion patterns encountered during self-motion. In both tasks discrimination thresholds varied significantly with the type of motion pattern presented, suggesting perceptual correlates to the preferred motion bias reported in MSTd. Through the model we demonstrate that while independently responding motion pattern units are capable of encoding information relevant to the visual motion tasks, equivalent psychophysical performance can only be achieved using interconnected neural populations that systematically inhibit non-responsive units. These results suggest the cyclic trends in psychophysical performance may be mediated, in part, by recurrent connections within motion pattern responsive areas whose structure is a function of the similarity in preferred motion patterns and receptive field locations between units. 1 In trod u ction A major challenge in computational neuroscience is to elucidate the architecture of the cortical circuits for sensory processing and their effective role in mediating behavior. In the visual motion system, biologically constrained models are playing an increasingly important role in this endeavor by providing an explanatory substrate linking perceptual performance and the visual properties of single cells. Single cell studies indicate the presence of complex interconnected structures in middle temporal and primary visual cortex whose most basic horizontal connections can impart considerable computational power to the underlying neural population [1, 2]. Combined psychophysical and computational studies support these findings Figure 1: a) Schematic of the graded motion pattern (GMP) task. Discrimination pairs of stimuli were created by perturbing the flow angle (φ) of each 'test' motion (with average dot speed, vav), by ±φp in the stimulus space spanned by radial and circular motions. b) Schematic of the shifted center-of-motion (COM) task. Discrimination pairs of stimuli were created by shifting the COM of the ‘test’ motion to the left and right of a central fixation point. For each motion pattern the COM was shifted within the illusory inner aperture and was never explicitly visible. and suggest that recurrent connections may play a significant role in encoding the visual motion properties associated with various psychophysical tasks [3, 4]. Using this methodology our goal is to elucidate the computational mechanisms associated with the processing of wide-field motion patterns encountered during self-motion. In the human visual motion system, psychophysical studies suggest the existence of specialized detectors for the motion pattern components (i.e., radial, circular and spiral motions) associated with self-motion [5, 6]. 
Neurophysiological studies reporting neurons sensitive to motion patterns in the dorsal medial superior temporal area (MSTd) support the existence of such mechanisms [7-10], and in conjunction with psychophysical studies suggest a strong link between the patterns of neural activity and motion-based perceptual performance [11, 12]. Through the combination of human psychophysical performance and biologically constrained modeling we investigate the computational role of simple recurrent connections within a population of MSTd-like units. Based on the known visual motion properties within MSTd we ask what neural structures are computationally sufficient to encode psychophysical performance on a series of motion pattern tasks. 2 M o t i o n pa t t e r n d i sc r i m i n a t i o n Using motion pattern stimuli consistent with previous studies [5, 6], we have developed a set of novel psychophysical tasks designed to facilitate a more direct comparison between human perceptual performance and the visual motion properties of cells in MSTd that have been found to underlie the discrimination of motion patterns [11, 12]. The psychophysical tasks, referred to as the graded motion pattern (GMP) and shifted center-of-motion (COM) tasks, are outlined in Fig. 1. Using a temporal two-alternative-forced-choice task we measured discrimination thresholds to global changes in the patterns of complex motion (GMP task), [13], and shifts in the center-of-motion (COM task). Stimuli were presented with central fixation using a constant stimulus paradigm and consisted of dynamic random dot displays presented in a 24o annular region (central 4o removed). In each task, the stimulus duration was randomly perturbed across presentations (440±40 msec) to control for timing-based cues, and dots moved coherently through a radial speed Figure 2: a) GMP thresholds across 8 'test' motions at two mean dot speeds for two observers. Performance varied continuously with thresholds for radial motions (φ=0, 180o) significantly lower than those for circular motions (φ=90,270o), (p<0.001; t(37)=3.39). b) COM thresholds at three mean dot speeds for two observers. As with the GMP task, performance varied continuously with thresholds for radial motions significantly lower than those for circular motions, (p<0.001; t(37)=4.47). gradient in directions consistent with the global motion pattern presented. Discrimination thresholds were obtained across eight ‘test’ motions corresponding to expansion, contraction, CW and CCW rotation, and the four intermediate spiral motions. To minimize adaptation to specific motion patterns, opposing motions (e.g., expansion/ contraction) were interleaved across paired presentations. 2.1 Results Discrimination thresholds are reported here from a subset of the observer population consisting of three experienced psychophysical observers, one of which was naïve to the purpose of the psychophysical tasks. For each condition, performance is reported as the mean and standard error averaged across 8-12 thresholds. Across observers and dot speeds GMP thresholds followed a distinct trend in the stimulus space [13], with radial motions (expansion/contraction) significantly lower than circular motions (CW/CCW rotation), (p<0.001; t(37)=3.39), (Fig. 2a). 
While thresholds for the intermediate spiral motions were not significantly different from circular motions (p=0.223, t(60)=0.74), the trends across 'test' motions were well fit within the stimulus space (SB: r>0.82, SC: r>0.77) by sinusoids whose period and phase were 196 ± 10o and -72 ± 20o respectively (Fig. 1a). When the radial speed gradient was removed by randomizing the spatial distribution of dot speeds, threshold performance increased significantly across observers (p<0.05; t(17)=1.91), particularly for circular motions (p<0.005; t(25)=3.31), (data not shown). Such performance suggests a perceptual contribution associated with the presence of the speed gradient and is particularly interesting given the fact that the speed gradient did not contribute computationally relevant information to the task. However, the speed gradient did convey information regarding the integrative structure of the global motion field and as such suggests a preference of the underlying motion mechanisms for spatially structured speed information. Similar trends in performance were observed in the COM task across observers and dot speeds. Discrimination thresholds varied continuously as a function of the 'test' motion with thresholds for radial motions significantly lower than those for circular motions, (p<0.001; t(37)=4.47) and could be well fit by a sinusoidal trend line (e.g. SB at 3 deg/s: r>0.91, period = 178 ± 10 o and phase = -70 ± 25o), (Fig. 2b). 2.2 A local or global task? The consistency of the cyclic threshold profile in stimuli that restricted the temporal integration of individual dot motions [13], and simultaneously contained all directions of motion, generally argues against a primary role for local motion mechanisms in the psychophysical tasks. While the psychophysical literature has reported a wide variety of “local” motion direction anisotropies whose properties are reminiscent of the results observed here, e.g. [14], all would predict equivalent thresholds for radial and circular motions for a set of uniformly distributed and/or spatially restricted motion direction mechanisms. Together with the computational impact of the speed gradient and psychophysical studies supporting the existence of wide-field motion pattern mechanisms [5, 6], these results suggest that the threshold differences across the GMP and COM tasks may be associated with variations in the computational properties across a series of specialized motion pattern mechanisms. 3 A computational model The similarities between the motion pattern stimuli used to quantify human perception and the visual motion properties of cells in MSTd suggests that MSTd may play a computational role in the psychophysical tasks. To examine this hypothesis, we constructed a population of MSTd-like units whose visual motion properties were consistent with the reported neurophysiology (see [13] for details). Across the population, the distribution of receptive field centers was uniform across polar angle and followed a gamma distribution Γ(5,6) across eccenticity [7]. For each unit, visual motion responses followed a gaussian tuning profile as a function of the stimulus flow angle G( φ), (σi=60±30o; [10]), and the distance of the stimulus COM from the unit’s receptive field center Gsat(xi, yi, σs=19o), Eq. 1, such that its preferred motion response was position invariant to small shifts in the COM [10] and degraded continuously for large shifts [9]. 
Within the model, simulations were categorized according to the distribution of preferred motions represented across the population (one reported in MSTd and a uniform control). The first distribution simulated an expansion bias in which the density of preferred motions decreased symmetrically from expansions to contraction [10]. The second distribution simulated a uniform preference for all motions and was used as a control to quantify the effects of an expansion bias on psychophysical performance. Throughout the paper we refer to simulations containing these distributions as ‘Expansion-biased’ and ‘Uniform’ respectively. 3.1 Extracting perceptual estimates from the neural code For each stimulus presentation, the ith unit’s response was calculated as the average firing rate, Ri, from the product of its motion pattern and spatial tuning profiles, ( ) Ri = Rmax G min[φ − φi ] ,σ ti G sati (x− xi , y − y i ,σ s ) + P (λ = 12 ) (1) where Rmax is the maximum preferred stimulus response (spikes/s), min[ ] refers to the minimum angular distance between the stimulus flow angle φ and the unit’s preferred motion φi, Gsat is the unit’s spatial tuning profile saturated within the central 5±3o, σti and σs are the standard deviations of the unit’s motion pattern and Figure 3: Model vs. psychophysical performance for independently responding units. Model thresholds are reported as the average (±1 S.E.) across five simulated populations. a) GMP thresholds were highest for contracting motions and lowest for expanding motions across all Expansion-biased populations. b) Comparable trends in performance were observed for COM thresholds. Comparison with the Uniform control simulations in both tasks (2000 units shown here) indicates that thresholds closely followed the distribution of preferred motions simulated within the model. spatial tuning profiles respectively, (xi,yi) is the spatial location of the unit’s receptive field center, (x,y) is the spatial location of the stimulus COM, and P(λ=12) is the background activity simulated as an uncorrelated Poisson process. The psychophysical tasks were simulated using a modified center-of-gravity ^ approach to decode estimates of the stimulus properties, i.e. flow angle (φ ) and ˆ ˆ COM location in the visual field (x, y ) , from the neural population ∑ xi Ri ∑ y i Ri v , i , ∑ φ i Ri ∑ Ri i i i i (xˆ, yˆ , φˆ) = i∑ R (2) v where φi is the unit vector in the stimulus space (Fig. 1a) corresponding to the unit’s preferred motion. For each set of paired stimuli, psychophysical judgments were made by comparing the estimated stimulus properties according to the discrimination criteria, specified in the psychophysical tasks. As with the psychophysical experiments, discrimination thresholds were computed using a leastsquares fit to percent correct performance across constant stimulus levels. 3.2 Simulation 1: Independent neural responses In the first series of simulations, GMP and COM thresholds were quantified across three populations (500, 1000, and 2000 units) of independently responding units for each simulated distribution (Expansion-biased and Uniform). Across simulations, both the range in thresholds and their trends across ‘test’ motions were compared with human psychophysical performance to quantify the effects of population size and an expansion biased preferred motion distribution on model performance. Over the psychophysical range of interest (φp ± 7o), GMP thresholds for contracting motions were at chance across all Expansion-biased populations, (Fig. 3a). 
While thresholds for expanding motions were generally consistent with those for human observers, those for circular motions remained significantly higher for all but the largest populations. Similar trends in performance were observed for the COM task, (Fig. 3b). Here the range of COM thresholds was well matched with human performance for simulations containing 1000 units, however, the trends across motion patterns remained inconsistent even for the largest populations. Figure 4: Proposed recurrent connection profile between motion pattern units. a) Across the motion pattern space connection strength followed an inverse gaussian profile such that the ith unit (with preferred motion φi) systematically inhibited units with anti-preferred motions centered at 180+φi. b) Across the visual field connection strength followed a difference-of-gaussians profile as a function of the relative distance between receptive field centers such that spatially local units are mutually excitatory (σRe=10o) and more distant units were mutually inhibitory (σRi=80o). For simulations containing a uniform distribution of preferred motions, the threshold range was consistent with human performance on both tasks, however, the trend across motion patterns was generally flat. What variability did occur was due primarily to the discrete sampling of preferred motions across the population. Comparison of the discrimination thresholds for the Expansion-biased and Uniform populations indicates that the trend across thresholds was closely matched to the underlying distributions of preferred motions. This result in due in part to the nearequal weighting of independently responding units and can be explained to a first approximation by the proportional increase in the signal-to-noise ratio across the population as a function of the density of units responsive to a given 'test' motion. 3.3 Simulation 2: An interconnected neural structure In a second series of simulations, we examined the computational effect of adding recurrent connections between units. If the distribution of preferred motions in MSTd is in fact biased towards expansions, as the neurophysiology suggests, it seems unlikely that independent estimates of the visual motion information would be sufficient to yield the threshold profiles observed in the psychophysical tasks. We hypothesize that a simple fixed architecture of excitatory and/or inhibitory connections is sufficient to account for the cyclic trends in discrimination thresholds. Specifically, we propose that a recurrent connection profile whose strength varies as a function of (a) the similarity between preferred motion patterns and (b) the distance between receptive field centers, is computationally sufficient to recover the trends in GMP/COM performance (Fig. 4), wij = S R e − ( xi − x j )2 + ( yi − y j )2 2 2σ R e − SR e 2 − −(min[ φi − φ j ])2 ( xi − x j )2 + ( yi − y j )2 2 2 σ Ri − Sφ e 2σ I2 (3) Figure 5: Model vs. psychophysical performance for populations containing recurrent connections (σI=80o). As the number of units increased for Expansionbiased populations, discrimination thresholds decreased to psychophysical levels and the sinusoidal trend in thresholds emerged for both the (a) GMP and (b) COM tasks. Sinusoidal trends were established for as few as 1000 units and were well fit (r>0.9) by sinusoids whose periods and phases were (193.8 ± 11.7o, -70.0 ± 22.6o) and (168.2 ± 13.7o, -118.8 ± 31.8o) for the GMP and COM tasks respectively. 
where wij is the strength of the recurrent connection between ith and jth units, (xi,yi) and (xj,yj) denote the spatial locations of their receptive field centers, σRe (=10o) and σRi (=80o) together define the spatial extent of a difference-of-gaussians interaction between receptive field centers, and SR and Sφ scale the connection strength. To examine the effects of the spread of motion pattern-specific inhibition and connection strength in the model, σI, Sφ, and SR were considered free parameters. Within the parameter space used to define recurrent connections (i.e., σI, Sφ and SR), Monte Carlo simulations of Expansion-biased model performance (1000 units) yielded regions of high correlation on both tasks (with respect to the psychophysical thresholds, r>0.7) that were consistent across independently simulated populations. Typically these regions were well defined over a broad range such that there was significant overlap between tasks (e.g., for the GMP task (SR=0.03), σI=[45,120o], Sφ=[0.03,0.3] and for the COM task (σI=80o), Sφ = [0.03,0.08], SR = [0.005, 0.04]). Fig. 5 shows averaged threshold performance for simulations of interconnected units drawn from the highly correlated regions of the (σI, Sφ, SR) parameter space. For populations not explicitly examined in the Monte Carlo simulations connection strengths (Sφ, SR) were scaled inversely with population size to maintain an equivalent level of recurrent activity. With the incorporation of recurrent connections, the sinusoidal trend in GMP and COM thresholds emerged for Expansion-biased populations as the number of units increased. In both tasks the cyclic threshold profiles were established for 1000 units and were well fit (r>0.9) by sinusoids whose periods and phases were consistent with human performance. Unlike the Expansion-biased populations, Uniform populations were not significantly affected by the presence of recurrent connections (Fig. 5). Both the range in thresholds and the flat trend across motion patterns were well matched to those in Section 3.2. Together these results suggest that the sinusoidal trends in GMP and COM performance may be mediated by the combined contribution of the recurrent interconnections and the bias in preferred motions across the population. 4 D i s c u s s i on Using a biologically constrained computational model in conjunction with human psychophysical performance on two motion pattern tasks we have shown that the visual motion information encoded across an interconnected population of cells responsive to motion patterns, such as those in MSTd, is computationally sufficient to extract perceptual estimates consistent with human performance. Specifically, we have shown that the cyclic trend in psychophysical performance observed across tasks, (a) cannot be reproduced using populations of independently responding units and (b) is dependent, in part, on the presence of an expanding motion bias in the distribution of preferred motions across the neural population. The model’s performance suggests the presence of specific recurrent structures within motion pattern responsive areas, such as MSTd, whose strength varies as a function of the similarity between preferred motion patterns and the distance between receptive field centers. While such structures have not been explicitly examined in MSTd and other higher visual motion areas there is anecdotal support for the presence of inhibitory connections [8]. 
Together, these results suggest that robust processing of the motion patterns associated with self-motion and optic flow may be mediated, in part, by recurrent structures in extrastriate visual motion areas whose distributions of preferred motions are biased strongly in favor of expanding motions. Acknowledgments This work was supported by National Institutes of Health grant EY-2R01-07861-13 to L.M.V. References [1] Malach, R., Schirman, T., Harel, M., Tootell, R., & Malonek, D., (1997), Cerebral Cortex, 7(4): 386-393. [2] Gilbert, C. D., (1992), Neuron, 9: 1-13. [3] Koechlin, E., Anton, J., & Burnod, Y., (1999), Biological Cybernetics, 80: 2544. [4] Stemmler, M., Usher, M., & Niebur, E., (1995), Science, 269: 1877-1880. [5] Burr, D. C., Morrone, M. C., & Vaina, L. M., (1998), Vision Research, 38(12): 1731-1743. [6] Meese, T. S. & Harris, S. J., (2002), Vision Research, 42: 1073-1080. [7] Tanaka, K. & Saito, H. A., (1989), Journal of Neurophysiology, 62(3): 626-641. [8] Duffy, C. J. & Wurtz, R. H., (1991), Journal of Neurophysiology, 65(6): 13461359. [9] Duffy, C. J. & Wurtz, R. H., (1995), Journal of Neuroscience, 15(7): 5192-5208. [10] Graziano, M. S., Anderson, R. A., & Snowden, R., (1994), Journal of Neuroscience, 14(1): 54-67. [11] Celebrini, S. & Newsome, W., (1994), Journal of Neuroscience, 14(7): 41094124. [12] Celebrini, S. & Newsome, W. T., (1995), Journal of Neurophysiology, 73(2): 437-448. [13] Beardsley, S. A. & Vaina, L. M., (2001), Journal of Computational Neuroscience, 10: 255-280. [14] Matthews, N. & Qian, N., (1999), Vision Research, 39: 2205-2211.
6 0.089639157 115 nips-2003-Linear Dependent Dimensionality Reduction
7 0.085660152 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples
8 0.081075974 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
9 0.080054082 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data
10 0.078897104 43 nips-2003-Bounded Invariance and the Formation of Place Fields
11 0.077466659 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
12 0.076419346 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution
13 0.075832576 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
14 0.074799187 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays
15 0.065843448 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
16 0.0656102 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?
17 0.06441202 32 nips-2003-Approximate Expectation Maximization
18 0.063606143 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
19 0.060048699 128 nips-2003-Minimax Embeddings
20 0.057655506 157 nips-2003-Plasticity Kernels and Temporal Statistics
topicId topicWeight
[(0, -0.201), (1, -0.022), (2, 0.068), (3, 0.02), (4, -0.169), (5, -0.063), (6, 0.18), (7, 0.029), (8, 0.078), (9, -0.164), (10, -0.058), (11, 0.117), (12, -0.06), (13, 0.129), (14, -0.076), (15, -0.007), (16, 0.119), (17, -0.09), (18, -0.007), (19, -0.188), (20, 0.038), (21, -0.032), (22, -0.009), (23, 0.068), (24, 0.031), (25, 0.016), (26, -0.031), (27, 0.043), (28, -0.019), (29, -0.12), (30, 0.082), (31, 0.037), (32, -0.068), (33, 0.055), (34, -0.042), (35, -0.004), (36, -0.007), (37, 0.007), (38, -0.059), (39, -0.007), (40, 0.123), (41, 0.086), (42, -0.068), (43, -0.074), (44, 0.141), (45, 0.025), (46, -0.033), (47, 0.082), (48, 0.055), (49, -0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.95973241 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence
Author: Amit Gruber, Yair Weiss
Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1
2 0.62962168 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
3 0.62029874 37 nips-2003-Automatic Annotation of Everyday Movements
Author: Deva Ramanan, David A. Forsyth
Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.
4 0.61492014 7 nips-2003-A Functional Architecture for Motion Pattern Processing in MSTd
Author: Scott A. Beardsley, Lucia M. Vaina
Abstract: Psychophysical studies suggest the existence of specialized detectors for component motion patterns (radial, circular, and spiral), that are consistent with the visual motion properties of cells in the dorsal medial superior temporal area (MSTd) of non-human primates. Here we use a biologically constrained model of visual motion processing in MSTd, in conjunction with psychophysical performance on two motion pattern tasks, to elucidate the computational mechanisms associated with the processing of widefield motion patterns encountered during self-motion. In both tasks discrimination thresholds varied significantly with the type of motion pattern presented, suggesting perceptual correlates to the preferred motion bias reported in MSTd. Through the model we demonstrate that while independently responding motion pattern units are capable of encoding information relevant to the visual motion tasks, equivalent psychophysical performance can only be achieved using interconnected neural populations that systematically inhibit non-responsive units. These results suggest the cyclic trends in psychophysical performance may be mediated, in part, by recurrent connections within motion pattern responsive areas whose structure is a function of the similarity in preferred motion patterns and receptive field locations between units. 1 In trod u ction A major challenge in computational neuroscience is to elucidate the architecture of the cortical circuits for sensory processing and their effective role in mediating behavior. In the visual motion system, biologically constrained models are playing an increasingly important role in this endeavor by providing an explanatory substrate linking perceptual performance and the visual properties of single cells. Single cell studies indicate the presence of complex interconnected structures in middle temporal and primary visual cortex whose most basic horizontal connections can impart considerable computational power to the underlying neural population [1, 2]. Combined psychophysical and computational studies support these findings Figure 1: a) Schematic of the graded motion pattern (GMP) task. Discrimination pairs of stimuli were created by perturbing the flow angle (φ) of each 'test' motion (with average dot speed, vav), by ±φp in the stimulus space spanned by radial and circular motions. b) Schematic of the shifted center-of-motion (COM) task. Discrimination pairs of stimuli were created by shifting the COM of the ‘test’ motion to the left and right of a central fixation point. For each motion pattern the COM was shifted within the illusory inner aperture and was never explicitly visible. and suggest that recurrent connections may play a significant role in encoding the visual motion properties associated with various psychophysical tasks [3, 4]. Using this methodology our goal is to elucidate the computational mechanisms associated with the processing of wide-field motion patterns encountered during self-motion. In the human visual motion system, psychophysical studies suggest the existence of specialized detectors for the motion pattern components (i.e., radial, circular and spiral motions) associated with self-motion [5, 6]. 
Neurophysiological studies reporting neurons sensitive to motion patterns in the dorsal medial superior temporal area (MSTd) support the existence of such mechanisms [7-10], and in conjunction with psychophysical studies suggest a strong link between the patterns of neural activity and motion-based perceptual performance [11, 12]. Through the combination of human psychophysical performance and biologically constrained modeling we investigate the computational role of simple recurrent connections within a population of MSTd-like units. Based on the known visual motion properties within MSTd we ask what neural structures are computationally sufficient to encode psychophysical performance on a series of motion pattern tasks. 2 M o t i o n pa t t e r n d i sc r i m i n a t i o n Using motion pattern stimuli consistent with previous studies [5, 6], we have developed a set of novel psychophysical tasks designed to facilitate a more direct comparison between human perceptual performance and the visual motion properties of cells in MSTd that have been found to underlie the discrimination of motion patterns [11, 12]. The psychophysical tasks, referred to as the graded motion pattern (GMP) and shifted center-of-motion (COM) tasks, are outlined in Fig. 1. Using a temporal two-alternative-forced-choice task we measured discrimination thresholds to global changes in the patterns of complex motion (GMP task), [13], and shifts in the center-of-motion (COM task). Stimuli were presented with central fixation using a constant stimulus paradigm and consisted of dynamic random dot displays presented in a 24o annular region (central 4o removed). In each task, the stimulus duration was randomly perturbed across presentations (440±40 msec) to control for timing-based cues, and dots moved coherently through a radial speed Figure 2: a) GMP thresholds across 8 'test' motions at two mean dot speeds for two observers. Performance varied continuously with thresholds for radial motions (φ=0, 180o) significantly lower than those for circular motions (φ=90,270o), (p<0.001; t(37)=3.39). b) COM thresholds at three mean dot speeds for two observers. As with the GMP task, performance varied continuously with thresholds for radial motions significantly lower than those for circular motions, (p<0.001; t(37)=4.47). gradient in directions consistent with the global motion pattern presented. Discrimination thresholds were obtained across eight ‘test’ motions corresponding to expansion, contraction, CW and CCW rotation, and the four intermediate spiral motions. To minimize adaptation to specific motion patterns, opposing motions (e.g., expansion/ contraction) were interleaved across paired presentations. 2.1 Results Discrimination thresholds are reported here from a subset of the observer population consisting of three experienced psychophysical observers, one of which was naïve to the purpose of the psychophysical tasks. For each condition, performance is reported as the mean and standard error averaged across 8-12 thresholds. Across observers and dot speeds GMP thresholds followed a distinct trend in the stimulus space [13], with radial motions (expansion/contraction) significantly lower than circular motions (CW/CCW rotation), (p<0.001; t(37)=3.39), (Fig. 2a). 
While thresholds for the intermediate spiral motions were not significantly different from circular motions (p=0.223, t(60)=0.74), the trends across 'test' motions were well fit within the stimulus space (SB: r>0.82, SC: r>0.77) by sinusoids whose period and phase were 196 ± 10o and -72 ± 20o respectively (Fig. 1a). When the radial speed gradient was removed by randomizing the spatial distribution of dot speeds, threshold performance increased significantly across observers (p<0.05; t(17)=1.91), particularly for circular motions (p<0.005; t(25)=3.31), (data not shown). Such performance suggests a perceptual contribution associated with the presence of the speed gradient and is particularly interesting given the fact that the speed gradient did not contribute computationally relevant information to the task. However, the speed gradient did convey information regarding the integrative structure of the global motion field and as such suggests a preference of the underlying motion mechanisms for spatially structured speed information. Similar trends in performance were observed in the COM task across observers and dot speeds. Discrimination thresholds varied continuously as a function of the 'test' motion with thresholds for radial motions significantly lower than those for circular motions, (p<0.001; t(37)=4.47) and could be well fit by a sinusoidal trend line (e.g. SB at 3 deg/s: r>0.91, period = 178 ± 10 o and phase = -70 ± 25o), (Fig. 2b). 2.2 A local or global task? The consistency of the cyclic threshold profile in stimuli that restricted the temporal integration of individual dot motions [13], and simultaneously contained all directions of motion, generally argues against a primary role for local motion mechanisms in the psychophysical tasks. While the psychophysical literature has reported a wide variety of “local” motion direction anisotropies whose properties are reminiscent of the results observed here, e.g. [14], all would predict equivalent thresholds for radial and circular motions for a set of uniformly distributed and/or spatially restricted motion direction mechanisms. Together with the computational impact of the speed gradient and psychophysical studies supporting the existence of wide-field motion pattern mechanisms [5, 6], these results suggest that the threshold differences across the GMP and COM tasks may be associated with variations in the computational properties across a series of specialized motion pattern mechanisms. 3 A computational model The similarities between the motion pattern stimuli used to quantify human perception and the visual motion properties of cells in MSTd suggests that MSTd may play a computational role in the psychophysical tasks. To examine this hypothesis, we constructed a population of MSTd-like units whose visual motion properties were consistent with the reported neurophysiology (see [13] for details). Across the population, the distribution of receptive field centers was uniform across polar angle and followed a gamma distribution Γ(5,6) across eccenticity [7]. For each unit, visual motion responses followed a gaussian tuning profile as a function of the stimulus flow angle G( φ), (σi=60±30o; [10]), and the distance of the stimulus COM from the unit’s receptive field center Gsat(xi, yi, σs=19o), Eq. 1, such that its preferred motion response was position invariant to small shifts in the COM [10] and degraded continuously for large shifts [9]. 
2.2 A local or global task?

The consistency of the cyclic threshold profile in stimuli that restricted the temporal integration of individual dot motions [13], and that simultaneously contained all directions of motion, generally argues against a primary role for local motion mechanisms in the psychophysical tasks. While the psychophysical literature has reported a wide variety of "local" motion direction anisotropies whose properties are reminiscent of the results observed here, e.g. [14], all would predict equivalent thresholds for radial and circular motions for a set of uniformly distributed and/or spatially restricted motion direction mechanisms. Together with the computational impact of the speed gradient and psychophysical studies supporting the existence of wide-field motion pattern mechanisms [5, 6], these results suggest that the threshold differences across the GMP and COM tasks may be associated with variations in the computational properties across a series of specialized motion pattern mechanisms.

3 A computational model

The similarities between the motion pattern stimuli used to quantify human perception and the visual motion properties of cells in MSTd suggest that MSTd may play a computational role in the psychophysical tasks. To examine this hypothesis, we constructed a population of MSTd-like units whose visual motion properties were consistent with the reported neurophysiology (see [13] for details). Across the population, the distribution of receptive field centers was uniform across polar angle and followed a gamma distribution Γ(5,6) across eccentricity [7]. For each unit, visual motion responses followed a Gaussian tuning profile as a function of the stimulus flow angle, G(φ), (σ_i=60±30°; [10]), and of the distance of the stimulus COM from the unit's receptive field center, Gsat(x_i, y_i, σ_s=19°), Eq. 1, such that its preferred motion response was position invariant to small shifts in the COM [10] and degraded continuously for large shifts [9].

Within the model, simulations were categorized according to the distribution of preferred motions represented across the population (one reported in MSTd and a uniform control). The first distribution simulated an expansion bias in which the density of preferred motions decreased symmetrically from expansion to contraction [10]. The second distribution simulated a uniform preference for all motions and was used as a control to quantify the effects of an expansion bias on psychophysical performance. Throughout the paper we refer to simulations containing these distributions as 'Expansion-biased' and 'Uniform' respectively.

3.1 Extracting perceptual estimates from the neural code

For each stimulus presentation, the ith unit's response was calculated as the average firing rate, R_i, from the product of its motion pattern and spatial tuning profiles,

    R_i = R_max · G(min[φ − φ_i], σ_ti) · Gsat_i(x − x_i, y − y_i, σ_s) + P(λ = 12)    (1)

where R_max is the maximum preferred stimulus response (spikes/s), min[·] refers to the minimum angular distance between the stimulus flow angle φ and the unit's preferred motion φ_i, Gsat_i is the unit's spatial tuning profile saturated within the central 5±3°, σ_ti and σ_s are the standard deviations of the unit's motion pattern and spatial tuning profiles respectively, (x_i, y_i) is the spatial location of the unit's receptive field center, (x, y) is the spatial location of the stimulus COM, and P(λ=12) is the background activity simulated as an uncorrelated Poisson process.

Figure 3: Model vs. psychophysical performance for independently responding units. Model thresholds are reported as the average (±1 S.E.) across five simulated populations. a) GMP thresholds were highest for contracting motions and lowest for expanding motions across all Expansion-biased populations. b) Comparable trends in performance were observed for COM thresholds. Comparison with the Uniform control simulations in both tasks (2000 units shown here) indicates that thresholds closely followed the distribution of preferred motions simulated within the model.

The psychophysical tasks were simulated using a modified center-of-gravity approach to decode estimates of the stimulus properties, i.e. the flow angle (φ̂) and the COM location in the visual field (x̂, ŷ), from the neural population,

    (x̂, ŷ, φ̂) = ( Σ_i x_i R_i, Σ_i y_i R_i, Σ_i φ⃗_i R_i ) / Σ_i R_i    (2)

where φ⃗_i is the unit vector in the stimulus space (Fig. 1a) corresponding to the unit's preferred motion. For each set of paired stimuli, psychophysical judgments were made by comparing the estimated stimulus properties according to the discrimination criteria specified in the psychophysical tasks. As with the psychophysical experiments, discrimination thresholds were computed using a least-squares fit to percent correct performance across constant stimulus levels.
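As a rough illustration of Eqs. 1 and 2, the sketch below computes the unit responses and the center-of-gravity decode for a single stimulus presentation. The function names, the R_max and Poisson-rate values, and the omission of the central saturation of the spatial profile (the "sat" in Gsat) are simplifying assumptions.

```python
import numpy as np

def unit_responses(phi, com, pref_phi, centers, sigma_t, sigma_s,
                   r_max=60.0, lam=12.0, rng=None):
    """Sketch of Eq. 1: average firing rate of each MSTd-like unit.

    phi: stimulus flow angle (deg); com: (x, y) of the stimulus COM;
    pref_phi, centers, sigma_t: per-unit preferred motions, receptive-field
    centers, and motion tuning widths; sigma_s: spatial tuning width.
    r_max and lam are illustrative values, and the saturation of the spatial
    profile within the central few degrees is omitted for brevity.
    """
    rng = np.random.default_rng() if rng is None else rng
    dphi = np.abs((phi - pref_phi + 180.0) % 360.0 - 180.0)      # min angular distance (deg)
    motion_tuning = np.exp(-dphi**2 / (2.0 * sigma_t**2))        # G(min[phi - phi_i], sigma_ti)
    d2 = np.sum((com - centers)**2, axis=1)                      # squared distance COM -> RF center
    spatial_tuning = np.exp(-d2 / (2.0 * sigma_s**2))            # Gsat_i without the saturation
    background = rng.poisson(lam, size=pref_phi.shape)           # P(lambda = 12)
    return r_max * motion_tuning * spatial_tuning + background

def decode(responses, pref_phi, centers):
    """Sketch of Eq. 2: center-of-gravity estimates (x_hat, y_hat, phi_hat)."""
    w = responses / responses.sum()
    x_hat, y_hat = (w[:, None] * centers).sum(axis=0)
    # preferred motions enter as unit vectors so that the decoded angle is well defined
    vec = (w[:, None] * np.stack([np.cos(np.deg2rad(pref_phi)),
                                  np.sin(np.deg2rad(pref_phi))], axis=1)).sum(axis=0)
    phi_hat = np.rad2deg(np.arctan2(vec[1], vec[0]))
    return x_hat, y_hat, phi_hat

# example population of 1000 units with uniformly distributed preferred motions
rng = np.random.default_rng(1)
pref_phi = rng.uniform(0, 360, 1000)
centers = rng.normal(0, 10, size=(1000, 2))
R = unit_responses(phi=45.0, com=np.array([1.0, -2.0]), pref_phi=pref_phi,
                   centers=centers, sigma_t=60.0, sigma_s=19.0, rng=rng)
print(decode(R, pref_phi, centers))
```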
3.2 Simulation 1: Independent neural responses

In the first series of simulations, GMP and COM thresholds were quantified across three populations (500, 1000, and 2000 units) of independently responding units for each simulated distribution (Expansion-biased and Uniform). Across simulations, both the range in thresholds and their trends across 'test' motions were compared with human psychophysical performance to quantify the effects of population size and of an expansion-biased preferred motion distribution on model performance. Over the psychophysical range of interest (φ_p ± 7°), GMP thresholds for contracting motions were at chance across all Expansion-biased populations (Fig. 3a). While thresholds for expanding motions were generally consistent with those for human observers, those for circular motions remained significantly higher for all but the largest populations. Similar trends in performance were observed for the COM task (Fig. 3b). Here the range of COM thresholds was well matched with human performance for simulations containing 1000 units; however, the trends across motion patterns remained inconsistent even for the largest populations.

Figure 4: Proposed recurrent connection profile between motion pattern units. a) Across the motion pattern space, connection strength followed an inverse Gaussian profile such that the ith unit (with preferred motion φ_i) systematically inhibited units with anti-preferred motions centered at 180°+φ_i. b) Across the visual field, connection strength followed a difference-of-gaussians profile as a function of the relative distance between receptive field centers, such that spatially local units were mutually excitatory (σ_Re=10°) and more distant units were mutually inhibitory (σ_Ri=80°).

For simulations containing a uniform distribution of preferred motions, the threshold range was consistent with human performance on both tasks; however, the trend across motion patterns was generally flat. What variability did occur was due primarily to the discrete sampling of preferred motions across the population. Comparison of the discrimination thresholds for the Expansion-biased and Uniform populations indicates that the trend across thresholds was closely matched to the underlying distributions of preferred motions. This result is due in part to the near-equal weighting of independently responding units and can be explained to a first approximation by the proportional increase in the signal-to-noise ratio across the population as a function of the density of units responsive to a given 'test' motion.

3.3 Simulation 2: An interconnected neural structure

In a second series of simulations, we examined the computational effect of adding recurrent connections between units. If the distribution of preferred motions in MSTd is in fact biased towards expansions, as the neurophysiology suggests, it seems unlikely that independent estimates of the visual motion information would be sufficient to yield the threshold profiles observed in the psychophysical tasks. We hypothesize that a simple fixed architecture of excitatory and/or inhibitory connections is sufficient to account for the cyclic trends in discrimination thresholds. Specifically, we propose that a recurrent connection profile whose strength varies as a function of (a) the similarity between preferred motion patterns and (b) the distance between receptive field centers is computationally sufficient to recover the trends in GMP/COM performance (Fig. 4),

    w_ij = S_R · exp(−[(x_i − x_j)² + (y_i − y_j)²] / (2σ_Re²)) − S_R · exp(−[(x_i − x_j)² + (y_i − y_j)²] / (2σ_Ri²)) − S_φ · exp(−(min[φ_i − φ_j])² / (2σ_I²))    (3)

where w_ij is the strength of the recurrent connection between the ith and jth units, (x_i, y_i) and (x_j, y_j) denote the spatial locations of their receptive field centers, σ_Re (=10°) and σ_Ri (=80°) together define the spatial extent of a difference-of-gaussians interaction between receptive field centers, and S_R and S_φ scale the connection strength. To examine the effects of the spread of motion pattern-specific inhibition and of connection strength in the model, σ_I, S_φ, and S_R were considered free parameters.
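A minimal sketch of the connection profile in Eq. 3 is given below. The function name, the example values of σ_I, S_φ, and S_R (taken from the parameter ranges reported in the following paragraph), and the zeroing of self-connections are assumptions rather than details specified in the text.

```python
import numpy as np

def recurrent_weights(pref_phi, centers, s_r=0.03, s_phi=0.05,
                      sigma_re=10.0, sigma_ri=80.0, sigma_i=80.0):
    """Sketch of Eq. 3: pairwise recurrent connection strengths w_ij."""
    dx = centers[:, None, :] - centers[None, :, :]                 # pairwise RF-center offsets
    d2 = np.sum(dx**2, axis=-1)                                    # squared distances
    # difference of Gaussians over receptive-field centers: local excitation,
    # broader inhibition (sigma_re = 10 deg, sigma_ri = 80 deg)
    spatial = s_r * (np.exp(-d2 / (2 * sigma_re**2)) -
                     np.exp(-d2 / (2 * sigma_ri**2)))
    # Gaussian term over preferred-motion differences with spread sigma_i
    dphi = np.abs((pref_phi[:, None] - pref_phi[None, :] + 180.0) % 360.0 - 180.0)
    motion = s_phi * np.exp(-dphi**2 / (2 * sigma_i**2))
    w = spatial - motion
    np.fill_diagonal(w, 0.0)                                       # no self-connection (assumption)
    return w
```

Applying such a weight matrix to the independent responses of Eq. 1 (for instance, by updating the population activity once or iterating it to a steady state) is one way the interconnected populations described below could be realized; the precise update rule is not specified here.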
Within the parameter space used to define the recurrent connections (i.e., σ_I, S_φ, and S_R), Monte Carlo simulations of Expansion-biased model performance (1000 units) yielded regions of high correlation on both tasks (with respect to the psychophysical thresholds, r>0.7) that were consistent across independently simulated populations. Typically these regions were well defined over a broad range, such that there was significant overlap between tasks (e.g., for the GMP task (S_R=0.03), σ_I=[45, 120°] and S_φ=[0.03, 0.3]; for the COM task (σ_I=80°), S_φ=[0.03, 0.08] and S_R=[0.005, 0.04]). Fig. 5 shows averaged threshold performance for simulations of interconnected units drawn from the highly correlated regions of the (σ_I, S_φ, S_R) parameter space. For populations not explicitly examined in the Monte Carlo simulations, connection strengths (S_φ, S_R) were scaled inversely with population size to maintain an equivalent level of recurrent activity.

Figure 5: Model vs. psychophysical performance for populations containing recurrent connections (σ_I=80°). As the number of units increased for Expansion-biased populations, discrimination thresholds decreased to psychophysical levels and the sinusoidal trend in thresholds emerged for both the (a) GMP and (b) COM tasks. Sinusoidal trends were established for as few as 1000 units and were well fit (r>0.9) by sinusoids whose periods and phases were (193.8 ± 11.7°, -70.0 ± 22.6°) and (168.2 ± 13.7°, -118.8 ± 31.8°) for the GMP and COM tasks respectively.

With the incorporation of recurrent connections, the sinusoidal trend in GMP and COM thresholds emerged for Expansion-biased populations as the number of units increased. In both tasks the cyclic threshold profiles were established for 1000 units and were well fit (r>0.9) by sinusoids whose periods and phases were consistent with human performance. Unlike the Expansion-biased populations, Uniform populations were not significantly affected by the presence of recurrent connections (Fig. 5). Both the range in thresholds and the flat trend across motion patterns were well matched to those in Section 3.2. Together these results suggest that the sinusoidal trends in GMP and COM performance may be mediated by the combined contribution of the recurrent interconnections and the bias in preferred motions across the population.

4 Discussion

Using a biologically constrained computational model in conjunction with human psychophysical performance on two motion pattern tasks, we have shown that the visual motion information encoded across an interconnected population of cells responsive to motion patterns, such as those in MSTd, is computationally sufficient to extract perceptual estimates consistent with human performance. Specifically, we have shown that the cyclic trend in psychophysical performance observed across tasks (a) cannot be reproduced using populations of independently responding units and (b) is dependent, in part, on the presence of an expanding motion bias in the distribution of preferred motions across the neural population. The model's performance suggests the presence of specific recurrent structures within motion pattern responsive areas, such as MSTd, whose strength varies as a function of the similarity between preferred motion patterns and the distance between receptive field centers. While such structures have not been explicitly examined in MSTd and other higher visual motion areas, there is anecdotal support for the presence of inhibitory connections [8].
Together, these results suggest that robust processing of the motion patterns associated with self-motion and optic flow may be mediated, in part, by recurrent structures in extrastriate visual motion areas whose distributions of preferred motions are biased strongly in favor of expanding motions.

Acknowledgments

This work was supported by National Institutes of Health grant EY-2R01-07861-13 to L.M.V.

References

[1] Malach, R., Schirman, T., Harel, M., Tootell, R., & Malonek, D., (1997), Cerebral Cortex, 7(4): 386-393.
[2] Gilbert, C. D., (1992), Neuron, 9: 1-13.
[3] Koechlin, E., Anton, J., & Burnod, Y., (1999), Biological Cybernetics, 80: 25-44.
[4] Stemmler, M., Usher, M., & Niebur, E., (1995), Science, 269: 1877-1880.
[5] Burr, D. C., Morrone, M. C., & Vaina, L. M., (1998), Vision Research, 38(12): 1731-1743.
[6] Meese, T. S. & Harris, S. J., (2002), Vision Research, 42: 1073-1080.
[7] Tanaka, K. & Saito, H. A., (1989), Journal of Neurophysiology, 62(3): 626-641.
[8] Duffy, C. J. & Wurtz, R. H., (1991), Journal of Neurophysiology, 65(6): 1346-1359.
[9] Duffy, C. J. & Wurtz, R. H., (1995), Journal of Neuroscience, 15(7): 5192-5208.
[10] Graziano, M. S., Anderson, R. A., & Snowden, R., (1994), Journal of Neuroscience, 14(1): 54-67.
[11] Celebrini, S. & Newsome, W., (1994), Journal of Neuroscience, 14(7): 4109-4124.
[12] Celebrini, S. & Newsome, W. T., (1995), Journal of Neurophysiology, 73(2): 437-448.
[13] Beardsley, S. A. & Vaina, L. M., (2001), Journal of Computational Neuroscience, 10: 255-280.
[14] Matthews, N. & Qian, N., (1999), Vision Research, 39: 2205-2211.
5 0.46888968 119 nips-2003-Local Phase Coherence and the Perception of Blur
Author: Zhou Wang, Eero P. Simoncelli
Abstract: unknown-abstract
6 0.42213437 190 nips-2003-Unsupervised Color Decomposition Of Histologically Stained Tissue Samples
7 0.42045474 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays
8 0.40820488 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation
9 0.37811917 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
10 0.36792284 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector
11 0.36654189 157 nips-2003-Plasticity Kernels and Temporal Statistics
12 0.34974974 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
13 0.34190723 115 nips-2003-Linear Dependent Dimensionality Reduction
14 0.33412927 43 nips-2003-Bounded Invariance and the Formation of Place Fields
15 0.32833675 139 nips-2003-Nonlinear Filtering of Electron Micrographs by Means of Support Vector Regression
16 0.32433677 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
17 0.32277134 12 nips-2003-A Model for Learning the Semantics of Pictures
18 0.30666965 195 nips-2003-When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?
19 0.30454451 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms
20 0.29553831 130 nips-2003-Model Uncertainty in Classical Conditioning
topicId topicWeight
[(0, 0.04), (11, 0.061), (29, 0.021), (30, 0.027), (35, 0.072), (53, 0.09), (66, 0.015), (69, 0.012), (71, 0.054), (76, 0.057), (85, 0.064), (87, 0.289), (91, 0.104), (99, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.80348438 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence
Author: Amit Gruber, Yair Weiss
Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1
2 0.66882986 78 nips-2003-Gaussian Processes in Reinforcement Learning
Author: Malte Kuss, Carl E. Rasmussen
Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.
3 0.53888285 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion
Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler
Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1
4 0.53412628 107 nips-2003-Learning Spectral Clustering
Author: Francis R. Bach, Michael I. Jordan
Abstract: Spectral clustering refers to a class of techniques which rely on the eigenstructure of a similarity matrix to partition points into disjoint clusters with points in the same cluster having high similarity and points in different clusters having low similarity. In this paper, we derive a new cost function for spectral clustering based on a measure of error between a given partition and a solution of the spectral relaxation of a minimum normalized cut problem. Minimizing this cost function with respect to the partition leads to a new spectral clustering algorithm. Minimizing with respect to the similarity matrix leads to an algorithm for learning the similarity matrix. We develop a tractable approximation of our cost function that is based on the power method of computing eigenvectors. 1
5 0.53377593 12 nips-2003-A Model for Learning the Semantics of Pictures
Author: Victor Lavrenko, R. Manmatha, Jiwoon Jeon
Abstract: We propose an approach to learning the semantics of images which allows us to automatically annotate an image with keywords and to retrieve images based on text queries. We do this using a formalism that models the generation of annotated images. We assume that every image is divided into regions, each described by a continuous-valued feature vector. Given a training set of images with annotations, we compute a joint probabilistic model of image features and words which allow us to predict the probability of generating a word given the image regions. This may be used to automatically annotate and retrieve images given a word as a query. Experiments show that our model significantly outperforms the best of the previously reported results on the tasks of automatic image annotation and retrieval. 1
6 0.53315562 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
7 0.53292537 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games
8 0.5323807 113 nips-2003-Learning with Local and Global Consistency
9 0.52960134 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
10 0.52953672 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
11 0.52947199 125 nips-2003-Maximum Likelihood Estimation of a Stochastic Integrate-and-Fire Neural Model
12 0.52927846 158 nips-2003-Policy Search by Dynamic Programming
13 0.52873492 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
14 0.52846831 189 nips-2003-Tree-structured Approximations by Expectation Propagation
15 0.52778411 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms
16 0.52733719 168 nips-2003-Salient Boundary Detection using Ratio Contour
17 0.52715623 30 nips-2003-Approximability of Probability Distributions
18 0.52610224 73 nips-2003-Feature Selection in Clustering Problems
19 0.52503204 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
20 0.52426565 81 nips-2003-Geometric Analysis of Constrained Curves