nips nips2003 nips2003-106 knowledge-graph by maker-knowledge-mining

106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion


Source: pdf

Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler

Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. [sent-7, score-0.366]

2 We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. [sent-8, score-0.723]

3 We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. [sent-10, score-0.328]

4 Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. [sent-11, score-0.751]

5 We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. [sent-12, score-0.205]

6 1 Introduction We can generally think of a non-rigid object’s motion as consisting of a rigid component plus a non-rigid deformation. [sent-13, score-0.441]

7 For example, a face may move rigidly (e.g. turning left or right) while deforming (due to changing facial expressions). [sent-16, score-0.074]

8 If we view this non-rigid motion from a single camera view, the shape and motion are ambiguous: for any hypothetical rigid motion, a corresponding 3D shape can be devised that fits the image observations. [sent-17, score-1.35]

9 Even if camera calibration and rigid motion are known, a depth ambiguity remains. [sent-18, score-0.537]

10 Despite this apparent ambiguity, humans interpret the shape and motion of non-rigid objects with relative ease; clearly, more assumptions about the nature of the deformations are used by humans. [sent-19, score-0.716]

11 We argue that, by assuming that the 3D shape is drawn from some non-uniform PDF, we can reconstruct 3D non-rigid shape from 2D motion unambiguously. [sent-21, score-0.858]

12 We demonstrate this approach by modeling the PDF as a Gaussian distribution (more specifically, as a factor analyzer), and describe a novel EM algorithm for simultaneously learning the 3D shapes, the rigid motion, and the parameters of the Gaussian. [sent-24, score-0.251]

13 We also generalize this approach by modeling the shape as a Linear Dynamical System (LDS). [sent-25, score-0.32]

14 Our algorithm can be thought of as a structure-from-motion (SFM) algorithm with a learning component: we assume that a set of labeled point tracks have been extracted from a raw video sequence, and the goal is to estimate 3D shape, camera motion, and a deformation PDF. [sent-26, score-0.292]

15 Our algorithm is well-suited to reconstruction in the case of missing data, such as points lost to occlusions and other tracking outliers. [sent-27, score-0.176]

16 However, we show significant improvements over previous algorithms even when all tracks are visible. [sent-28, score-0.096]

17 Our missing-data technique can be viewed as generalizing previous algorithms for SFM with missing data. [sent-31, score-0.104]

18 In work concurrent to our own, Gruber and Weiss [7] also apply EM to SFM; their work focuses on the rigid case with known noise, and applies temporal smoothing to rigid motion parameters rather than shape. [sent-34, score-0.686]

19 2 Deformation, Shape, and Ambiguities We now formalize the problem of interpreting non-rigid shape and motion. [sent-35, score-0.299]

20 We assume that a scene consists of J scene points sj,t , where j is an index over scene points, and t is an index over image frames. [sent-36, score-0.115]

21 Stacking the 2D points into a 2 × J matrix Pt = [p1,t , . . . , pJ,t ] and the 3D shape into a 3 × J matrix St = [s1,t , . . . , sJ,t ] [sent-41, score-0.317]

22 gives the equivalent form Pt = Rt (St + Dt ) + Nt (2), where Dt = dt 1^T contains J copies of the translation vector dt . [sent-44, score-0.338]
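
As a concrete illustration of this measurement model, here is a minimal numpy sketch (not the authors' Matlab code; all sizes, names, and the use of a random orthographic camera are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
J, T, sigma = 10, 5, 0.01                      # points, frames, noise std (illustrative)
S_bar = rng.standard_normal((3, J))            # a fixed 3D shape for this toy example

for t in range(T):
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    R_t = Q[:2, :]                             # 2 x 3 orthographic rotation/projection
    d_t = rng.standard_normal((3, 1))          # translation vector
    D_t = d_t @ np.ones((1, J))                # Dt = dt 1^T: J copies of the translation
    N_t = sigma * rng.standard_normal((2, J))  # zero-mean Gaussian image noise
    P_t = R_t @ (S_bar + D_t) + N_t            # observed 2D tracks, as in Equation 2
```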

23 Note that rigid motion of the object and rigid motion of the camera are interchangeable. [sent-45, score-0.917]

24 Our goal is to estimate the time-varying shape St and motion (Rt , Dt ) from the observed projections Pt . [sent-46, score-0.599]

25 Without any constraints on the 3D shape sj,t , this problem is extremely ambiguous [11]. [sent-47, score-0.322]

26 For example, given a shape St and motion (Rt , Dt ) and an arbitrary orthonormal matrix At , we can produce a new shape At St and motion (Rt At^-1 , At Dt ) that together give identical 2D projections to the original model, even if a different matrix At is applied in every frame. [sent-48, score-1.173]
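
This ambiguity is easy to check numerically; the sketch below (illustrative names, orthographic camera assumed, At^-1 = At^T since At is orthonormal) verifies that the transformed shape and motion project identically:

```python
import numpy as np

rng = np.random.default_rng(1)
J = 8
S_t = rng.standard_normal((3, J))                        # some 3D shape for frame t
R_t = np.linalg.qr(rng.standard_normal((3, 3)))[0][:2]   # orthographic camera rows
A_t = np.linalg.qr(rng.standard_normal((3, 3)))[0]       # arbitrary orthonormal matrix

P_original = R_t @ S_t
P_modified = (R_t @ A_t.T) @ (A_t @ S_t)                 # new motion Rt At^-1, new shape At St
print(np.allclose(P_original, P_modified))               # True: identical 2D projections
```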

27 Together, the mean shape S̄ and the Vk are referred to as the shape basis. [sent-53, score-0.299]

28 Equivalently, the space of possible shapes may be described by linear combinations of basis shapes, by selecting K + 1 linearly independent points in the space. [sent-54, score-0.155]

29 However, this model contains ambiguities, since, for some 3D shape and motion, there will still be ways to combine different weights and a different rigid motion to produce the same 3D shape. [sent-57, score-0.754]

30 Since we are performing a 2D projection, an additional depth ambiguity occurs. [sent-58, score-0.073]

31 For example, whenever there exist weights wk such that Σk Rt Vk wk = 0 and Σk Vk wk ≠ 0, these weights define a linear space of distinct 3D shapes (with weights zt,k + αwk ) that give identical 2D projections. [sent-59, score-0.329]

32 (When the number of basis shapes is small, these ambiguities are rarer and may not make a dramatic impact. [sent-60, score-0.185]

33 As the number of basis shapes grows, the problem is more likely to become unconstrained, eventually approaching the totally unconstrained case described above. [sent-62, score-0.173]

34 The ambiguity and overfitting may be resolved by introducing regularization terms that penalize large deformations, and then solving for 3D shape in a least-squares sense. [sent-63, score-0.415]

35 Soatto and Yezzi [11] use a regularization term equivalent to Σt ||St − S̄||² . [sent-64, score-0.079]

36 However, this regularization may be too restrictive in many cases and too loose in others. [sent-65, score-0.061]

37 For example, when tracking a face, deformations of the jaw are much more likely than deformations of the nose. [sent-66, score-0.352]

38 Moreover, the weight for this regularization term must be specified by hand. [sent-67, score-0.079]

39 Alternatively, Brand [3] proposes placing a user-specified Gaussian prior on the deformation basis and a prior on the deformations based on an initial estimate. [sent-68, score-0.337]

40 Suppose we assume that shapes St are drawn from a probability distribution p(St |θ) with known parameters θ. [sent-70, score-0.132]

41 The non-rigid shape and motion are estimated by maximizing p(S, R, D | P, θ, σ²) ∝ p(P | S, R, D, θ, σ²) p(S, R, D | θ, σ²) (4) ∝ Πt p(Pt | St , Rt , Dt , σ²) p(St | θ) (5), assuming uniform priors on Rt and Dt . [sent-71, score-0.559]

42 The projection likelihood p(Pt |St , Rt , Dt , σ 2 ) is a spherical Gaussian (Equation 2). [sent-72, score-0.073]

43 The negative log-posterior − ln p(S, R, D, θ|P) corresponds to a standard least-squares formulation for SFM, plus a regularization term − ln p(St |θ). [sent-73, score-0.152]

44 If we set p(St |θ) to be a spherical Gaussian with a specified variance (e.g. [sent-75, score-0.046]

45 p(St |θ) = N(S̄; σ² I)), then we obtain the simple regularization used previously — the problem is constrained, but by a weak regularization term with a user-specified weight (variance). [sent-77, score-0.14]
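
To make the reduction explicit (a short derivation; the prior variance is written σ_s² here only to distinguish it from the image-noise variance σ²), dropping additive constants gives

$$-\ln p(S_t \mid \theta) = \frac{1}{2\sigma_s^2}\,\lVert S_t - \bar{S} \rVert^2 + \text{const},$$

so the negative log-posterior is the data-fitting term plus a penalty proportional to Σt ||St − S̄||², i.e. the Soatto-Yezzi regularizer with a hand-chosen weight 1/(2σ_s²).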

46 Our approach is to simultaneously estimate the rigid motion and learn the shape PDF. [sent-79, score-0.77]

47 In other words, we estimate R, D, θ, and σ² to maximize p(R, D, θ, σ² | P) = ∫ p(R, D, θ, S, σ² | P) dS (6) ∝ ∫ p(P | R, D, S, σ²) p(S | θ) dS (7). The key idea is that we can estimate shape and motion while learning the parameters of the PDF p(S|θ) over shapes. [sent-80, score-0.623]

48 (Our method marginalizes over the unknown shapes St , rather than solving for estimates of shape. [sent-81, score-0.129]

49 This means that the regularization terms need not be set manually, and can thus be much more sophisticated and have many more parameters than previous methods. [sent-85, score-0.1]

50 In practice, we find that this leads to significantly improved reconstructions over user-specified shape PDFs. [sent-86, score-0.354]

51 We demonstrate the approach by modeling the shape PDF as a general Gaussian. [sent-87, score-0.32]

52 We later generalize this approach to model shape as an LDS, leading to temporal correlations in the shape PDF. [sent-90, score-0.649]

53 One way to see this is to consider the terms of − ln p(R, D, θ|P) in the case of the Gaussian prior PDF: in addition to the data-fitting term and the regularization term, there is a “normalization constant” term of T ln |φ|, where T is the number of frames and φ is the covariance of the shape PDF. [sent-93, score-0.452]

54 Hence, the optimal solution trades off between (a) fitting the projection data, (b) fitting the shapes St to the shape PDF (regularizing), and (c) minimizing the variance of the shape PDF as much as possible. [sent-95, score-0.783]

55 3 Learning a Gaussian shape distribution We now describe our algorithm in detail. [sent-97, score-0.299]

56 maximize p(R, D, S̄, V, σ² | P) ∝ p(P | R, D, S̄, V, σ²) = Πt ∫ p(Pt , zt | Rt , Dt , S̄, V, σ²) p(zt ) dzt [sent-102, score-0.244]

57 First, define ft to be the vector of point tracks: ft = vec(Pt ) = [x1,t , y1,t , . . . , xJ,t , yJ,t ]^T . [sent-105, score-0.452]

58 Note that ft is the same variable as Pt , but written as a vector rather than a matrix (see footnote 2). [sent-109, score-0.178]

59 Expanding ft we have ft = vec(Pt ) = vec(Rt St + Rt Dt + Nt ) (9) = Σk=1..K vec(Rt Vk ) zk,t + vec(Rt S̄) + vec(Rt Dt ) + vec(Nt ) (10) = Mt zt + f̄t + Tt + vec(Nt ) (11), where Mt = [vec(Rt V1 ), . . . , vec(Rt VK )], f̄t = vec(Rt S̄), and Tt = vec(Rt Dt ). [sent-110, score-0.6]

60 Note that the marginal distribution over shape — as well as its projection — is Gaussian: p(ft |ψ) = ∫ p(ft |zt , ψ) p(zt |ψ) dzt = N(ft | Tt + f̄t ; Mt Mt^T + σ² I), where ψ encapsulates the model parameters S̄, Vk , Rt , Dt and σ² . [sent-123, score-0.372]
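
A minimal numpy sketch of evaluating this marginal log-likelihood for one frame (illustrative sizes; not the authors' implementation):

```python
import numpy as np

def marginal_loglik(f_t, M_t, fbar_t, T_t, sigma2):
    """Log of N(f_t | T_t + fbar_t ; M_t M_t^T + sigma2*I) for one frame."""
    d = f_t - fbar_t - T_t
    cov = M_t @ M_t.T + sigma2 * np.eye(len(f_t))
    _, logdet = np.linalg.slogdet(cov)
    quad = d @ np.linalg.solve(cov, d)
    return -0.5 * (len(f_t) * np.log(2.0 * np.pi) + logdet + quad)

rng = np.random.default_rng(2)
twoJ, K = 20, 2                                # 2J observations, K basis shapes (toy sizes)
M_t = rng.standard_normal((twoJ, K))
f_t = rng.standard_normal(twoJ)
print(marginal_loglik(f_t, M_t, np.zeros(twoJ), np.zeros(twoJ), 0.01))
```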

61 We can also rewrite the shape equation as vec(Rt St ) = (I ⊗ Rt ) vec(St ) = (I ⊗ Rt ) H z̃t , by using the identity vec(ABC) = (C^T ⊗ A) vec(B). [sent-128, score-0.333]

62 2 The vec operator stacks the columns of a matrix into a single vector. [sent-132, score-0.416]
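
The vec operator and the Kronecker identity used above can be checked directly (a generic numpy check, not code from the paper; numpy is row-major by default, so column stacking needs order='F'):

```python
import numpy as np

def vec(M):
    """Stack the columns of M into a single vector."""
    return M.flatten(order='F')

rng = np.random.default_rng(3)
A = rng.standard_normal((2, 3))
B = rng.standard_normal((3, 4))
C = rng.standard_normal((4, 5))

lhs = vec(A @ B @ C)
rhs = np.kron(C.T, A) @ vec(B)     # vec(ABC) = (C^T kron A) vec(B)
print(np.allclose(lhs, rhs))       # True
```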

63 Given a set of point tracks P (equivalently, f ), we can estimate the motion and deformation model using EM; the algorithm is similar to EM for factor analysis [6]. [sent-136, score-0.53]

64 We estimate the distribution over zt given the current motion and shape estimates, for each frame t. [sent-138, score-0.861]

65 Defining q(zt ) to be the distribution to be estimated in frame t, it can be computed as q(zt ) = p(zt |ft , ψ) = N(zt | β(ft − f̄t − Tt ); I − βMt ) (14, 15), with β = Mt^T (Mt Mt^T + σ² I)^-1 (16). The matrix inversion lemma may be used to accelerate the computation of β. [sent-139, score-0.055]

66 We define the expectations µt ≡ Eq [zt ] and φt ≡ Eq [zt zt^T ] and compute them as µt = β(ft − f̄t − Tt ) (17) and φt = I − βMt + µt µt^T (18). We also define µ̃t = E[z̃t ] = [1, µt^T ]^T and φ̃t = E[z̃t z̃t^T ] = [ 1 µt^T ; µt φt ]. [sent-140, score-0.244]
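
A per-frame E-step sketch following Equations 14-18 (illustrative; a plain matrix inverse stands in for the matrix-inversion-lemma speedup mentioned above):

```python
import numpy as np

def e_step(f_t, M_t, fbar_t, T_t, sigma2):
    """Posterior moments of z_t for one frame (Equations 14-18); a minimal sketch."""
    twoJ, K = M_t.shape
    beta = M_t.T @ np.linalg.inv(M_t @ M_t.T + sigma2 * np.eye(twoJ))   # Equation 16
    mu_t = beta @ (f_t - fbar_t - T_t)                                  # Equation 17
    phi_t = np.eye(K) - beta @ M_t + np.outer(mu_t, mu_t)               # Equation 18
    mu_tilde = np.concatenate(([1.0], mu_t))                            # E[z~_t]
    phi_tilde = np.block([[np.ones((1, 1)), mu_t[None, :]],
                          [mu_t[:, None], phi_t]])                      # E[z~_t z~_t^T]
    return mu_t, phi_t, mu_tilde, phi_tilde
```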

67 We estimate the motion parameters by minimizing Q(P, ψ) = E_{q(z1 ),...,q(zT )} [− log p(P|ψ)] (19) [sent-142, score-0.303]

68 = Σt E_{q(zt )} [ ||ft − vec(Rt St ) − Tt ||² / (2σ²) ] + 2JT log √(2πσ²) (20). This function is quadratic in the shape parameters (S̄, Vk ), in the rigid motion parameters (Rt , Tt ), and in the Gaussian noise variance parameter σ² . [sent-145, score-0.85]

69 Since the system of equations in Equation 21 is large and sparse, we solve it using conjugate gradient. [sent-154, score-0.192]
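
The conjugate-gradient solve itself is generic; a sketch with SciPy on a stand-in symmetric positive-definite system (this only illustrates the solver, not the actual M-step normal equations of the paper):

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(4)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)            # stand-in symmetric positive-definite system
b = rng.standard_normal(n)

x, info = cg(A, b)                     # info == 0 indicates convergence
print(info, np.linalg.norm(A @ x - b))
```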

70 If any of the point tracks are missing, they are also filled in during the M-step. [sent-157, score-0.096]

71 Let ft* denote the elements of a frame of tracking data that are not observed; they are estimated as ft* ← f̄t* + Mt* µt + Tt* (25), where (*) indicates rows that correspond to the missing data. [sent-158, score-0.179]
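
A sketch of the missing-data fill-in of Equation 25, using a boolean observation mask (illustrative names; in the paper the masked rows are the unobserved tracks for frame t):

```python
import numpy as np

def fill_missing(f_t, observed, M_t, fbar_t, T_t, mu_t):
    """Replace unobserved entries of f_t with their model prediction (Equation 25)."""
    f_hat = np.array(f_t, dtype=float)
    miss = ~observed                                  # rows with missing tracking data
    f_hat[miss] = fbar_t[miss] + M_t[miss] @ mu_t + T_t[miss]
    return f_hat
```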

72 Once EM has converged, the maximum likelihood shapes may be computed as St = S̄ + Σk Vk µt,k . [sent-160, score-0.11]
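
Recovering the per-frame shapes after convergence is then a single linear combination (sketch; V is assumed to be a list of K basis shapes, each 3 x J):

```python
import numpy as np

def ml_shape(S_bar, V, mu_t):
    """Maximum-likelihood shape for frame t: S_t = S_bar + sum_k mu_{t,k} V_k."""
    return S_bar + sum(mu_k * V_k for mu_k, V_k in zip(mu_t, V))
```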

73 4 Learning dynamics Many real deformations contain some temporal smoothness. [sent-161, score-0.208]

74 We model temporal behavior of deformations using a Linear Dynamical System (LDS). [sent-162, score-0.208]

75 In this model, Equation 8 is replaced with z0 ∼ N(0; I) (26) and zt = Φ zt−1 + n, n ∼ N(0; Q) (27), where Φ is an arbitrary unknown K × K matrix, and Q is a K × K covariance matrix. [sent-163, score-0.244]
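
A sketch of sampling from the LDS prior of Equations 26-27 (the Φ and Q values below are arbitrary illustrations; the paper learns them with Shumway-Stoffer-style updates in the M-step):

```python
import numpy as np

def sample_lds(Phi, Q, T, rng):
    """Draw z_0..z_{T-1} from z_0 ~ N(0, I), z_t = Phi z_{t-1} + n, n ~ N(0, Q)."""
    K = Phi.shape[0]
    L = np.linalg.cholesky(Q)
    z = np.zeros((T, K))
    z[0] = rng.standard_normal(K)
    for t in range(1, T):
        z[t] = Phi @ z[t - 1] + L @ rng.standard_normal(K)
    return z

z = sample_lds(0.9 * np.eye(2), 0.1 * np.eye(2), T=5, rng=np.random.default_rng(5))
print(z.shape)
```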

76 In the M-step, we apply the same shape and motion updates as in the previous section; additionally, we update Φ and Q in the same way as in Shumway and Stoffer’s algorithm. [sent-167, score-0.559]

77 In other words, this reconstruction algorithm learns 3D shape with temporal smoothing, while learning the temporal smoothness term. [sent-168, score-0.477]

78 ILSQ optimizes Equations 2 and 3 by alternating optimization of each of the unknowns (rotation, basis shapes, and coefficients). [sent-171, score-0.065]

79 For both algorithms, the rigid motion is initialized by Tomasi-Kanade [12], and the shape basis and coefficients are initialized randomly. [sent-174, score-0.768]

80 We tested the algorithms on a synthetic animation of a deforming shark in Figure 1. [sent-177, score-0.151]

81 The motion consists of rigid rotation plus deformations generated by K = 2 basis shapes. [sent-178, score-0.684]

82 By enforcing temporal smoothness, EM-LDS corrects some of these errors. (In our experience, ILSQ always performs better than the algorithm of Bregler et al.) [sent-182, score-0.072]

83 Each algorithm was given 2D tracks as inputs; reconstructions are shown here from a different viewpoint than the inputs to the algorithm. [sent-186, score-0.151]

84 Ground-truth features are shown as blue dots; reconstructions are red circles. [sent-187, score-0.055]

85 Note that, although ILSQ gets approximately the correct shape in most cases, it misses details, whereas EM gives very accurate results most of the time. [sent-188, score-0.299]

86 Some errors made by EM-Gaussian (e.g. for t=148) are corrected by EM-LDS through temporal smoothing. [sent-191, score-0.051]

87 EM-LDS was able to correct some of the deformation errors of EM-Gaussian. [sent-192, score-0.135]

88 The average Z error for EM-LDS on the shark sequence after 100 EM iterations is 1. [sent-193, score-0.069]

89 Videos of the shark reconstructions and the Matlab software used for these experiments are available from http://movement. [sent-195, score-0.124]

90 In highly-constrained cases — low-rank motion, no image noise, and no missing data — ILSQ achieved reasonably good results. [sent-198, score-0.132]

91 Figure 2(a) shows the results of reconstruction with missing data; the ILSQ results degrade much faster as the percentage of missing data increases. [sent-201, score-0.263]

92 Figure 2(b) shows the effect of changing the complexity of the model, while leaving the complexity of the data fixed. [sent-202, score-0.064]

93 6 Discussion and future work We have described an approach to non-rigid structure-from-motion with a probabilistic deformation model, and demonstrated its usefulness in the case of a Gaussian deformation model. [sent-204, score-0.27]

94 We expect that more sophisticated distributions can be used to model more complex non-rigid shapes in video. [sent-205, score-0.127]

95 Figure 2: Error comparison between ILSQ and EM-Gaussian on random basis shapes; (a) error vs. % missing data, (b) error vs. the number of basis shapes K. [sent-210, score-0.149]

96 As the percentage of missing feature tracks per frame increases, ILSQ degenerates much more rapidly than EM-Gaussian. [sent-212, score-0.258]

97 Future work includes separating rigid from non-rigid motion in fully-observed data, as in Soatto and Yezzi’s work [11]. [sent-214, score-0.424]

98 Thanks to Hrishikesh Deshpande for assisting with an early version of this project, and to Stefano Soatto for discussing deformation ambiguities. [sent-217, score-0.135]

99 An approach to time series smoothing and forecasting using the EM algorithm. [sent-282, score-0.11]

100 Shape and motion from image streams under orthography: A factorization method. [sent-295, score-0.305]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vec', 0.398), ('rt', 0.348), ('shape', 0.299), ('motion', 0.26), ('ilsq', 0.26), ('zt', 0.244), ('tt', 0.192), ('ft', 0.178), ('st', 0.165), ('rigid', 0.164), ('deformations', 0.157), ('mt', 0.149), ('dt', 0.146), ('pdf', 0.137), ('deformation', 0.135), ('sfm', 0.12), ('vk', 0.111), ('shapes', 0.11), ('missing', 0.104), ('pt', 0.101), ('tracks', 0.096), ('em', 0.085), ('soatto', 0.075), ('lds', 0.069), ('shark', 0.069), ('shumway', 0.069), ('regularization', 0.061), ('ambiguity', 0.055), ('reconstructions', 0.055), ('deforming', 0.052), ('ftj', 0.052), ('stoffer', 0.052), ('torresani', 0.052), ('yezzi', 0.052), ('bregler', 0.051), ('projection', 0.051), ('temporal', 0.051), ('eq', 0.048), ('basis', 0.045), ('wk', 0.042), ('rotation', 0.041), ('camera', 0.04), ('tracking', 0.038), ('frame', 0.037), ('nt', 0.036), ('cb', 0.035), ('dzt', 0.035), ('reconstruction', 0.034), ('cvpr', 0.031), ('gaussian', 0.031), ('weights', 0.031), ('ambiguities', 0.03), ('animation', 0.03), ('underconstrained', 0.03), ('vectorized', 0.03), ('ht', 0.029), ('scene', 0.029), ('object', 0.029), ('noise', 0.028), ('image', 0.028), ('translation', 0.028), ('ln', 0.028), ('analyzer', 0.027), ('gruber', 0.027), ('hertzmann', 0.027), ('ah', 0.027), ('simultaneously', 0.026), ('orthographic', 0.026), ('morphable', 0.026), ('smoothing', 0.025), ('tting', 0.025), ('lt', 0.024), ('variance', 0.024), ('desire', 0.023), ('ambiguous', 0.023), ('spherical', 0.022), ('parameters', 0.022), ('changing', 0.022), ('dynamical', 0.022), ('estimate', 0.021), ('complexity', 0.021), ('percentage', 0.021), ('smoothness', 0.021), ('ds', 0.021), ('modeling', 0.021), ('learns', 0.021), ('optimizes', 0.02), ('projections', 0.019), ('estimates', 0.019), ('matrix', 0.018), ('depth', 0.018), ('unconstrained', 0.018), ('term', 0.018), ('factor', 0.018), ('equation', 0.018), ('medical', 0.017), ('factorization', 0.017), ('sophisticated', 0.017), ('plus', 0.017), ('rewrite', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler

Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1

2 0.21720676 81 nips-2003-Geometric Analysis of Constrained Curves

Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen

Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1

3 0.19427334 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

Author: Amit Gruber, Yair Weiss

Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1

4 0.12453744 7 nips-2003-A Functional Architecture for Motion Pattern Processing in MSTd

Author: Scott A. Beardsley, Lucia M. Vaina

Abstract: Psychophysical studies suggest the existence of specialized detectors for component motion patterns (radial, circular, and spiral), that are consistent with the visual motion properties of cells in the dorsal medial superior temporal area (MSTd) of non-human primates. Here we use a biologically constrained model of visual motion processing in MSTd, in conjunction with psychophysical performance on two motion pattern tasks, to elucidate the computational mechanisms associated with the processing of widefield motion patterns encountered during self-motion. In both tasks discrimination thresholds varied significantly with the type of motion pattern presented, suggesting perceptual correlates to the preferred motion bias reported in MSTd. Through the model we demonstrate that while independently responding motion pattern units are capable of encoding information relevant to the visual motion tasks, equivalent psychophysical performance can only be achieved using interconnected neural populations that systematically inhibit non-responsive units. These results suggest the cyclic trends in psychophysical performance may be mediated, in part, by recurrent connections within motion pattern responsive areas whose structure is a function of the similarity in preferred motion patterns and receptive field locations between units. 1

5 0.12195854 37 nips-2003-Automatic Annotation of Everyday Movements

Author: Deva Ramanan, David A. Forsyth

Abstract: This paper describes a system that can annotate a video sequence with: a description of the appearance of each actor; when the actor is in view; and a representation of the actor’s activity while in view. The system does not require a fixed background, and is automatic. The system works by (1) tracking people in 2D and then, using an annotated motion capture dataset, (2) synthesizing an annotated 3D motion sequence matching the 2D tracks. The 3D motion capture data is manually annotated off-line using a class structure that describes everyday motions and allows motion annotations to be composed — one may jump while running, for example. Descriptions computed from video of real motions show that the method is accurate.

6 0.11664601 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter

7 0.11479385 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification

8 0.11273254 143 nips-2003-On the Dynamics of Boosting

9 0.11259157 148 nips-2003-Online Passive-Aggressive Algorithms

10 0.11249363 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

11 0.10712256 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

12 0.10578137 53 nips-2003-Discriminating Deformable Shape Classes

13 0.097888902 41 nips-2003-Boosting versus Covering

14 0.077536181 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

15 0.060981285 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

16 0.059725866 102 nips-2003-Large Scale Online Learning

17 0.057674225 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

18 0.05255552 166 nips-2003-Reconstructing MEG Sources with Unknown Correlations

19 0.051438645 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector

20 0.051416129 146 nips-2003-Online Learning of Non-stationary Sequences


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.167), (1, 0.002), (2, 0.03), (3, -0.131), (4, -0.073), (5, -0.076), (6, 0.174), (7, 0.118), (8, 0.117), (9, -0.129), (10, -0.102), (11, 0.072), (12, 0.11), (13, 0.058), (14, -0.081), (15, -0.049), (16, 0.201), (17, -0.158), (18, 0.089), (19, -0.43), (20, 0.098), (21, -0.049), (22, 0.073), (23, 0.047), (24, 0.023), (25, 0.076), (26, 0.035), (27, -0.004), (28, 0.014), (29, 0.043), (30, 0.048), (31, -0.118), (32, 0.112), (33, -0.006), (34, -0.004), (35, 0.098), (36, -0.067), (37, 0.063), (38, -0.066), (39, -0.028), (40, 0.063), (41, 0.018), (42, 0.024), (43, 0.046), (44, 0.037), (45, 0.048), (46, 0.053), (47, 0.066), (48, 0.068), (49, 0.093)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97901988 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler

Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1

2 0.62587762 81 nips-2003-Geometric Analysis of Constrained Curves

Author: Anuj Srivastava, Washington Mio, Xiuwen Liu, Eric Klassen

Abstract: We present a geometric approach to statistical shape analysis of closed curves in images. The basic idea is to specify a space of closed curves satisfying given constraints, and exploit the differential geometry of this space to solve optimization and inference problems. We demonstrate this approach by: (i) defining and computing statistics of observed shapes, (ii) defining and learning a parametric probability model on shape space, and (iii) designing a binary hypothesis test on this space. 1

3 0.5576278 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

Author: Amit Gruber, Yair Weiss

Abstract: The problem of “Structure From Motion” is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and more importantly allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences. 1

4 0.48123178 53 nips-2003-Discriminating Deformable Shape Classes

Author: Salvador Ruiz-correa, Linda G. Shapiro, Marina Meila, Gabriel Berson

Abstract: We present and empirically test a novel approach for categorizing 3-D free form object shapes represented by range data . In contrast to traditional surface-signature based systems that use alignment to match specific objects, we adapted the newly introduced symbolic-signature representation to classify deformable shapes [10]. Our approach constructs an abstract description of shape classes using an ensemble of classifiers that learn object class parts and their corresponding geometrical relationships from a set of numeric and symbolic descriptors. We used our classification engine in a series of large scale discrimination experiments on two well-defined classes that share many common distinctive features. The experimental results suggest that our method outperforms traditional numeric signature-based methodologies. 1 1

5 0.47041753 7 nips-2003-A Functional Architecture for Motion Pattern Processing in MSTd

Author: Scott A. Beardsley, Lucia M. Vaina

Abstract: Psychophysical studies suggest the existence of specialized detectors for component motion patterns (radial, circular, and spiral), that are consistent with the visual motion properties of cells in the dorsal medial superior temporal area (MSTd) of non-human primates. Here we use a biologically constrained model of visual motion processing in MSTd, in conjunction with psychophysical performance on two motion pattern tasks, to elucidate the computational mechanisms associated with the processing of widefield motion patterns encountered during self-motion. In both tasks discrimination thresholds varied significantly with the type of motion pattern presented, suggesting perceptual correlates to the preferred motion bias reported in MSTd. Through the model we demonstrate that while independently responding motion pattern units are capable of encoding information relevant to the visual motion tasks, equivalent psychophysical performance can only be achieved using interconnected neural populations that systematically inhibit non-responsive units. These results suggest the cyclic trends in psychophysical performance may be mediated, in part, by recurrent connections within motion pattern responsive areas whose structure is a function of the similarity in preferred motion patterns and receptive field locations between units. 1 In trod u ction A major challenge in computational neuroscience is to elucidate the architecture of the cortical circuits for sensory processing and their effective role in mediating behavior. In the visual motion system, biologically constrained models are playing an increasingly important role in this endeavor by providing an explanatory substrate linking perceptual performance and the visual properties of single cells. Single cell studies indicate the presence of complex interconnected structures in middle temporal and primary visual cortex whose most basic horizontal connections can impart considerable computational power to the underlying neural population [1, 2]. Combined psychophysical and computational studies support these findings Figure 1: a) Schematic of the graded motion pattern (GMP) task. Discrimination pairs of stimuli were created by perturbing the flow angle (φ) of each 'test' motion (with average dot speed, vav), by ±φp in the stimulus space spanned by radial and circular motions. b) Schematic of the shifted center-of-motion (COM) task. Discrimination pairs of stimuli were created by shifting the COM of the ‘test’ motion to the left and right of a central fixation point. For each motion pattern the COM was shifted within the illusory inner aperture and was never explicitly visible. and suggest that recurrent connections may play a significant role in encoding the visual motion properties associated with various psychophysical tasks [3, 4]. Using this methodology our goal is to elucidate the computational mechanisms associated with the processing of wide-field motion patterns encountered during self-motion. In the human visual motion system, psychophysical studies suggest the existence of specialized detectors for the motion pattern components (i.e., radial, circular and spiral motions) associated with self-motion [5, 6]. 
Neurophysiological studies reporting neurons sensitive to motion patterns in the dorsal medial superior temporal area (MSTd) support the existence of such mechanisms [7-10], and in conjunction with psychophysical studies suggest a strong link between the patterns of neural activity and motion-based perceptual performance [11, 12]. Through the combination of human psychophysical performance and biologically constrained modeling we investigate the computational role of simple recurrent connections within a population of MSTd-like units. Based on the known visual motion properties within MSTd we ask what neural structures are computationally sufficient to encode psychophysical performance on a series of motion pattern tasks. 2 M o t i o n pa t t e r n d i sc r i m i n a t i o n Using motion pattern stimuli consistent with previous studies [5, 6], we have developed a set of novel psychophysical tasks designed to facilitate a more direct comparison between human perceptual performance and the visual motion properties of cells in MSTd that have been found to underlie the discrimination of motion patterns [11, 12]. The psychophysical tasks, referred to as the graded motion pattern (GMP) and shifted center-of-motion (COM) tasks, are outlined in Fig. 1. Using a temporal two-alternative-forced-choice task we measured discrimination thresholds to global changes in the patterns of complex motion (GMP task), [13], and shifts in the center-of-motion (COM task). Stimuli were presented with central fixation using a constant stimulus paradigm and consisted of dynamic random dot displays presented in a 24o annular region (central 4o removed). In each task, the stimulus duration was randomly perturbed across presentations (440±40 msec) to control for timing-based cues, and dots moved coherently through a radial speed Figure 2: a) GMP thresholds across 8 'test' motions at two mean dot speeds for two observers. Performance varied continuously with thresholds for radial motions (φ=0, 180o) significantly lower than those for circular motions (φ=90,270o), (p<0.001; t(37)=3.39). b) COM thresholds at three mean dot speeds for two observers. As with the GMP task, performance varied continuously with thresholds for radial motions significantly lower than those for circular motions, (p<0.001; t(37)=4.47). gradient in directions consistent with the global motion pattern presented. Discrimination thresholds were obtained across eight ‘test’ motions corresponding to expansion, contraction, CW and CCW rotation, and the four intermediate spiral motions. To minimize adaptation to specific motion patterns, opposing motions (e.g., expansion/ contraction) were interleaved across paired presentations. 2.1 Results Discrimination thresholds are reported here from a subset of the observer population consisting of three experienced psychophysical observers, one of which was naïve to the purpose of the psychophysical tasks. For each condition, performance is reported as the mean and standard error averaged across 8-12 thresholds. Across observers and dot speeds GMP thresholds followed a distinct trend in the stimulus space [13], with radial motions (expansion/contraction) significantly lower than circular motions (CW/CCW rotation), (p<0.001; t(37)=3.39), (Fig. 2a). 
While thresholds for the intermediate spiral motions were not significantly different from those for circular motions (p = 0.223, t(60) = 0.74), the trends across 'test' motions were well fit within the stimulus space (SB: r > 0.82, SC: r > 0.77) by sinusoids whose period and phase were 196 ± 10° and −72 ± 20° respectively (Fig. 1a). When the radial speed gradient was removed by randomizing the spatial distribution of dot speeds, threshold performance increased significantly across observers (p < 0.05; t(17) = 1.91), particularly for circular motions (p < 0.005; t(25) = 3.31) (data not shown). Such performance suggests a perceptual contribution associated with the presence of the speed gradient and is particularly interesting given that the speed gradient did not contribute computationally relevant information to the task. However, the speed gradient did convey information regarding the integrative structure of the global motion field, and as such suggests a preference of the underlying motion mechanisms for spatially structured speed information.

Similar trends in performance were observed in the COM task across observers and dot speeds. Discrimination thresholds varied continuously as a function of the 'test' motion, with thresholds for radial motions significantly lower than those for circular motions (p < 0.001; t(37) = 4.47), and could be well fit by a sinusoidal trend line (e.g., SB at 3 deg/s: r > 0.91, period = 178 ± 10° and phase = −70 ± 25°), (Fig. 2b).

2.2 A local or global task?

The consistency of the cyclic threshold profile in stimuli that restricted the temporal integration of individual dot motions [13], and simultaneously contained all directions of motion, generally argues against a primary role for local motion mechanisms in the psychophysical tasks. While the psychophysical literature has reported a wide variety of "local" motion direction anisotropies whose properties are reminiscent of the results observed here, e.g. [14], all would predict equivalent thresholds for radial and circular motions for a set of uniformly distributed and/or spatially restricted motion direction mechanisms. Together with the computational impact of the speed gradient and psychophysical studies supporting the existence of wide-field motion pattern mechanisms [5, 6], these results suggest that the threshold differences across the GMP and COM tasks may be associated with variations in the computational properties across a series of specialized motion pattern mechanisms.

3 A computational model

The similarity between the motion pattern stimuli used to quantify human perception and the visual motion properties of cells in MSTd suggests that MSTd may play a computational role in the psychophysical tasks. To examine this hypothesis, we constructed a population of MSTd-like units whose visual motion properties were consistent with the reported neurophysiology (see [13] for details). Across the population, the distribution of receptive field centers was uniform across polar angle and followed a gamma distribution Γ(5,6) across eccentricity [7]. For each unit, visual motion responses followed a Gaussian tuning profile as a function of the stimulus flow angle, G(φ) (σ_i = 60 ± 30°; [10]), and of the distance of the stimulus COM from the unit's receptive field center, G_sat(x_i, y_i, σ_s = 19°), Eq. 1, such that its preferred motion response was position invariant to small shifts in the COM [10] and degraded continuously for large shifts [9].
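A minimal sketch of how such a population could be sampled is given below, assuming Γ(5,6) denotes a gamma distribution with shape 5 and scale 6 (in degrees of eccentricity), clipping the tuning widths to keep them positive, and drawing preferred motions uniformly (the 'Uniform' control described next); the Expansion-biased variant would replace that uniform draw with a density biased toward expansion. The population size is arbitrary.

import numpy as np

def sample_population(n_units=1000, seed=0):
    # Sample MSTd-like units: receptive field centers and motion-pattern tuning.
    rng = np.random.default_rng(seed)
    ecc = rng.gamma(shape=5.0, scale=6.0, size=n_units)       # eccentricity (deg)
    ang = rng.uniform(0.0, 2.0 * np.pi, n_units)              # polar angle, uniform
    rf_xy = np.stack([ecc * np.cos(ang), ecc * np.sin(ang)], axis=1)
    pref_phi = rng.uniform(0.0, 360.0, n_units)               # preferred flow angle (deg)
    sigma_t = np.clip(rng.normal(60.0, 30.0, n_units), 10.0, None)  # tuning width (deg)
    return rf_xy, pref_phi, sigma_t

rf_xy, pref_phi, sigma_t = sample_population()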
Within the model, simulations were categorized according to the distribution of preferred motions represented across the population (one reported in MSTd and a uniform control). The first distribution simulated an expansion bias in which the density of preferred motions decreased symmetrically from expansion to contraction [10]. The second distribution simulated a uniform preference for all motions and was used as a control to quantify the effects of an expansion bias on psychophysical performance. Throughout the paper we refer to simulations containing these distributions as 'Expansion-biased' and 'Uniform' respectively.

3.1 Extracting perceptual estimates from the neural code

For each stimulus presentation, the ith unit's response was calculated as the average firing rate, R_i, from the product of its motion pattern and spatial tuning profiles,

$R_i = R_{\max}\, G\!\left(\min[\phi - \phi_i],\, \sigma_{t_i}\right)\, G_{\mathrm{sat}_i}\!\left(x - x_i,\, y - y_i,\, \sigma_s\right) + P(\lambda = 12),$  (1)

where R_max is the maximum preferred stimulus response (spikes/s), min[·] refers to the minimum angular distance between the stimulus flow angle φ and the unit's preferred motion φ_i, G_sat is the unit's spatial tuning profile saturated within the central 5 ± 3°, σ_{t_i} and σ_s are the standard deviations of the unit's motion pattern and spatial tuning profiles respectively, (x_i, y_i) is the spatial location of the unit's receptive field center, (x, y) is the spatial location of the stimulus COM, and P(λ = 12) is the background activity simulated as an uncorrelated Poisson process.

Figure 3: Model vs. psychophysical performance for independently responding units. Model thresholds are reported as the average (±1 S.E.) across five simulated populations. a) GMP thresholds were highest for contracting motions and lowest for expanding motions across all Expansion-biased populations. b) Comparable trends in performance were observed for COM thresholds. Comparison with the Uniform control simulations in both tasks (2000 units shown here) indicates that thresholds closely followed the distribution of preferred motions simulated within the model.

The psychophysical tasks were simulated using a modified center-of-gravity approach to decode estimates of the stimulus properties, i.e., the flow angle (φ̂) and the COM location in the visual field (x̂, ŷ), from the neural population,

$(\hat{x}, \hat{y}, \hat{\phi}) = \left( \frac{\sum_i x_i R_i}{\sum_i R_i},\ \frac{\sum_i y_i R_i}{\sum_i R_i},\ \frac{\sum_i \vec{\phi}_i R_i}{\sum_i R_i} \right),$  (2)

where $\vec{\phi}_i$ is the unit vector in the stimulus space (Fig. 1a) corresponding to the unit's preferred motion. For each set of paired stimuli, psychophysical judgments were made by comparing the estimated stimulus properties according to the discrimination criteria specified in the psychophysical tasks. As with the psychophysical experiments, discrimination thresholds were computed using a least-squares fit to percent correct performance across constant stimulus levels.

3.2 Simulation 1: Independent neural responses

In the first series of simulations, GMP and COM thresholds were quantified across three populations (500, 1000, and 2000 units) of independently responding units for each simulated distribution (Expansion-biased and Uniform). Across simulations, both the range in thresholds and their trends across 'test' motions were compared with human psychophysical performance to quantify the effects of population size and an expansion-biased preferred motion distribution on model performance.
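To make Eqs. (1)-(2) concrete, here is a minimal NumPy sketch of the feedforward response and the center-of-gravity decoding, intended to be composed with the population sketch above. R_max, the central saturation radius, and the handling of the flow-angle average via unit vectors (converted back to an angle at the end) are implementation assumptions rather than values or steps specified above.

import numpy as np

def unit_responses(phi_stim, com_xy, pref_phi, rf_xy, sigma_t,
                   sigma_s=19.0, r_max=60.0, sat_radius=5.0, lam=12.0, rng=None):
    # Eq. (1): average firing rate of every unit for one stimulus presentation.
    # r_max, sat_radius, and lam are illustrative constants.
    rng = np.random.default_rng() if rng is None else rng
    # minimum angular distance between stimulus flow angle and preferred motion
    dphi = np.abs((phi_stim - pref_phi + 180.0) % 360.0 - 180.0)
    tuning = np.exp(-dphi**2 / (2.0 * sigma_t**2))                    # G(.)
    # spatial profile, saturated (flat) within the central region of the RF
    dist = np.linalg.norm(rf_xy - np.asarray(com_xy), axis=1)
    dist = np.maximum(dist - sat_radius, 0.0)
    spatial = np.exp(-dist**2 / (2.0 * sigma_s**2))                   # Gsat(.)
    background = rng.poisson(lam, size=pref_phi.shape[0])             # P(lambda)
    return r_max * tuning * spatial + background

def decode(rates, pref_phi, rf_xy):
    # Eq. (2): center-of-gravity estimates of the COM location and flow angle.
    w = rates / rates.sum()
    x_hat, y_hat = (rf_xy * w[:, None]).sum(axis=0)
    # preferred motions are averaged as unit vectors in the stimulus space
    vec = np.stack([np.cos(np.deg2rad(pref_phi)), np.sin(np.deg2rad(pref_phi))], axis=1)
    mx, my = (vec * w[:, None]).sum(axis=0)
    phi_hat = np.rad2deg(np.arctan2(my, mx)) % 360.0
    return x_hat, y_hat, phi_hat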
Over the psychophysical range of interest (φ_p ± 7°), GMP thresholds for contracting motions were at chance across all Expansion-biased populations (Fig. 3a). While thresholds for expanding motions were generally consistent with those for human observers, those for circular motions remained significantly higher for all but the largest populations. Similar trends in performance were observed for the COM task (Fig. 3b). Here the range of COM thresholds was well matched with human performance for simulations containing 1000 units; however, the trends across motion patterns remained inconsistent even for the largest populations.

Figure 4: Proposed recurrent connection profile between motion pattern units. a) Across the motion pattern space, connection strength followed an inverse Gaussian profile such that the ith unit (with preferred motion φ_i) systematically inhibited units with anti-preferred motions centered at 180° + φ_i. b) Across the visual field, connection strength followed a difference-of-gaussians profile as a function of the relative distance between receptive field centers, such that spatially local units are mutually excitatory (σ_Re = 10°) and more distant units are mutually inhibitory (σ_Ri = 80°).

For simulations containing a uniform distribution of preferred motions, the threshold range was consistent with human performance on both tasks; however, the trend across motion patterns was generally flat. What variability did occur was due primarily to the discrete sampling of preferred motions across the population. Comparison of the discrimination thresholds for the Expansion-biased and Uniform populations indicates that the trend across thresholds was closely matched to the underlying distributions of preferred motions. This result is due in part to the near-equal weighting of independently responding units and can be explained to a first approximation by the proportional increase in the signal-to-noise ratio across the population as a function of the density of units responsive to a given 'test' motion.

3.3 Simulation 2: An interconnected neural structure

In a second series of simulations, we examined the computational effect of adding recurrent connections between units. If the distribution of preferred motions in MSTd is in fact biased towards expansions, as the neurophysiology suggests, it seems unlikely that independent estimates of the visual motion information would be sufficient to yield the threshold profiles observed in the psychophysical tasks. We hypothesize that a simple fixed architecture of excitatory and/or inhibitory connections is sufficient to account for the cyclic trends in discrimination thresholds. Specifically, we propose that a recurrent connection profile whose strength varies as a function of (a) the similarity between preferred motion patterns and (b) the distance between receptive field centers is computationally sufficient to recover the trends in GMP/COM performance (Fig. 4),

$w_{ij} = S_R\, e^{-\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2\sigma_{Re}^2}} - S_R\, e^{-\frac{(x_i - x_j)^2 + (y_i - y_j)^2}{2\sigma_{Ri}^2}} - S_\phi\, e^{-\frac{\left(\min[\phi_i - \phi_j]\right)^2}{2\sigma_I^2}},$  (3)

where w_ij is the strength of the recurrent connection between the ith and jth units, (x_i, y_i) and (x_j, y_j) denote the spatial locations of their receptive field centers, σ_Re (= 10°) and σ_Ri (= 80°) together define the spatial extent of a difference-of-gaussians interaction between receptive field centers, and S_R and S_φ scale the connection strength. To examine the effects of the spread of motion pattern-specific inhibition and connection strength in the model, σ_I, S_φ, and S_R were treated as free parameters.
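A sketch of Eq. (3) and one way of applying it is given below. The motion-pattern inhibition term follows the equation as printed (a gaussian in the preferred-motion difference); Fig. 4a describes the inhibition as centered on anti-preferred motions, so the centering of that term may need to be shifted by 180° depending on the intended convention. The example scale factors and the iterative rectified update in recurrent_update are illustrative choices, not procedures specified in the text.

import numpy as np

def recurrent_weights(pref_phi, rf_xy, s_r=0.03, s_phi=0.1,
                      sigma_re=10.0, sigma_ri=80.0, sigma_i=80.0):
    # Eq. (3): fixed recurrent weights between every pair of units.
    # Spatial term: difference of gaussians (local excitation, broader inhibition).
    # Motion-pattern term: gaussian in the preferred-motion difference.
    dx = rf_xy[:, None, 0] - rf_xy[None, :, 0]
    dy = rf_xy[:, None, 1] - rf_xy[None, :, 1]
    d2 = dx**2 + dy**2
    dphi = np.abs((pref_phi[:, None] - pref_phi[None, :] + 180.0) % 360.0 - 180.0)
    spatial = s_r * (np.exp(-d2 / (2 * sigma_re**2)) - np.exp(-d2 / (2 * sigma_ri**2)))
    pattern = s_phi * np.exp(-dphi**2 / (2 * sigma_i**2))
    w = spatial - pattern
    np.fill_diagonal(w, 0.0)            # no self-connection assumed
    return w

def recurrent_update(rates, w, n_iter=5, gain=1.0):
    # One simple way to let the weights reshape the feedforward rates before
    # decoding: iterate r <- [r0 + gain * W r]_+ a few times.
    r0 = rates.astype(float)
    r = r0.copy()
    for _ in range(n_iter):
        r = np.maximum(r0 + gain * (w @ r), 0.0)
    return r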
Figure 5: Model vs. psychophysical performance for populations containing recurrent connections (σ_I = 80°). As the number of units increased for Expansion-biased populations, discrimination thresholds decreased to psychophysical levels and the sinusoidal trend in thresholds emerged for both the (a) GMP and (b) COM tasks. Sinusoidal trends were established for as few as 1000 units and were well fit (r > 0.9) by sinusoids whose periods and phases were (193.8 ± 11.7°, −70.0 ± 22.6°) and (168.2 ± 13.7°, −118.8 ± 31.8°) for the GMP and COM tasks respectively.

Within the parameter space used to define the recurrent connections (i.e., σ_I, S_φ, and S_R), Monte Carlo simulations of Expansion-biased model performance (1000 units) yielded regions of high correlation with the psychophysical thresholds (r > 0.7) on both tasks that were consistent across independently simulated populations. Typically these regions were well defined over a broad range, such that there was significant overlap between tasks (e.g., for the GMP task (S_R = 0.03), σ_I = [45°, 120°] and S_φ = [0.03, 0.3]; for the COM task (σ_I = 80°), S_φ = [0.03, 0.08] and S_R = [0.005, 0.04]).

Fig. 5 shows averaged threshold performance for simulations of interconnected units drawn from the highly correlated regions of the (σ_I, S_φ, S_R) parameter space. For populations not explicitly examined in the Monte Carlo simulations, connection strengths (S_φ, S_R) were scaled inversely with population size to maintain an equivalent level of recurrent activity. With the incorporation of recurrent connections, the sinusoidal trend in GMP and COM thresholds emerged for Expansion-biased populations as the number of units increased. In both tasks the cyclic threshold profiles were established for 1000 units and were well fit (r > 0.9) by sinusoids whose periods and phases were consistent with human performance. Unlike the Expansion-biased populations, the Uniform populations were not significantly affected by the presence of recurrent connections (Fig. 5). Both the range in thresholds and the flat trend across motion patterns were well matched to those in Section 3.2. Together these results suggest that the sinusoidal trends in GMP and COM performance may be mediated by the combined contribution of the recurrent interconnections and the bias in preferred motions across the population.
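The Monte Carlo search over (σ_I, S_φ, S_R) can be sketched as follows. Here simulate_thresholds is a hypothetical placeholder for the full stimulus-response-recurrent-decoding pipeline, the sampling ranges and the human threshold values are illustrative, and only the r > 0.7 acceptance criterion is taken from the text.

import numpy as np

# Hypothetical stand-in for the full pipeline (stimulus -> responses -> recurrent
# interaction -> decoding -> thresholds); here it only returns placeholder values.
def simulate_thresholds(sigma_i, s_phi, s_r, rng):
    return rng.uniform(1.0, 5.0, size=8)

human_thresholds = np.array([2.0, 2.8, 3.6, 2.9, 2.1, 2.7, 3.5, 3.0])  # illustrative

rng = np.random.default_rng(0)
accepted = []
for _ in range(500):                        # Monte Carlo over (sigma_I, S_phi, S_R)
    sigma_i = rng.uniform(30.0, 150.0)      # deg; sampling ranges are assumptions
    s_phi = 10.0 ** rng.uniform(-2.0, -0.5)
    s_r = 10.0 ** rng.uniform(-2.5, -1.0)
    model = simulate_thresholds(sigma_i, s_phi, s_r, rng)
    r = np.corrcoef(model, human_thresholds)[0, 1]
    if r > 0.7:                             # high-correlation region, as in the text
        accepted.append((sigma_i, s_phi, s_r, r))

print(f"{len(accepted)} of 500 parameter samples exceeded r > 0.7")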
4 Discussion

Using a biologically constrained computational model in conjunction with human psychophysical performance on two motion pattern tasks, we have shown that the visual motion information encoded across an interconnected population of cells responsive to motion patterns, such as those in MSTd, is computationally sufficient to extract perceptual estimates consistent with human performance. Specifically, we have shown that the cyclic trend in psychophysical performance observed across tasks (a) cannot be reproduced using populations of independently responding units and (b) is dependent, in part, on the presence of an expanding motion bias in the distribution of preferred motions across the neural population. The model's performance suggests the presence of specific recurrent structures within motion pattern responsive areas, such as MSTd, whose strength varies as a function of the similarity between preferred motion patterns and the distance between receptive field centers.

While such structures have not been explicitly examined in MSTd and other higher visual motion areas, there is anecdotal support for the presence of inhibitory connections [8]. Together, these results suggest that robust processing of the motion patterns associated with self-motion and optic flow may be mediated, in part, by recurrent structures in extrastriate visual motion areas whose distributions of preferred motions are biased strongly in favor of expanding motions.

Acknowledgments

This work was supported by National Institutes of Health grant EY-2R01-07861-13 to L.M.V.

References

[1] Malach, R., Schirman, T., Harel, M., Tootell, R., & Malonek, D. (1997). Cerebral Cortex, 7(4): 386-393.
[2] Gilbert, C. D. (1992). Neuron, 9: 1-13.
[3] Koechlin, E., Anton, J., & Burnod, Y. (1999). Biological Cybernetics, 80: 25-44.
[4] Stemmler, M., Usher, M., & Niebur, E. (1995). Science, 269: 1877-1880.
[5] Burr, D. C., Morrone, M. C., & Vaina, L. M. (1998). Vision Research, 38(12): 1731-1743.
[6] Meese, T. S. & Harris, S. J. (2002). Vision Research, 42: 1073-1080.
[7] Tanaka, K. & Saito, H. A. (1989). Journal of Neurophysiology, 62(3): 626-641.
[8] Duffy, C. J. & Wurtz, R. H. (1991). Journal of Neurophysiology, 65(6): 1346-1359.
[9] Duffy, C. J. & Wurtz, R. H. (1995). Journal of Neuroscience, 15(7): 5192-5208.
[10] Graziano, M. S., Andersen, R. A., & Snowden, R. (1994). Journal of Neuroscience, 14(1): 54-67.
[11] Celebrini, S. & Newsome, W. (1994). Journal of Neuroscience, 14(7): 4109-4124.
[12] Celebrini, S. & Newsome, W. T. (1995). Journal of Neurophysiology, 73(2): 437-448.
[13] Beardsley, S. A. & Vaina, L. M. (2001). Journal of Computational Neuroscience, 10: 255-280.
[14] Matthews, N. & Qian, N. (1999). Vision Research, 39: 2205-2211.

6 0.42189747 37 nips-2003-Automatic Annotation of Everyday Movements

7 0.32508814 143 nips-2003-On the Dynamics of Boosting

8 0.31320047 41 nips-2003-Boosting versus Covering

9 0.30343789 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

10 0.28383663 95 nips-2003-Insights from Machine Learning Applied to Human Visual Classification

11 0.25131491 66 nips-2003-Extreme Components Analysis

12 0.24988207 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

13 0.24948236 85 nips-2003-Human and Ideal Observers for Detecting Image Curves

14 0.23161067 27 nips-2003-Analytical Solution of Spike-timing Dependent Plasticity Based on Synaptic Biophysics

15 0.21173157 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

16 0.21024312 10 nips-2003-A Low-Power Analog VLSI Visual Collision Detector

17 0.20427087 114 nips-2003-Limiting Form of the Sample Covariance Eigenspectrum in PCA and Kernel PCA

18 0.19504406 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots

19 0.19413368 148 nips-2003-Online Passive-Aggressive Algorithms

20 0.19220716 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.079), (9, 0.29), (11, 0.064), (30, 0.015), (35, 0.048), (53, 0.091), (58, 0.01), (66, 0.021), (71, 0.036), (76, 0.04), (85, 0.098), (87, 0.024), (91, 0.079), (99, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8451972 61 nips-2003-Entrainment of Silicon Central Pattern Generators for Legged Locomotory Control

Author: Francesco Tenore, Ralph Etienne-Cummings, M. A. Lewis

Abstract: We have constructed a second generation CPG chip capable of generating the necessary timing to control the leg of a walking machine. We demonstrate improvements over a previous chip by moving toward a significantly more versatile device. This includes a larger number of silicon neurons, more sophisticated neurons including voltage dependent charging and relative and absolute refractory periods, and enhanced programmability of neural networks. This chip builds on the basic results achieved on a previous chip and expands its versatility to get closer to a self-contained locomotion controller for walking robots. 1

same-paper 2 0.81087404 106 nips-2003-Learning Non-Rigid 3D Shape from 2D Motion

Author: Lorenzo Torresani, Aaron Hertzmann, Christoph Bregler

Abstract: This paper presents an algorithm for learning the time-varying shape of a non-rigid 3D object from uncalibrated 2D tracking data. We model shape motion as a rigid component (rotation and translation) combined with a non-rigid deformation. Reconstruction is ill-posed if arbitrary deformations are allowed. We constrain the problem by assuming that the object shape at each time instant is drawn from a Gaussian distribution. Based on this assumption, the algorithm simultaneously estimates 3D shape and motion for each time frame, learns the parameters of the Gaussian, and robustly fills-in missing data points. We then extend the algorithm to model temporal smoothness in object shape, thus allowing it to handle severe cases of missing data. 1

3 0.71769291 187 nips-2003-Training a Quantum Neural Network

Author: Bob Ricks, Dan Ventura

Abstract: Most proposals for quantum neural networks have skipped over the problem of how to train the networks. The mechanics of quantum computing are different enough from classical computing that the issue of training should be treated in detail. We propose a simple quantum neural network and a training method for it. It can be shown that this algorithm works in quantum systems. Results on several real-world data sets show that this algorithm can train the proposed quantum neural networks, and that it has some advantages over classical learning algorithms. 1

4 0.52974904 78 nips-2003-Gaussian Processes in Reinforcement Learning

Author: Malte Kuss, Carl E. Rasmussen

Abstract: We exploit some useful properties of Gaussian process (GP) regression models for reinforcement learning in continuous state spaces and discrete time. We demonstrate how the GP model allows evaluation of the value function in closed form. The resulting policy iteration algorithm is demonstrated on a simple problem with a two dimensional state space. Further, we speculate that the intrinsic ability of GP models to characterise distributions of functions would allow the method to capture entire distributions over future values instead of merely their expectation, which has traditionally been the focus of much of reinforcement learning.

5 0.52171326 113 nips-2003-Learning with Local and Global Consistency

Author: Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, Bernhard Schölkopf

Abstract: We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data. 1

6 0.51887542 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

7 0.51592082 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

8 0.51584578 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection

9 0.51462573 3 nips-2003-AUC Optimization vs. Error Rate Minimization

10 0.51308435 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition

11 0.51034683 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

12 0.5102998 20 nips-2003-All learning is Local: Multi-agent Learning in Global Reward Games

13 0.50894111 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

14 0.50853819 172 nips-2003-Semi-Supervised Learning with Trees

15 0.50632393 124 nips-2003-Max-Margin Markov Networks

16 0.50586998 12 nips-2003-A Model for Learning the Semantics of Pictures

17 0.50548583 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

18 0.50510317 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

19 0.50462633 28 nips-2003-Application of SVMs for Colour Classification and Collision Detection with AIBO Robots

20 0.5024237 48 nips-2003-Convex Methods for Transduction