nips nips2004 nips2004-83 knowledge-graph by maker-knowledge-mining

83 nips-2004-Incremental Learning for Visual Tracking


Source: pdf

Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang

Abstract: Most existing tracking algorithms construct a representation of a target object before the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact that the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Most existing tracking algorithms construct a representation of a target object before the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. [sent-6, score-1.622]

2 In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. [sent-7, score-1.216]

3 Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact that the sample mean varies over time. [sent-8, score-0.654]

4 The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. [sent-9, score-0.431]

5 Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. [sent-10, score-1.124]

6 1 Introduction The main challenges of visual tracking can be attributed to the difficulty in handling appearance variability of a target object. [sent-11, score-0.896]

7 Intrinsic appearance variabilities include pose variation and shape deformation of a target object, whereas extrinsic factors such as illumination change, camera motion, camera viewpoint, and occlusions inevitably cause large appearance variation. [sent-12, score-1.481]

8 Due to the nature of the tracking problem, it is imperative for a tracking algorithm to model such appearance variation. [sent-13, score-0.996]

9 Here we developed a method that, during visual tracking, constantly and efficiently updates a low dimensional eigenspace representation of the appearance of the target object. [sent-14, score-0.997]

10 The advantages of this adaptive subspace representation are severalfold. [sent-15, score-0.215]

11 The eigenspace representation provides a compact notion of the “thing” being tracked rather than treating the target as a set of independent pixels, i. [sent-16, score-0.558]

12 The incremental method continually updates the eigenspace to reflect appearance changes caused by intrinsic and extrinsic factors, thereby facilitating the tracking process. [sent-19, score-1.295]

13 To estimate the locations of the target objects in consecutive frames, we used a sampling algorithm with likelihood estimates, which is in direct contrast to other tracking methods that usually solve complex optimization problems using a gradient-descent approach. [sent-20, score-0.608]

14 First, the proposed algorithm does not require any training images of the target object before the tracking task starts. [sent-22, score-0.715]

15 That is, our tracker learns a low dimensional eigenspace representation on-line and incrementally updates it as time progresses (we assume, as in most tracking algorithms, that the target region has been initialized in the first frame). [sent-23, score-1.248]

16 Based on the eigenspace model with updates, an effective likelihood estimation function is developed. [sent-25, score-0.243]

17 Third, we extend the R-SVD algorithm [6] so that both the sample mean and eigenbasis are correctly updated as new data arrive. [sent-26, score-0.351]

18 Though there are numerous subspace update algorithms in the literature, only the method by Hall et al. [sent-27, score-0.237]

19 Finally, the proposed tracker is extended to use a robust error norm for likelihood estimation in the presence of noisy data or partial occlusions, thereby rendering more accurate and robust tracking results. [sent-31, score-0.701]

20 [4] proposed a tracking algorithm using a pre-trained view-based eigenbasis representation and a robust error norm. [sent-33, score-0.714]

21 Instead of relying on the popular brightness constancy working principle, they advocated the use of a subspace constancy assumption for visual tracking. [sent-34, score-0.283]

22 Although their algorithm demonstrated excellent empirical results, it requires building a set of view-based eigenbases before the tracking task starts. [sent-35, score-0.451]

23 Furthermore, their method assumes that certain factors, such as illumination conditions, do not change significantly as the eigenbasis, once constructed, is not updated. [sent-36, score-0.26]

24 Hager and Belhumeur [7] presented a tracking algorithm to handle the geometry and illumination variations of target objects. [sent-37, score-0.725]

25 Their method extends a gradient-based optical flow algorithm to incorporate research findings in [2] for object tracking under varying illumination conditions. [sent-38, score-0.674]

26 Before the tracking task starts, a set of illumination bases needs to be constructed at a fixed pose to account for appearance variation of the target due to lighting changes. [sent-39, score-1.433]

27 Consequently, it is not clear whether this method is effective if a target object undergoes changes in illumination with arbitrary pose. [sent-40, score-0.608]

28 In [9] Isard and Blake developed the Condensation algorithm for contour tracking in which multiple plausible interpretations are propagated over time. [sent-41, score-0.379]

29 Though their probabilistic approach has demonstrated success in tracking contours in clutter, the representation scheme is rather primitive, i. [sent-42, score-0.405]

30 , curves or splines, and is not updated as the appearance of a target varies due to pose or illumination change. [sent-44, score-0.853]

31 Mixture models have been used to describe appearance change for motion estimation [3] [10]. [sent-45, score-0.436]

32 In [3], four possible causes are identified in a mixture model for estimating appearance change in consecutive frames, and thereby more reliable image motion can be obtained. [sent-47, score-0.586]

33 [10] in which they use three components and wavelet filters to account for appearance changes during tracking. [sent-49, score-0.349]

34 Their method is able to handle variations in pose, illumination and expression. [sent-50, score-0.195]

35 However, their WSL appearance model treats pixels within the target region independently, and therefore does not have a notion of the “thing” being tracked. [sent-51, score-0.591]

36 In contrast to the eigentracking algorithm [4], our algorithm does not require a training phase but learns the eigenbases on-line during the object tracking process, and constantly updates this representation as the appearance changes due to pose, view angle, and illumination variation. [sent-53, score-1.376]

37 In addition, the learned representation can be utilized for other tasks such as object recognition. [sent-56, score-0.198]

38 In this work, an eigenspace representation is learned directly from pixel values within a target object in the image space. [sent-57, score-0.682]

39 Experiments show that good tracking results can be obtained with this representation without resorting to wavelets as used in [10], and better performance can potentially be achieved using wavelet filters. [sent-58, score-0.405]

40 Note also that the view-based eigenspace representation has demonstrated its ability to model the appearance of objects at different poses [13], and under different lighting conditions [2]. [sent-59, score-1.004]

41 3 Incremental Learning for Tracking We present the details of the proposed incremental learning algorithm for object tracking in this section. [sent-60, score-0.585]

42 1 Incremental Update of Eigenbasis and Mean The appearance of a target object may change drastically due to intrinsic and extrinsic factors as discussed earlier. [sent-62, score-0.749]

43 Therefore it is important to develop an efficient algorithm to update the eigenspace as the tracking task progresses. [sent-63, score-0.631]

44 Numerous algorithms have been developed to update the eigenbasis from a time-varying covariance matrix as more data arrive [6] [8] [11] [5]. [sent-64, score-0.307]

45 However, most methods assume a zero mean when updating the eigenbasis, except for the method by Hall et al. [sent-65, score-0.301]

46 [8] in which they consider the change of the mean when updating the eigenbasis as each new datum arrives. [sent-66, score-0.403]

47 We extend the classic R-SVD method [6] to update the eigenbasis while taking the shift of the sample mean into account. [sent-68, score-0.396]

48 Exploiting the properties of orthonormal bases and block structures, the R-SVD algorithm computes the new eigenbasis efficiently. [sent-83, score-0.298]

49 One problem with the R-SVD algorithm is that the eigenbasis U is computed from AAᵀ under the zero-mean assumption. [sent-85, score-0.301]

50 We modify the R-SVD algorithm and compute the eigenbasis with mean update. [sent-86, score-0.301]

51 In numerous vision problems, we can further exploit the low dimensional approximation of image data and put larger weights on the recent observations, or equivalently downweight the contributions of previous observations. [sent-117, score-0.181]

52 For example, as the appearance of a target object gradually changes, we may want to put more weight on recent observations when updating the eigenbasis, since they are more likely to be similar to the current appearance of the target. [sent-118, score-1.237]
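
To make this update concrete, here is a minimal Python sketch (using only NumPy) of folding a new batch of observations into an existing eigenbasis while correcting the sample mean and applying a forgetting factor. It illustrates the general idea rather than the paper's exact R-SVD derivation; the function name, the batch interface, and the use of a full SVD on the small augmented matrix are choices made here for clarity.

import numpy as np

def update_eigenbasis(U, S, mean, n_seen, B, forget=1.0, k=16):
    """Fold a new batch of data columns B (d x m) into an existing basis.

    U (d x k), S (k,) and mean (d,) summarize the n_seen previous
    observations; forget in (0, 1] downweights the old data.
    Returns the updated basis, singular values, mean, and effective count.
    """
    m = B.shape[1]
    mean_B = B.mean(axis=1)
    total = forget * n_seen + m
    new_mean = (forget * n_seen * mean + m * mean_B) / total

    # Augment the (downweighted) old basis with the centered new data and a
    # correction column that accounts for the shift of the sample mean.
    correction = np.sqrt(forget * n_seen * m / total) * (mean_B - mean)
    B_centered = B - mean_B[:, None]
    augmented = np.hstack([forget * U * S, B_centered, correction[:, None]])

    # Recompute the basis of the augmented matrix and keep the top k vectors.
    U_new, S_new, _ = np.linalg.svd(augmented, full_matrices=False)
    return U_new[:, :k], S_new[:k], new_mean, total

Starting from a basis fit to the first few tracked patches, each later batch of observations can then be folded in with a single call, so the representation adapts as the target's appearance changes.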

53 2 Sequential Inference Model The visual tracking problem is cast as an inference problem with a Markov model and hidden state variable, where a state variable Xt describes the affine motion parameters (and thereby the location) of the target at time t. [sent-123, score-0.755]

54 Using Bayes’ theorem, we have p(Xt | It) ∝ p(It | Xt) ∫ p(Xt | Xt−1) p(Xt−1 | It−1) dXt−1. The tracking process is governed by the observation model p(It | Xt), where we estimate the likelihood of observing It given Xt, and by the dynamical model between two states, p(Xt | Xt−1). [sent-129, score-0.382]
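
As a rough illustration of how this recursion is realized with a particle filter, the sketch below propagates a set of pose samples through the dynamical model, reweights them by the observation likelihood, and resamples; the variable names and the multinomial resampling scheme are illustrative choices rather than details taken from the paper.

import numpy as np

def particle_filter_step(particles, weights, observation, dynamics, likelihood):
    """One step of approximating p(Xt | I1:t): propagate, reweight, resample.

    particles: (N, 6) affine-parameter samples approximating p(Xt-1 | I1:t-1)
    dynamics:  callable drawing Xt ~ p(Xt | Xt-1) for each particle
    likelihood: callable evaluating p(It | Xt) for each propagated particle
    """
    # Propagate each particle through the dynamical model p(Xt | Xt-1).
    proposed = dynamics(particles)

    # Reweight by the observation likelihood p(It | Xt) and normalize.
    w = weights * likelihood(observation, proposed)
    w = w / w.sum()

    # Resample to concentrate particles in high-probability regions.
    idx = np.random.choice(len(proposed), size=len(proposed), p=w)
    return proposed[idx], np.full(len(proposed), 1.0 / len(proposed))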

55 3 Dynamical and Observation Models The motion of a target object between two consecutive frames can be approximated by an affine image warping. [sent-133, score-0.53]

56 In this work, we use the six parameters of an affine transform to model the state transition from Xt−1 to Xt of a target object being tracked. [sent-134, score-0.323]

57 Let Xt = (xt, yt, θt, st, αt, φt), where xt, yt, θt, st, αt, φt denote the x, y translation, rotation angle, scale, aspect ratio, and skew direction at time t. [sent-135, score-0.222]
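
The state transition p(Xt | Xt−1) is commonly modeled as an independent Gaussian perturbation of each affine parameter. A small sketch under that assumption is given below; the standard deviations are placeholders, not the values used in the experiments.

import numpy as np

# Per-parameter standard deviations for (x, y, theta, scale, aspect, skew);
# these values are illustrative placeholders.
AFFINE_STD = np.array([4.0, 4.0, 0.02, 0.02, 0.005, 0.001])

def affine_dynamics(particles, rng=None):
    """Draw Xt ~ N(Xt-1, diag(AFFINE_STD**2)) for every particle row."""
    rng = np.random.default_rng() if rng is None else rng
    return particles + rng.normal(scale=AFFINE_STD, size=particles.shape)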

58 Given an image patch predicted by Xt, we assume the observed image It was generated from a subspace spanned by U and centered at µ. [sent-141, score-0.261]

59 The probability that a sample is generated from the subspace is inversely proportional to the distance d from the sample to the reference point (i. [sent-142, score-0.253]

60 , center) of the subspace, which can be decomposed into the distance-to-subspace, dt , and the distance-within-subspace from the projected sample to the subspace center, dw . [sent-144, score-0.203]

61 This distance formulation, based on an orthonormal subspace and its complement space, is similar to [12] in spirit. [sent-145, score-0.189]

62 The probability of a sample being generated from a subspace, pdt (It | Xt), is governed by a Gaussian distribution: pdt (It | Xt) = N (It ; µ, U Uᵀ + εI), where I is an identity matrix, µ is the mean, and the εI term corresponds to the additive Gaussian noise in the observation process. [sent-146, score-0.201]

63 Within a subspace, the likelihood of the projected sample can be modeled by the Mahalanobis distance from the mean as follows: pdw (It | Xt) = N (It ; µ, U Σ−2 Uᵀ), where µ is the center of the subspace and Σ is the matrix of singular values corresponding to the columns of U. [sent-150, score-0.285]

64 Put together, the likelihood of a sample being generated from the subspace is governed by p(It | Xt) = pdt (It | Xt) pdw (It | Xt) = N (It ; µ, U Uᵀ + εI) N (It ; µ, U Σ−2 Uᵀ) (1). Given a drawn sample Xt and the corresponding image region It, we aim to compute p(It | Xt) using (1). [sent-151, score-0.483]
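
In code, the two distances can be computed from the reconstruction residual and the subspace coefficients. The sketch below evaluates the log of (1) up to additive constants, with eps standing in for the additive noise variance; it is an approximation written for illustration, not the authors' implementation.

import numpy as np

def log_likelihood(patch, U, S, mean, eps=0.01):
    """Approximate log p(It | Xt) for a candidate patch, dropping constants.

    patch: flattened candidate image region, same length as mean.
    U: (d, k) eigenbasis; S: (k,) singular values; mean: (d,) sample mean.
    """
    diff = patch - mean
    coeffs = U.T @ diff                    # projection onto the subspace
    residual = diff - U @ coeffs           # component outside the subspace

    dist_to_subspace = (residual ** 2).sum() / eps
    dist_within_subspace = ((coeffs / S) ** 2).sum()   # Mahalanobis term
    return -0.5 * (dist_to_subspace + dist_within_subspace)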

65 , the pixels that are not likely to appear inside the target region given the current eigenspace). [sent-154, score-0.281]

66 This robust error norm is especially helpful when we use a rectangular region to enclose the target (which inevitably contains some noisy background pixels). [sent-156, score-0.393]
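
One simple way to realize such a robust norm is to clip the per-pixel squared residuals at a threshold before summing, so that pixels that are poorly explained by the eigenspace (likely background or occlusion) contribute only a bounded amount; the threshold value below is an illustrative choice, not taken from the paper.

import numpy as np

def robust_reconstruction_error(patch, U, mean, threshold=0.1):
    """Sum of per-pixel reconstruction errors with large residuals clipped."""
    diff = patch - mean
    residual = diff - U @ (U.T @ diff)     # reconstruction error per pixel
    # Cap the contribution of outlier pixels at threshold**2.
    return np.minimum(residual ** 2, threshold ** 2).sum()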

67 4 Experiments To test the performance of our proposed tracker, we collected a number of videos recorded in indoor and outdoor environments where the targets change pose in different lighting conditions. [sent-157, score-0.621]

68 For the eigenspace representation, each target image region is resized to a 32 × 32 patch, and the number of eigenvectors used in all experiments is set to 16, though fewer eigenvectors may also work well. [sent-159, score-0.618]

69 We present some tracking results in this section and more tracking results as well as videos can be found at http://vision. [sent-161, score-0.686]

70 1 Experimental Results Figure 1 shows the tracking results using a challenging sequence recorded with a moving digital camera in which a person moves from a dark room toward a bright area while changing his pose, moving underneath spot lights, changing facial expressions and taking off glasses. [sent-165, score-0.617]

71 All the eigenbases are constructed automatically from scratch and constantly updated to model the appearance of the target object while undergoing appearance changes. [sent-166, score-1.092]

72 Even with the significant camera motion and low frame rate (which makes the motion between frames more significant, equivalent to tracking fast-moving objects), our tracker stays stably on the target throughout the sequence. [sent-167, score-0.953]

73 The second sequence contains an animal doll moving in different pose, scale, and lighting conditions as shown in Figure 2. [sent-168, score-0.269]

74 Experimental results demonstrate that our tracker is able to follow the target as it undergoes large pose and lighting changes against a cluttered background. [sent-169, score-0.746]

75 Notice that the non-convex target object is localized with an enclosing rectangular window, and thus it inevitably contains some background pixels in its appearance representation. [sent-170, score-0.773]

76 The robust error norm enables the tracker to ignore background pixels and estimate the target location correctly. [sent-171, score-0.524]

77 The results also show that our algorithm faithfully Figure 1: A person moves from a dark room toward a bright area with large lighting and pose changes. [sent-172, score-0.434]

78 The images in the second row show the current sample mean, tracked region, reconstructed image, and the reconstruction error, respectively. [sent-173, score-0.228]

79 Figure 2: An animal doll moving with large pose and lighting variation in a cluttered background. [sent-175, score-0.323]

80 models the appearance of the target, as shown in the eigenbases and reconstructed images, in the presence of noisy background pixels. [sent-176, score-0.514]

81 We recorded a sequence to demonstrate that our tracker performs well in an outdoor environment where lighting conditions change drastically. [sent-177, score-0.521]

82 As shown in Figure 3, the cast shadow changes the appearance of the target face drastically. [sent-179, score-0.581]

83 Furthermore, the combined pose and lighting variation with low frame rate makes the tracking task extremely difficult. [sent-180, score-0.78]

84 Nevertheless, the results show that our tracker follows the target accurately and robustly. [sent-181, score-0.351]

85 Due to heavy shadows and drastic lighting change, other tracking methods based on gradient, contour, or color information are unlikely to perform well in this case. [sent-182, score-0.614]

86 It is well known that the appearance of an object undergoing pose change can be modeled well by view-based Figure 3: A person moves underneath a trellis with large illumination change and cast shadows while changing his pose. [sent-185, score-1.166]

87 Meanwhile, at a fixed pose, the appearance of an object under different illumination conditions can be approximated well by a low dimensional subspace [2]. [sent-188, score-0.826]

88 Our empirical results show that these variations can be learned on-line without any prior training phase, and also the changes caused by cast and attached shadows can still be approximated by a linear subspace to some extent. [sent-189, score-0.293]

89 Typically, the failure happens when there is a combination of fast pose change and drastic illumination change. [sent-191, score-0.453]

90 To demonstrate the potency of our modified R-SVD algorithm in faithfully modeling the object appearance, we compare the reconstructed images using our method and a conventional SVD algorithm. [sent-195, score-0.334]

91 The figure and average reconstruction error show that our modified R-SVD method is able to effectively model the object appearance without losing detailed information. [sent-201, score-0.446]

92 5 Conclusions and Future Work We have presented an appearance-based tracker that incrementally learns a low dimensional eigenspace representation for object tracking while the target undergoes pose, illumination and appearance changes. [sent-202, score-1.799]

93 Whereas most tracking algorithms operate on the premise that the object appearance or the ambient lighting condition does not change as time progresses, our method adapts the model representation to reflect appearance variation of the target, thereby facilitating the tracking task. [sent-203, score-1.917]

94 In contrast to the existing incremental subspace methods, our R-SVD method updates the mean and eigenbasis accurately and efficiently, and thereby learns a good eigenspace representation to faithfully model the appearance of the target being tracked. [sent-204, score-1.581]

95 Our experiments demonstrate the effectiveness of the proposed tracker in indoor and outdoor environments where the target objects undergo large pose and lighting changes. [sent-205, score-0.945]

96 Our algorithm can be extended to construct a set of eigenbases for modeling nonlinear aspects of appearance variation more precisely and automatically. [sent-207, score-0.472]

97 What is the set of images of an object under all possible lighting conditions? [sent-225, score-0.368]

98 A framework for modeling appearance change in image sequences. [sent-234, score-0.429]

99 Eigentracking: Robust matching and tracking of articulated objects using view-based representation. [sent-242, score-0.388]

100 Real-time tracking of image regions with changes in geometry and illumination. [sent-260, score-0.436]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tracking', 0.343), ('appearance', 0.31), ('eigenbasis', 0.262), ('eigenspace', 0.243), ('xt', 0.222), ('illumination', 0.195), ('ip', 0.191), ('target', 0.187), ('lighting', 0.183), ('tracker', 0.164), ('pose', 0.161), ('subspace', 0.153), ('svd', 0.149), ('ir', 0.139), ('iq', 0.137), ('object', 0.136), ('eigenbases', 0.108), ('incremental', 0.106), ('sq', 0.094), ('outdoor', 0.068), ('ur', 0.068), ('tracked', 0.066), ('sp', 0.066), ('updates', 0.066), ('change', 0.065), ('condensation', 0.065), ('indoor', 0.065), ('sr', 0.064), ('thing', 0.064), ('thereby', 0.063), ('reconstructed', 0.063), ('representation', 0.062), ('motion', 0.061), ('frames', 0.059), ('camera', 0.057), ('visual', 0.056), ('pdt', 0.056), ('shadows', 0.056), ('vision', 0.056), ('pixels', 0.056), ('image', 0.054), ('ii', 0.054), ('variation', 0.054), ('undergoes', 0.051), ('underneath', 0.051), ('extrinsic', 0.051), ('faithfully', 0.051), ('inevitably', 0.051), ('sample', 0.05), ('images', 0.049), ('eigenvectors', 0.048), ('facilitating', 0.048), ('occlusions', 0.048), ('robust', 0.047), ('objects', 0.045), ('update', 0.045), ('cast', 0.045), ('nm', 0.043), ('moving', 0.043), ('doll', 0.043), ('pdw', 0.043), ('trellis', 0.043), ('av', 0.043), ('recorded', 0.041), ('constantly', 0.041), ('changes', 0.039), ('frame', 0.039), ('person', 0.039), ('numerous', 0.039), ('governed', 0.039), ('mean', 0.039), ('learns', 0.039), ('particle', 0.038), ('environments', 0.038), ('region', 0.038), ('hager', 0.037), ('eigentracking', 0.037), ('progresses', 0.037), ('constancy', 0.037), ('vp', 0.037), ('belhumeur', 0.037), ('datum', 0.037), ('norm', 0.037), ('incrementally', 0.037), ('contour', 0.036), ('orthonormal', 0.036), ('conventional', 0.035), ('im', 0.034), ('fleet', 0.034), ('undergo', 0.034), ('consecutive', 0.033), ('background', 0.033), ('af', 0.033), ('observations', 0.032), ('dimensional', 0.032), ('isard', 0.032), ('drastic', 0.032), ('jepson', 0.032), ('splines', 0.032), ('proceedings', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 83 nips-2004-Incremental Learning for Visual Tracking

Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang

Abstract: Most existing tracking algorithms construct a representation of a target object before the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact that the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1

2 0.39471635 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang

Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.

3 0.20044436 73 nips-2004-Generative Affine Localisation and Tracking

Author: John Winn, Andrew Blake

Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1

4 0.15232739 40 nips-2004-Common-Frame Model for Object Recognition

Author: Pierre Moreels, Pietro Perona

Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.

5 0.1512101 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition

Author: Le Lu, Gregory D. Hager, Laurent Younes

Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.

6 0.13311227 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

7 0.12326261 99 nips-2004-Learning Hyper-Features for Visual Identification

8 0.10505424 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms

9 0.10298184 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters

10 0.10068383 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

11 0.099237919 44 nips-2004-Conditional Random Fields for Object Recognition

12 0.090357378 175 nips-2004-Stable adaptive control with online learning

13 0.086173624 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

14 0.08594881 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

15 0.079326376 48 nips-2004-Convergence and No-Regret in Multiagent Learning

16 0.078629792 160 nips-2004-Seeing through water

17 0.077211499 28 nips-2004-Bayesian inference in spiking neurons

18 0.071333855 179 nips-2004-Surface Reconstruction using Learned Shape Models

19 0.071201406 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data

20 0.067148313 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.232), (1, -0.011), (2, 0.088), (3, -0.261), (4, 0.299), (5, -0.157), (6, 0.008), (7, -0.203), (8, -0.037), (9, 0.066), (10, 0.007), (11, 0.074), (12, 0.153), (13, 0.133), (14, 0.011), (15, -0.004), (16, 0.0), (17, -0.104), (18, -0.093), (19, 0.025), (20, -0.131), (21, -0.047), (22, 0.154), (23, -0.02), (24, -0.067), (25, 0.055), (26, 0.021), (27, 0.051), (28, 0.131), (29, 0.048), (30, -0.039), (31, -0.016), (32, 0.056), (33, 0.153), (34, -0.012), (35, 0.121), (36, -0.044), (37, 0.176), (38, -0.028), (39, -0.051), (40, 0.072), (41, -0.021), (42, -0.024), (43, 0.043), (44, -0.061), (45, -0.004), (46, -0.008), (47, 0.062), (48, 0.01), (49, -0.064)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97757447 83 nips-2004-Incremental Learning for Visual Tracking

Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang

Abstract: Most existing tracking algorithms construct a representation of a target object before the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact that the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1

2 0.86593378 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang

Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.

3 0.80816007 73 nips-2004-Generative Affine Localisation and Tracking

Author: John Winn, Andrew Blake

Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1

4 0.53019553 40 nips-2004-Common-Frame Model for Object Recognition

Author: Pierre Moreels, Pietro Perona

Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.

5 0.47113389 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition

Author: Le Lu, Gregory D. Hager, Laurent Younes

Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.

6 0.41194806 99 nips-2004-Learning Hyper-Features for Visual Identification

7 0.40918288 29 nips-2004-Beat Tracking the Graphical Model Way

8 0.38664278 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

9 0.38476095 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters

10 0.33351704 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces

11 0.32804793 44 nips-2004-Conditional Random Fields for Object Recognition

12 0.29707986 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities

13 0.2902987 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

14 0.28220326 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data

15 0.28134584 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

16 0.27116698 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations

17 0.26812065 206 nips-2004-Worst-Case Analysis of Selective Sampling for Linear-Threshold Algorithms

18 0.26784083 193 nips-2004-Theories of Access Consciousness

19 0.26177549 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

20 0.25590119 183 nips-2004-Temporal-Difference Networks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.063), (15, 0.169), (26, 0.045), (31, 0.017), (33, 0.121), (35, 0.033), (39, 0.023), (50, 0.044), (71, 0.399), (76, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83061302 83 nips-2004-Incremental Learning for Visual Tracking

Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang

Abstract: Most existing tracking algorithms construct a representation of a target object before the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact that the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1

2 0.76968992 24 nips-2004-Approximately Efficient Online Mechanism Design

Author: David C. Parkes, Dimah Yanovsky, Satinder P. Singh

Abstract: Online mechanism design (OMD) addresses the problem of sequential decision making in a stochastic environment with multiple self-interested agents. The goal in OMD is to make value-maximizing decisions despite this self-interest. In previous work we presented a Markov decision process (MDP)-based approach to OMD in large-scale problem domains. In practice the underlying MDP needed to solve OMD is too large and hence the mechanism must consider approximations. This raises the possibility that agents may be able to exploit the approximation for selfish gain. We adopt sparse-sampling-based MDP algorithms to implement efficient policies, and retain truth-revelation as an approximate BayesianNash equilibrium. Our approach is empirically illustrated in the context of the dynamic allocation of WiFi connectivity to users in a coffeehouse. 1

3 0.63309097 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

Author: Laurent Zwald, Gilles Blanchard, Pascal Massart, Régis Vert

Abstract: This paper investigates the effect of Kernel Principal Component Analysis (KPCA) within the classification framework, essentially the regularization properties of this dimensionality reduction method. KPCA has been previously used as a pre-processing step before applying an SVM but we point out that this method is somewhat redundant from a regularization point of view and we propose a new algorithm called Kernel Projection Machine to avoid this redundancy, based on an analogy with the statistical framework of regression for a Gaussian white noise model. Preliminary experimental results show that this algorithm reaches the same performances as an SVM. 1

4 0.6093418 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang

Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.

5 0.53196812 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters

Author: Tim K. Marks, J. C. Roddey, Javier R. Movellan, John R. Hershey

Abstract: We present a generative model and stochastic filtering algorithm for simultaneous tracking of 3D position and orientation, non-rigid motion, object texture, and background texture using a single camera. We show that the solution to this problem is formally equivalent to stochastic filtering of conditionally Gaussian processes, a problem for which well known approaches exist [3, 8]. We propose an approach based on Monte Carlo sampling of the nonlinear component of the process (object motion) and exact filtering of the object and background textures given the sampled motion. The smoothness of image sequences in time and space is exploited by using Laplace’s method to generate proposal distributions for importance sampling [7]. The resulting inference algorithm encompasses both optic flow and template-based tracking as special cases, and elucidates the conditions under which these methods are optimal. We demonstrate an application of the system to 3D non-rigid face tracking. 1 Background Recent algorithms track morphable objects by solving optic flow equations, subject to the constraint that the tracked points belong to an object whose non-rigid deformations are linear combinations of a set of basic shapes [10, 2, 11]. These algorithms require precise initialization of the object pose and tend to drift out of alignment on long video sequences. We present G-flow, a generative model and stochastic filtering formulation of tracking that address the problems of initialization and error recovery in a principled manner. We define a non-rigid object by the 3D locations of n vertices. The object is a linear combination of k fixed morph bases, with coefficients c = [c1 , c2 , · · · , ck ]T . The fixed 3 × k matrix hi contains the position of the ith vertex in all k morph bases. The transformation from object-centered to image coordinates consists of a rotation, weak perspective projection, and translation. Thus xi , the 2D location of the ith vertex on the image plane, is xi = grhi c + l, (1) where r is the 3 × 3 rotation matrix, l is the 2 × 1 translation vector, and g = 1 0 0 is the 010 projection matrix. The object pose, ut , comprises both the rigid motion parameters and the morph parameters at time t: ut = {r(t), l(t), c(t)}. (2) 1.1 Optic flow Let yt represent the current image, and let xi (ut ) index the image pixel that is rendered by the ith object vertex when the object assumes pose ut . Suppose that we know ut−1 , the pose at time t − 1, and we want to find ut , the pose at time t. This problem can be solved by minimizing the following form with respect to ut : ut = argmin ˆ ut 1 2 n 2 [yt (xi (ut )) − yt−1 (xi (ut−1 ))] . (3) i=1 In the special case in which the xi (ut ) are neighboring points that move with the same 2D displacement, this reduces to the standard Lucas-Kanade optic flow algorithm [9, 1]. Recent work [10, 2, 11] has shown that in the general case, this optimization problem can be solved efficiently using the Gauss-Newton method. We will take advantage of this fact to develop an efficient stochastic inference algorithm within the framework of G-flow. Notational conventions Unless otherwise stated, capital letters are used for random variables, small letters for specific values taken by random variables, and Greek letters for fixed model parameters. Subscripted colons indicate sequences: e.g., X1:t = X1 · · · Xt . 
The term In stands for the n × n identity matrix, E for expected value, V ar for the covariance matrix, and V ar−1 for the inverse of the covariance matrix (precision matrix). 2 The Generative Model for G-Flow Figure 1: Left: a(Ut ) determines which texel (color at a vertex of the object model or a pixel of the background model) is responsible for rendering each image pixel. Right: G-flow video generation model: At time t, the object’s 3D pose, Ut , is used to project the object texture, Vt , into 2D. This projection is combined with the background texture, Bt , to generate the observed image, Yt . We model the image sequence Y as a stochastic process generated by three hidden causes, U , V , and B, as shown in the graphical model (Figure 1, right). The m × 1 random vector Yt represents the m-pixel image at time t. The n × 1 random vector Vt and the m × 1 random vector Bt represent the n-texel object texture and the m-texel background texture, respectively. As illustrated in Figure 1, left, the object pose, Ut , determines onto which image pixels the object and background texels project at time t. This is formulated using the projection function a(Ut ). For a given pose, ut , the projection a(ut ) is a block matrix, def a(ut ) = av (ut ) ab (ut ) . Here av (ut ), the object projection function, is an m × n matrix of 0s and 1s that tells onto which image pixel each object vertex projects; e.g., a 1 at row j, column i it means that the ith object point projects onto image pixel j. Matrix ab plays the same role for background pixels. Assuming the foreground mapping is one-toone, we let ab = Im −av (ut )av (ut )T , expressing the simple occlusion constraint that every image pixel is rendered by object or background, but not both. In the G-flow generative model: Vt Yt = a(Ut ) + Wt Wt ∼ N (0, σw Im ), σw > 0 Bt (4) Ut ∼ p(ut | ut−1 ) v v Vt = Vt−1 + Zt−1 Zt−1 ∼ N (0, Ψv ), Ψv is diagonal b b Bt = Bt−1 + Zt−1 Zt−1 ∼ N (0, Ψb ), Ψb is diagonal where p(ut | ut−1 ) is the pose transition distribution, and Z v , Z b , W are independent of each other, of the initial conditions, and over time. The form of the pose distribution is left unspecified since the algorithm proposed here does not require the pose distribution or the pose dynamics to be Gaussian. For the initial conditions, we require that the variance of V1 and the variance of B1 are both diagonal. Non-rigid 3D tracking is a difficult nonlinear filtering problem because changing the pose has a nonlinear effect on the image pixels. Fortunately, the problem has a rich structure that we can exploit: under the G-flow model, video generation is a conditionally Gaussian process [3, 6, 4, 5]. If the specific values taken by the pose sequence, u1:t , were known, then the texture processes, V and B, and the image process, Y , would be jointly Gaussian. This suggests the following scheme: we could use particle filtering to obtain a distribution of pose experts (each expert corresponds to a highly probable sample of pose, u1:t ). For each expert we could then use Kalman filtering equations to infer the posterior distribution of texture given the observed images. This method is known in the statistics community as a Monte Carlo filtering solution for conditionally Gaussian processes [3, 4], and in the machine learning community as Rao-Blackwellized particle filtering [6, 5]. We found that in addition to Rao-Blackwellization, it was also critical to use Laplace’s method to generate the proposal distributions for importance sampling [7]. 
In the context of G-flow, we accomplished this by performing an optic flow-like optimization, using an efficient algorithm similar to those in [10, 2]. 3 Inference Our goal is to find an expression for the filtering distribution, p(ut , vt , bt | y1:t ). Using the law of total probability, we have the following equation for the filtering distribution: p(ut , vt , bt | y1:t ) = p(ut , vt , bt | u1:t−1 , y1:t ) p(u1:t−1 | y1:t ) du1:t−1 Opinion of expert (5) Credibility of expert We can think of the integral in (5) as a sum over a distribution of experts, where each expert corresponds to a single pose history, u1:t−1 . Based on its hypothesis about pose history, each expert has an opinion about the current pose of the object, Ut , and the texture maps of the object and background, Vt and Bt . Each expert also has a credibility, a scalar that measures how well the expert’s opinion matches the observed image yt . Thus, (5) can be interpreted as follows: The filtering distribution at time t is obtained by integrating over the entire ensemble of experts the opinion of each expert weighted by that expert’s credibility. The opinion distribution of expert u1:t−1 can be factorized into the expert’s opinion about the pose Ut times the conditional distribution of texture Vt , Bt given pose: p(ut , vt , bt | u1:t−1 , y1:t ) = p(ut | u1:t−1 , y1:t ) p(vt , bt | u1:t , y1:t ) (6) Opinion of expert Pose Opinion Texture Opinion given pose The rest of this section explains how we evaluate each term in (5) and (6). We cover the distribution of texture given pose in 3.1, pose opinion in 3.2, and credibility in 3.3. 3.1 Texture opinion given pose The distribution of Vt and Bt given the pose history u1:t is Gaussian with mean and covariance that can be obtained using the Kalman filter estimation equations: −1 V ar−1 (Vt , Bt | u1:t , y1:t ) = V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw a(ut ) E(Vt , Bt | u1:t , y1:t ) = V ar(Vt , Bt | u1:t , y1:t ) −1 × V ar−1 (Vt , Bt | u1:t−1 , y1:t−1 )E(Vt , Bt | u1:t−1 , y1:t−1 ) + a(ut )T σw yt (7) (8) This requires p(Vt , Bt |u1:t−1 , y1:t−1 ), which we get from the Kalman prediction equations: E(Vt , Bt | u1:t−1 , y1:t−1 ) = E(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) V ar(Vt , Bt | u1:t−1 , y1:t−1 ) = V ar(Vt−1 , Bt−1 | u1:t−1 , y1:t−1 ) + (9) Ψv 0 0 Ψb (10) In (9), the expected value E(Vt , Bt | u1:t−1 , y1:t−1 ) consists of texture maps (templates) for the object and background. In (10), V ar(Vt , Bt | u1:t−1 , y1:t−1 ) represents the degree of uncertainty about each texel in these texture maps. Since this is a diagonal matrix, we can refer to the mean and variance of each texel individually. For the ith texel in the object texture map, we use the following notation: µv (i) t v σt (i) def = ith element of E(Vt | u1:t−1 , y1:t−1 ) def = (i, i)th element of V ar(Vt | u1:t−1 , y1:t−1 ) b Similarly, define µb (j) and σt (j) as the mean and variance of the jth texel in the backt ground texture map. (This notation leaves the dependency on u1:t−1 and y1:t−1 implicit.) 3.2 Pose opinion Based on its current texture template (derived from the history of poses and images up to time t−1) and the new image yt , each expert u1:t−1 has a pose opinion, p(ut |u1:t−1 , y1:t ), a probability distribution representing that expert’s beliefs about the pose at time t. Since the effect of ut on the likelihood function is nonlinear, we will not attempt to find an analytical solution for the pose opinion distribution. 
However, due to the spatio-temporal smoothness of video signals, it is possible to estimate the peak and variance of an expert’s pose opinion. 3.2.1 Estimating the peak of an expert’s pose opinion We want to estimate ut (u1:t−1 ), the value of ut that maximizes the pose opinion. Since ˆ p(ut | u1:t−1 , y1:t ) = p(y1:t−1 | u1:t−1 ) p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ), p(y1:t | u1:t−1 ) (11) def ut (u1:t−1 ) = argmax p(ut | u1:t−1 , y1:t ) = argmax p(ut | ut−1 ) p(yt | u1:t , y1:t−1 ). ˆ ut ut (12) We now need an expression for the final term in (12), the predictive distribution p(yt | u1:t , y1:t−1 ). By integrating out the hidden texture variables from p(yt , vt , bt | u1:t , y1:t−1 ), and using the conditional independence relationships defined by the graphical model (Figure 1, right), we can derive: 1 m log p(yt | u1:t , y1:t−1 ) = − log 2π − log |V ar(Yt | u1:t , y1:t−1 )| 2 2 n v 2 1 (yt (xi (ut )) − µt (i)) 1 (yt (j) − µb (j))2 t − − , (13) v (i) + σ b 2 i=1 σt 2 σt (j) + σw w j∈X (ut ) where xi (ut ) is the image pixel rendered by the ith object vertex when the object assumes pose ut , and X (ut ) is the set of all image pixels rendered by the object under pose ut . Combining (12) and (13), we can derive ut (u1:t−1 ) = argmin − log p(ut | ut−1 ) ˆ (14) ut + 1 2 n i=1 [yt (xi (ut )) − µv (i)]2 [yt (xi (ut )) − µb (xi (ut ))]2 t t b − − log[σt (xi (ut )) + σw ] v b σt (i) + σw σt (xi (ut )) + σw Foreground term Background terms Note the similarity between (14) and constrained optic flow (3). For example, focus on the foreground term in (14) and ignore the weights in the denominator. The previous image yt−1 from (3) has been replaced by µv (·), the estimated object texture based on the images t and poses up to time t − 1. As in optic flow, we can find the pose estimate ut (u1:t−1 ) ˆ efficiently using the Gauss-Newton method. 3.2.2 Estimating the distribution of an expert’s pose opinion We estimate the distribution of an expert’s pose opinion using a combination of Laplace’s method and importance sampling. Suppose at time t − 1 we are given a sample of experts (d) (d) indexed by d, each endowed with a pose sequence u1:t−1 , a weight wt−1 , and the means and variances of Gaussian distributions for object and background texture. For each expert (d) (d) u1:t−1 , we use (14) to compute ut , the peak of the pose distribution at time t according ˆ (d) to that expert. Define σt as the inverse Hessian matrix of (14) at this peak, the Laplace ˆ estimate of the covariance matrix of the expert’s opinion. We then generate a set of s (d,e) (d) independent samples {ut : e = 1, · · · , s} from a Gaussian distribution with mean ut ˆ (d) (d) (d) and variance proportional to σt , g(·|ˆt , αˆt ), where the parameter α > 0 determines ˆ u σ the sharpness of the sampling distribution. (Note that letting α → 0 would be equivalent to (d,e) (d) simply setting the new pose equal to the peak of the pose opinion, ut = ut .) To find ˆ the parameters of this Gaussian proposal distribution, we use the Gauss-Newton method, ignoring the second of the two background terms in (14). (This term is not ignored in the importance sampling step.) To refine our estimate of the pose opinion we use importance sampling. 
We assign each sample from the proposal distribution an importance weight wt (d, e) that is proportional to the ratio between the posterior distribution and the proposal distribution: s (d) p(ut | u1:t−1 , y1:t ) = ˆ (d,e) δ(ut − ut ) wt (d, e) s f =1 wt (d, f ) (15) e=1 (d,e) (d) (d) (d,e) p(ut | ut−1 )p(yt | u1:t−1 , ut , y1:t−1 ) wt (d, e) = (16) (d,e) (d) (d) g(ut | ut , αˆt ) ˆ σ (d,e) (d) The numerator of (16) is proportional to p(ut |u1:t−1 , y1:t ) by (12), and the denominator of (16) is the sampling distribution. 3.3 Estimating an expert’s credibility (d) The credibility of the dth expert, p(u1:t−1 | y1:t ), is proportional to the product of a prior term and a likelihood term: (d) (d) p(u1:t−1 | y1:t−1 )p(yt | u1:t−1 , y1:t−1 ) (d) p(u1:t−1 | y1:t ) = . (17) p(yt | y1:t−1 ) Regarding the likelihood, p(yt |u1:t−1 , y1:t−1 ) = p(yt , ut |u1:t−1 , y1:t−1 )dut = p(yt |u1:t , y1:t−1 )p(ut |ut−1 )dut (18) (d,e) We already generated a set of samples {ut : e = 1, · · · , s} that estimate the pose opin(d) ion of the dth expert, p(ut | u1:t−1 , y1:t ). We can now use these samples to estimate the likelihood for the dth expert: (d) p(yt | u1:t−1 , y1:t−1 ) = (d) (d) p(yt | u1:t−1 , ut , y1:t−1 )p(ut | ut−1 )dut (19) (d) (d) (d) (d) = p(yt | u1:t−1 , ut , y1:t−1 )g(ut | ut , αˆt ) ˆ σ 3.4 p(ut | ut−1 ) s e=1 dut ≈ wt (d, e) s Updating the filtering distribution g(ut | (d) (d) ut , αˆt ) ˆ σ Once we have calculated the opinion and credibility of each expert u1:t−1 , we evaluate the integral in (5) as a weighted sum over experts. The credibilities of all of the experts are normalized to sum to 1. New experts u1:t (children) are created from the old experts u1:t−1 (parents) by appending a pose ut to the parent’s history of poses u1:t−1 . Every expert in the new generation is created as follows: One parent is chosen to sire the child. The probability of being chosen is proportional to the parent’s credibility. The child’s value of ut is chosen at random from its parent’s pose opinion (the weighted samples described in Section 3.2.2). 4 Relation to Optic Flow and Template Matching In basic template-matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template-matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template-matching emerge as special cases. Optic Flow as a Special Case Suppose that the pose transition probability p(ut | ut−1 ) is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, V ar(V1 ) = κIn , and that the texture transition uncertainty is very high, Ψv → diag(∞). Using (7), (8), and (10), it follows that: µv (i) = [av (ut−1 )]T yt−1 = yt−1 (xi (ut−1 )) , t (20) i.e., the object texture map at time t is determined by the pixels from image yt−1 that according to pose ut−1 were rendered by the object. As a result, (14) reduces to: ut (u1:t−1 ) = argmin ˆ ut 1 2 n yt (xi (ut )) − yt−1 (xi (ut−1 )) 2 (21) i=1 which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter α → 0. 
4 Relation to Optic Flow and Template Matching

In basic template matching, the same time-invariant texture map is used to track every frame in the video sequence. Optic flow can be thought of as template matching with a template that is completely reset at each frame for use in the subsequent frame. In most cases, optimal inference under G-flow involves a combination of optic flow-based and template-based tracking, in which the texture template gradually evolves as new images are presented. Pure optic flow and template matching emerge as special cases.

Optic Flow as a Special Case. Suppose that the pose transition probability $p(u_t \mid u_{t-1})$ is uninformative, that the background is uninformative, that every texel in the initial object texture map has equal variance, $\mathrm{Var}(V_1) = \kappa I_n$, and that the texture transition uncertainty is very high, $\Psi_v \to \mathrm{diag}(\infty)$. Using (7), (8), and (10), it follows that

$$\mu^v_t(i) = \big[a^v(u_{t-1})\big]^{T} y_{t-1} = y_{t-1}\big(x_i(u_{t-1})\big), \qquad (20)$$

i.e., the object texture map at time $t$ is determined by the pixels from image $y_{t-1}$ that, according to pose $u_{t-1}$, were rendered by the object. As a result, (14) reduces to

$$\hat{u}_t(u_{1:t-1}) = \arg\min_{u_t} \frac{1}{2}\sum_{i=1}^{n} \big[y_t(x_i(u_t)) - y_{t-1}(x_i(u_{t-1}))\big]^2, \qquad (21)$$

which is identical to (3). Thus constrained optic flow [10, 2, 11] is simply a special case of optimal inference under G-flow, with a single expert and with sampling parameter $\alpha \to 0$. The key assumption that $\Psi_v \to \mathrm{diag}(\infty)$ means that the object's texture is very different in adjacent frames. However, optic flow is typically applied in situations in which the object's texture in adjacent frames is similar. The optimal solution in such situations calls not for optic flow, but for a texture map that integrates information across multiple frames.

Template Matching as a Special Case. Suppose the initial texture map is known precisely, $\mathrm{Var}(V_1) = 0$, and the texture transition uncertainty is very low, $\Psi_v \to 0$. By (7), (8), and (10), it follows that $\mu^v_t(i) = \mu^v_{t-1}(i) = \mu^v_1(i)$, i.e., the texture map does not change over time, but remains fixed at its initial value (it is a texture template). Then (14) becomes

$$\hat{u}_t(u_{1:t-1}) = \arg\min_{u_t} \frac{1}{2}\sum_{i=1}^{n} \big[y_t(x_i(u_t)) - \mu^v_1(i)\big]^2, \qquad (22)$$

where $\mu^v_1(i)$ is the $i$th texel of the fixed texture template. This is the error function minimized by standard template-matching algorithms. The key assumption that $\Psi_v \to 0$ means the object's texture is constant from each frame to the next, which is rarely true in real data. G-flow provides a principled way to relax this unrealistic assumption of template methods.

General Case. In general, if the background is uninformative, then minimizing (14) results in a weighted combination of optic flow and template matching, with the weight of each approach depending on the current level of certainty about the object template. In addition, when there is useful information in the background, G-flow infers a model of the background which is used to improve tracking.

Figure 2: G-flow tracking an outdoor video. Results are shown for frames 1, 81, and 620.

5 Simulations

We collected a video (30 frames/sec) of a subject in an outdoor setting who made a variety of facial expressions while moving her head. A later motion-capture session was used to create a 3D morphable model of her face, consisting of a set of 5 morph bases ($k = 5$). Twenty experts were initialized randomly near the correct pose on frame 1 of the video and propagated using G-flow inference (assuming an uninformative background). See http://mplab.ucsd.edu for video. Figure 2 shows the distribution of experts for three frames. In each frame, every expert has a hypothesis about the pose (translation, rotation, scale, and morph coefficients). The 38 points in the model are projected into the image according to each expert's pose, yielding 760 red dots in each frame. In each frame, the mean of the experts gives a single hypothesis about the 3D non-rigid deformation of the face (lower right) as well as the rigid pose of the face (rotated 3D axes, lower left). Notice G-flow's ability to recover from error: bad initial hypotheses are weeded out, leaving only good hypotheses. To compare G-flow's performance with that of deterministic constrained optic flow algorithms such as [10, 2, 11], we used both G-flow and the method from [2] to track the same video sequence. We ran each tracker several times, introducing small errors in the starting pose.

Figure 3: Average error over time for G-flow (green) and for deterministic optic flow [2] (blue). Results were averaged over 16 runs (deterministic algorithm) or 4 runs (G-flow) and smoothed. As ground truth, the 2D locations of 6 points were hand-labeled in every 20th frame. The error at every 20th frame was calculated as the distance from these labeled locations to the inferred (tracked) locations, averaged across several runs.
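To make the error metric just described concrete, the following short helper is our own illustration (the function and variable names are not from the paper), and it glosses over the smoothing applied to the curves in Figure 3.

```python
import numpy as np


def tracking_error(labeled, tracked_runs):
    """Per-frame tracking error: mean distance from hand-labeled 2D points to
    the tracker's inferred locations, averaged over points and over runs.

    labeled:      dict frame_index -> (P, 2) array of hand-labeled points
                  (e.g., 6 points in every 20th frame).
    tracked_runs: list (one entry per run) of dicts frame_index -> (P, 2)
                  arrays of the tracked locations of the same points.
    """
    errors = {}
    for f, gt in labeled.items():
        per_run = [np.linalg.norm(run[f] - gt, axis=1).mean() for run in tracked_runs]
        errors[f] = float(np.mean(per_run))
    return errors


# Tiny synthetic example: two runs, one labeled frame with three points.
labeled = {20: np.array([[10.0, 12.0], [40.0, 38.0], [70.0, 65.0]])}
runs = [{20: labeled[20] + 1.0}, {20: labeled[20] - 2.0}]
print(tracking_error(labeled, runs))  # {20: ~2.12}
```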
Figure 3 compares this tracking error as a function of time for the deterministic constrained optic flow algorithm and for a 20-expert version of the G-flow tracking algorithm. Notice that the deterministic system has a tendency to drift (increase in error) over time, whereas G-flow can recover from drift.

Acknowledgments

Tim K. Marks was supported by NSF grant IIS-0223052 and NSF grant DGE-0333451 to GWC. John Hershey was supported by the UCDIMI grant D00-10084. J. Cooper Roddey was supported by the Swartz Foundation. Javier R. Movellan was supported by NSF grants IIS-0086107, IIS-0220141, and IIS-0223052, and by the UCDIMI grant D00-10084.

References

[1] S. Baker and I. Matthews. Lucas-Kanade 20 years on: A unifying framework. International Journal of Computer Vision, 56(3):221–255, 2002.
[2] M. Brand. Flexible flow for 3D nonrigid tracking and shape recovery. In CVPR, volume 1, pages 315–322, 2001.
[3] H. Chen, P. Kumar, and J. van Schuppen. On Kalman filtering for conditionally Gaussian systems with random matrices. Systems & Control Letters, 13:397–404, 1989.
[4] R. Chen and J. Liu. Mixture Kalman filters. Journal of the Royal Statistical Society B, 62:493–508, 2000.
[5] A. Doucet and C. Andrieu. Particle filtering for partially observed Gaussian state space models. Journal of the Royal Statistical Society B, 64:827–838, 2002.
[6] A. Doucet, N. de Freitas, K. Murphy, and S. Russell. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In 16th Conference on Uncertainty in AI, pages 176–183, 2000.
[7] A. Doucet, S. J. Godsill, and C. Andrieu. On sequential Monte Carlo sampling methods for Bayesian filtering. Statistics and Computing, 10:197–208, 2000.
[8] Z. Ghahramani and G. E. Hinton. Variational learning for switching state-space models. Neural Computation, 12(4):831–864, 2000.
[9] B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, 1981.
[10] L. Torresani, D. Yang, G. Alexander, and C. Bregler. Tracking and modeling non-rigid objects with rank constraints. In CVPR, pages 493–500, 2001.
[11] L. Torresani, A. Hertzmann, and C. Bregler. Learning non-rigid 3D shape from 2D motion. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.

6 0.50798178 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation

7 0.50342506 73 nips-2004-Generative Affine Localisation and Tracking

8 0.48546287 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

9 0.48301822 12 nips-2004-A Temporal Kernel-Based Model for Tracking Hand Movements from Neural Activities

10 0.48219088 183 nips-2004-Temporal-Difference Networks

11 0.48189881 197 nips-2004-Two-Dimensional Linear Discriminant Analysis

12 0.48154464 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression

13 0.48053497 159 nips-2004-Schema Learning: Experience-Based Construction of Predictive Action Models

14 0.48010629 168 nips-2004-Semigroup Kernels on Finite Sets

15 0.47981164 92 nips-2004-Kernel Methods for Implicit Surface Modeling

16 0.47950992 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods

17 0.4788304 148 nips-2004-Probabilistic Computation in Spiking Populations

18 0.47865793 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models

19 0.47812366 178 nips-2004-Support Vector Classification with Input Data Uncertainty

20 0.47673574 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection