nips nips2002 nips2002-87 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anitha Kannan, Nebojsa Jojic, Brendan J. Frey
Abstract: Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced the mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction. However, due to the enormous computational requirements of the EM algorithm for learning the model, O(N^2) where N is the dimensionality of a data sample, MTCA was not practical for most applications. In this paper, we demonstrate how fast Fourier transforms can reduce the computation to the order of N log N. With this speedup, we show the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition, and object detection in images.
Reference: text
sentIndex sentText sentNum sentScore
1 Fast Transformation-Invariant Factor Analysis Nebojsa Jojic Anitha Kannan Brendan Frey University of Toronto, Toronto, Canada anitha, frey @psi. [sent-1, score-0.097]
2 com Abstract Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. [sent-4, score-0.243]
3 In [6], Jojic and Frey introduced mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction. [sent-5, score-0.727]
4 However, due to the enormous computational requirements of the EM algorithm for learning the model, O(N^2) where N is the dimensionality of a data sample, MTCA was not practical for most applications. [sent-6, score-0.084]
5 In this paper, we demonstrate how fast Fourier transforms can reduce the computation to the order of N log N. [sent-7, score-0.054]
6 With this speedup, we show the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition, and object detection in images. [sent-8, score-0.637]
7 1 Introduction Dimensionality reduction techniques such as principal component analysis [7] and factor analysis [1] linearly map high dimensional data samples onto points in a lower dimensional subspace. [sent-9, score-0.243]
8 In factor analysis, this mapping is defined by the subspace origin and the subspace bases stored in the columns of the factor loading matrix. [sent-10, score-0.598]
9 A mixture of factor analyzers learns to place the data into several learned subspaces. [sent-11, score-0.353]
10 In computer vision, this approach has been widely used in face modeling for learning facial expressions (e.g. [sent-12, score-0.174]
11 When the variability in the data is due, in part, to small transformations such as translations, scales and rotations, a factor analyzer learns a linearized transformation manifold which is often sufficient ([4], [11]). [sent-15, score-0.485]
12 However, for large transformations present in the data, a linear approximation is insufficient. [sent-16, score-0.149]
13 For instance, a factor analyzer trained on a video sequence of a person walking tries to capture a linearized model of large translations (fig. [sent-17, score-0.686]
14 ) as opposed to learning local deformations such as motion of legs and hands (fig. [sent-19, score-0.197]
15 In [6], it was shown that a discrete hidden transformation variable enables clustering and learning subspaces within clusters, invariant to global transformations. [sent-22, score-0.207]
16 However, experiments were done on images of very low resolution due to the enormous computational cost of the EM algorithm used for learning the model. [sent-23, score-0.092]
17 It is known that the fast Fourier transform (FFT) ... Figure 1: Mixture of transformed component analyzers (MTCA). [sent-24, score-0.257]
18 We present experimental results showing the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition and detection. [sent-32, score-0.622]
19 2 Fast inference and learning in MTCA A mixture of transformation-invariant component analyzers (MTCA) is used for transformation-invariant clustering and for learning a subspace representation within each cluster. [sent-33, score-0.512]
20 The subspace coordinate vector y is a D-dimensional Gaussian random variable, y ~ N(0, I). [sent-38, score-0.041]
21 The cluster index, c, is a C-valued discrete random variable with probability distribution P(c). [sent-39, score-0.041]
22 Given c and y, the latent image z is Gaussian with mean μ_c + Λ_c y and diagonal covariance Φ_c; the N x D matrix Λ_c is the factor loading matrix for class c. [sent-40, score-0.12]
23 An observation x is obtained by applying a transformation T (with distribution P(T)) to the latent image z and adding independent Gaussian noise. [sent-41, score-0.315]
24 The subspace coordinates y are used to generate a latent image z (without noise), and the horizontal and vertical image positions are used to shift the latent image to obtain the observation x. [sent-44, score-0.828]
25 In fact, the subspace coordinates and image positions shown in the figure are actually inferred from the captured video sequence (see sec. [sent-45, score-0.337]
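To make the generative process concrete, here is a minimal NumPy sketch of a shift-only version of the model described above. The parameter names (pi, mu, Lam, Phi, psi) and the assumption of square images are illustrative choices, not notation taken from the paper.

```python
import numpy as np

def sample_mtca(pi, mu, Lam, Phi, psi, rng=None):
    """Draw one observation from a shift-only, MTCA-style generative model.

    pi  : (C,) cluster prior
    mu  : (C, N) class mean images (N = H*W pixels, flattened; square images assumed)
    Lam : (C, N, D) factor loading matrices
    Phi : (C, N) diagonal latent-image noise variances
    psi : scalar isotropic post-transformation noise variance
    """
    if rng is None:
        rng = np.random.default_rng()
    C, N, D = Lam.shape
    H = W = int(np.sqrt(N))
    c = rng.choice(C, p=pi)                                # cluster index
    y = rng.standard_normal(D)                             # subspace coordinates ~ N(0, I)
    z = mu[c] + Lam[c] @ y + np.sqrt(Phi[c]) * rng.standard_normal(N)
    dy, dx = int(rng.integers(H)), int(rng.integers(W))    # discrete shift with wrap-around
    x = np.roll(z.reshape(H, W), shift=(dy, dx), axis=(0, 1)).ravel()
    x = x + np.sqrt(psi) * rng.standard_normal(N)
    return x, (c, y, (dy, dx))
```

Inference reverses this process: given x, a posterior over the cluster, the subspace coordinates and the shift is computed, and the sections below show where the FFT makes that tractable.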
26 Figure 2: Means and components learned using (a) FA, (b) FA applied on data normalized using a correlation tracker, and (c) transformed component analysis (TCA) applied directly on the data. [sent-49, score-0.242]
27 Performing inference on transformations and class requires evaluating the posterior over c and T given the observation (1), and the likelihood of the data requires evaluating the joint distribution (2). The parameters of the model are learned from i.i.d. [sent-53, score-0.278]
28 training examples by maximizing their likelihood using an exact EM algorithm. [sent-55, score-0.042]
29 Evaluating (2) requires summing over all possible transformations and is very expensive. [sent-59, score-0.149]
30 In fact, each of the inference and update equations in [6] has a complexity of O(N^2). [sent-60, score-0.052]
31 We focus on inferring the means of the latent image and the subspace coordinates as examples of how working in the Fourier domain achieves efficient computation at a considerably lower cost. [sent-62, score-0.042]
32 We assume that the data is represented in a coordinate system in which transformations are discrete shifts with wrap-around. [sent-70, score-0.266]
33 For translations, it is a 2D rectangular grid, while for rotations and scales it is shifts in a radial grid (cf. [sent-71, score-0.186]
34 We also assume that the post-transformation noise is isotropic, so that the posterior covariance matrices of the latent image and the subspace coordinates become independent of the transformation T. [sent-74, score-0.043]
35 In fact, for isotropic noise it is possible to preset its variance (in our experiments we set it to a fixed value). [sent-75, score-0.067]
36 First, we describe the notation that simplifies expressions for a transformation that corresponds to a shift in input coordinates. [sent-81, score-0.164]
37 Define i to be an integer vector in the coordinate system in which the input is measured. [sent-82, score-0.051]
38 Vectors in the input coordinate system, such as the observation and the latent image, are defined this way. [sent-84, score-0.051]
39 For diagonal matrices such as the noise covariance, this notation defines the element corresponding to the pixel at coordinate i. [sent-85, score-0.128]
40 This notation enables a transformation corresponding to a shift to be represented as a vector ℓ in the same system, so that a shift of an image by ℓ moves the pixel at coordinate i to coordinate (i + ℓ), with each coordinate taken mod the corresponding image dimension. [sent-86, score-0.393]
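As a small illustration of this wrap-around convention, the following sketch shifts an image by modular indexing of pixel coordinates; the function name and the sign convention are illustrative, and the result matches NumPy's np.roll.

```python
import numpy as np

def shift_wraparound(img, ell):
    """Shift a 2-D image by the integer vector ell = (dy, dx) with wrap-around,
    by addressing pixel coordinates modulo the image dimensions."""
    H, W = img.shape
    rows = (np.arange(H)[:, None] - ell[0]) % H   # source row for each target row
    cols = (np.arange(W)[None, :] - ell[1]) % W   # source column for each target column
    return img[rows, cols]                        # same result as np.roll(img, ell, axis=(0, 1))
```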
41 One factor captures lighting variation and another tends to model small out-of-plane rotations. In the appendix, we show that all the expensive operations in inference and learning involve computing correlations or convolutions. [sent-88, score-0.291]
42 These can be computed for all shifts in O(N log N) in the frequency domain, while the cost is O(N^2) in the pixel domain. [sent-91, score-0.102]
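The following sketch illustrates the source of the speedup, under the assumption that the quantity of interest is the correlation of an observed image with a template over all cyclic shifts: two forward FFTs and one inverse FFT replace an explicit loop over the N possible shifts. Function names are illustrative; the naive version is included only as a reference.

```python
import numpy as np

def correlate_all_shifts_fft(x, t):
    """Correlation of image x with template t for every cyclic 2-D shift,
    in O(N log N) instead of the O(N^2) cost of looping over all shifts."""
    X = np.fft.fft2(x)
    T = np.fft.fft2(t)
    return np.real(np.fft.ifft2(X * np.conj(T)))   # entry (dy, dx) = sum_i x(i + d) * t(i)

def correlate_all_shifts_naive(x, t):
    """Reference pixel-domain version: explicit sum for each shift."""
    H, W = x.shape
    out = np.empty((H, W))
    for dy in range(H):
        for dx in range(W):
            out[dy, dx] = np.sum(np.roll(x, (-dy, -dx), axis=(0, 1)) * t)
    return out
```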
43 " # 1 t( § ( 0)¤ In principal component analysis (PCA), where there is no noise, the data is projected to subspace through the projection matrix. [sent-100, score-0.343]
44 Similarly, in MTCA, we can derive the corresponding projection matrix, which in addition accounts for the noise variances. [sent-101, score-0.051]
45 One term is the inverse of the noise variances in the input space, and another is the inverse of the noise variances in the projected subspace. [sent-102, score-0.162]
46 The mean of the subspace coordinates for a given observation, class c, and transformation is obtained by subtracting the mean of the latent image from the transformation-normalized observation and applying the projection matrix. [sent-103, score-0.485]
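A minimal sketch of this projection for a single class, using the standard factor-analysis posterior and ignoring the post-transformation noise for simplicity; the variable names (mu, Lam, Phi) are illustrative rather than taken from the paper.

```python
import numpy as np

def subspace_mean(x_norm, mu, Lam, Phi):
    """Posterior mean of the subspace coordinates for one class, given a
    transformation-normalized (back-shifted) observation x_norm.

    mu  : (N,) class mean image
    Lam : (N, D) factor loading matrix
    Phi : (N,) diagonal noise variances
    """
    D = Lam.shape[1]
    LtPhiInv = Lam.T / Phi                          # (D, N): Lam^T Phi^{-1}
    M = np.linalg.inv(np.eye(D) + LtPhiInv @ Lam)   # (D, D) posterior covariance of y
    beta = M @ LtPhiInv                             # projection matrix
    return beta @ (x_norm - mu)
```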
47 The probability map over transformations is defined for all shifts; the inference on the latent image is given by its expected value. [sent-112, score-0.283]
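A simplified sketch of this expectation for the shift-only, single-class case: back-shifting the observation by every candidate shift and averaging under the posterior map is itself a correlation, so it can be evaluated with FFTs rather than an explicit loop. Names are illustrative, and the blending with the prior mean through the posterior covariance is omitted.

```python
import numpy as np

def expected_latent_image(x, post_shift):
    """Posterior-weighted average of the back-shifted observation.

    x          : (H, W) observed image
    post_shift : (H, W) posterior probability of each cyclic shift (sums to 1)

    result[i] = sum_d post_shift[d] * x[i + d], computed for all pixels at once.
    """
    X = np.fft.fft2(x)
    P = np.fft.fft2(post_shift)
    return np.real(np.fft.ifft2(X * np.conj(P)))
```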
48 Figure 4: Comparison of FA applied on data normalized for translations using a correlation tracker, and TCA. [sent-115, score-0.285]
49 (b) Shift-normalized frames, obtained using a correlation-based tracker and through the factor analysis model. [sent-117, score-0.312]
50 Figure 5: Simulated walk sequence synthesized after training an AR model on the subspace and image motion parameters. [sent-119, score-0.474]
51 The first row contains a few frames from the sequence simulated directly from the model. [sent-121, score-0.223]
52 The second row contains a few frames from the video texture generated by picking the frames in the original sequence for which the recent subspace trajectory was similar to the one generated by the AR model. [sent-122, score-0.725]
53 Note that multiplication with the matrices above can be done efficiently by factorizing them and applying a sequence of matrix-vector multiplications from right to left. [sent-127, score-0.065]
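A small illustration of this remark, with made-up matrix sizes: applying a factored matrix to a vector from right to left avoids ever forming the full product.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 20))
B = rng.standard_normal((20, 1000))
v = rng.standard_normal(1000)

slow = (A @ B) @ v   # forms a 1000x1000 matrix first: quadratic work and memory
fast = A @ (B @ v)   # two matrix-vector products: linear in the image size
assert np.allclose(slow, fast)
```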
54 3 Experimental Results Clustering face poses. [sent-128, score-0.117]
55 In Fig. 3b, the first column shows examples from a video sequence of one person with different facial poses walking across a cluttered background. [sent-130, score-0.573]
56 We trained a transformation-invariant mixture of Gaussians (TMG) [6] with 3 clusters; the learned means are shown in Fig. [sent-131, score-0.164]
57 However, due to the presence of lighting variations and small out-of-plane rotations in addition to large pose changes, it is difficult for TMG to learn these variations without many more classes. [sent-134, score-0.308]
58 We trained an MTCA model with 3 classes and 2 factors, initializing the parameters to those learned by TMG. [sent-135, score-0.077]
59 Fig. 3a compares the TMG means and components to those learned using MTCA. [sent-137, score-0.077]
60 The MTCA model learns to capture small variations around the cluster means. [sent-138, score-0.139]
61 For example, for the first cluster, the two subspace coordinates tend to model out-of-plane rotations and illumination changes (Fig. [sent-139, score-0.372]
62 In Fig. 3b, we compare the inferred reconstructions of TMG and MTCA for various training examples, illustrating the better tracking and appearance modelling of MTCA. [sent-142, score-0.15]
63 Figure 6: Clustering faces extracted from a personal database prepared using a face detector. [sent-146, score-0.271]
64 (a) Training examples (b) Means, variances and components for two classes learned using MTCA. [sent-147, score-0.157]
65 (c) This column contains several photos in which the detector [8] failed to find the face. [sent-148, score-0.146]
66 Modeling a walking person. [sent-151, score-0.057]
67 Fig. 4 shows three 165x285 frames from a video sequence of a person walking. [sent-154, score-0.461]
68 A regular PCA or FA will learn a representation that focuses more on learning linearized shifts, and less on the more interesting motion of hands and legs (Fig. [sent-156, score-0.285]
69 The traditional approach is to track the object using, for example, a correlation tracker and then learn the subspace model on normalized images. [sent-159, score-0.495]
70 The parameters learned in this fashion are shown in Fig. [sent-160, score-0.077]
71 Without previously learning a good model, the tracker fails to provide the perfect tracking necessary for precise subspace modelling of limb motion and thus the inferred subspace projection is blurred. [sent-162, score-0.748]
72 As TCA performs tracking and learns an appearance model at the same time, not only does it avoid the tracker initialization that plagues the "tracking first" approaches, but it also provides perfectly aligned frames and infers a much cleaner projection. [sent-165, score-0.372]
73 9" D # $ D The TCA model can be used to create video textures based on frame reshuffling similar to [10]. [sent-168, score-0.314]
74 However, instead of shuffling frames based directly on pixel similarity, we use the subspace position and image position generated from an AR process [9], and for each time t we find the best frame u in the original video for which the recent parameter window is most similar to the generated one. [sent-169, score-0.767]
75 Then, the generated transformation is applied to the normalized image. [sent-170, score-0.243]
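A sketch of the frame-reshuffling step described above, assuming the per-frame subspace coordinates and image positions have already been inferred and concatenated into one parameter vector per frame; the window length and the AR-simulated trajectory are placeholders.

```python
import numpy as np

def pick_frames(traj_orig, traj_sim, window=5):
    """For each simulated time step, pick the original frame whose recent
    trajectory of (subspace, position) parameters best matches the simulated one.

    traj_orig : (T, K) inferred parameters for each original frame
    traj_sim  : (S, K) parameters generated by the AR process
    """
    T, K = traj_orig.shape
    picks = []
    for t in range(window, len(traj_sim)):
        sim_win = traj_sim[t - window:t].ravel()
        # squared distance of the simulated window to every length-`window` window
        dists = [np.sum((traj_orig[u - window:u].ravel() - sim_win) ** 2)
                 for u in range(window, T + 1)]
        best_u = window + int(np.argmin(dists))   # best-matching window ends at frame best_u - 1
        picks.append(best_u - 1)
    return picks
```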
76 The resulting video texture is shown in Fig. 5b and contains somewhat sharper images than the ones simulated directly from the generative model, fig. [sent-172, score-0.125]
77 We let the simulated walk last longer than the original sequence by letting MTCA live on frames that are twice as wide. [sent-174, score-0.144]
78 ' " D Clustering and face recognition We used a standard face detector [8] to obtain 85 32x32 images of faces of 2 persons, from a personal photo database of a mother and her daughter. [sent-179, score-0.53]
79 We learned an MTCA model with 2 classes and 4 factors. [sent-183, score-0.077]
80 To model global lighting variation, we preset one of the factors to be uniform. [sent-184, score-0.191]
81 We also preset another factor to be smoothly varying in brightness (see fig. [sent-189, score-0.139]
82 The other two components are learned, and they model slight appearance deformations such as facial expressions. [sent-192, score-0.21]
83 The model learned to cluster the faces in the training examples accurately. [sent-193, score-0.246]
84 An interesting application is to use the learned representation of the faces to detect and recognize faces in the original photos. [sent-194, score-0.309]
85 For instance, the face detector did not find faces in many photographs (e.g. [sent-195, score-0.297]
86 6c), which we were able to do using the learned model (fig. [sent-197, score-0.077]
87 We increased the resolution of the model parameters to match the resolution of the photos (640x480), padding around the original parameters with a uniform mean, zero factors and high variance. [sent-199, score-0.095]
88 We also incorporated 3 rotations and 4 scales as possible transformations, in addition to all possible shifts. [sent-201, score-0.12]
89 In Fig. 6c, we present three examples which were not in the training set and on which the face detector we used failed. [sent-203, score-0.223]
90 In all three cases MTCA detected and recognized the face correctly as belonging to class 2. [sent-204, score-0.117]
91 4 Conclusion A mixture of transformation-invariant component analyzers is a technique for modeling visual data that finds major clusters and major transformations in the data and learns subspace models of appearance. [sent-206, score-0.615]
92 In this paper, we have described how a fast implementation of learning the model is possible through efficient use of identities from matrix algebra and the use of fast Fourier transforms. [sent-207, score-0.108]
93 Appendix: EM updates. Before performing inference in the E-step, we pre-compute, for all classes, the quantities that are independent of the training examples. [sent-208, score-0.052]
94 To compute the posteriors over transformations and classes, we require the determinant of the covariance matrix and the Mahalanobis distance between the input and the transformed latent image. [sent-226, score-0.181]
95 The determinant is simplified because COV does not depend on the transformation, and the Mahalanobis distance between the input and the transformed latent image can be computed for all shifts in O(N log N) time. [sent-227, score-0.265]
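A minimal sketch of this computation for the isotropic-noise, single-class case: the only shift-dependent term in the squared Mahalanobis distance is a cross-correlation between the input and the mean image, so the posterior over all shifts is obtained with FFTs. Variable names are illustrative.

```python
import numpy as np

def shift_posterior(x, m, psi, log_prior=None):
    """Posterior over all cyclic shifts for the simplified observation model
    x ~ N(shift_d(m), psi * I), computed for every shift d at once.

    x, m : (H, W) observation and mean image
    psi  : isotropic noise variance
    """
    cross = np.real(np.fft.ifft2(np.fft.fft2(x) * np.conj(np.fft.fft2(m))))  # x . shift_d(m), all d
    sq_dist = np.sum(x**2) + np.sum(m**2) - 2.0 * cross                      # ||x - shift_d(m)||^2, all d
    log_post = -0.5 * sq_dist / psi
    if log_prior is not None:
        log_post = log_post + log_prior
    log_post -= log_post.max()            # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()
```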
96 Modeling the manifolds of images of handwritten digits In IEEE Transactions on Neural Networks 1997 [6] Jojic, N. [sent-265, score-0.044]
97 Topographic transformation as a discrete latent variable In Advances in Neural Information Processing Systems 13. [sent-268, score-0.196]
98 & Irfan Essa Video textures In Proceedings of SIGGRAPH2000 [11] Simard, P. [sent-286, score-0.089]
99 Efficient pattern recognition using a new transformation distance In Advances in Neural Information Processing Systems 1993 [12] Turk, M. [sent-289, score-0.118]
100 Robust image registration using log-polar transform In Proceedings IEEE Intl. [sent-293, score-0.119]
wordName wordTfidf (topN-words)
[('mtca', 0.543), ('video', 0.225), ('tmg', 0.211), ('subspace', 0.203), ('transformations', 0.149), ('ac', 0.129), ('tca', 0.121), ('tracker', 0.12), ('analyzers', 0.12), ('rotations', 0.12), ('image', 0.119), ('face', 0.117), ('faces', 0.116), ('frames', 0.116), ('vp', 0.115), ('latent', 0.112), ('jojic', 0.105), ('frey', 0.097), ('uvp', 0.09), ('textures', 0.089), ('linearized', 0.089), ('clustering', 0.089), ('transformation', 0.084), ('translations', 0.083), ('shift', 0.08), ('ue', 0.079), ('learned', 0.077), ('cov', 0.077), ('lighting', 0.077), ('appearance', 0.076), ('tracking', 0.074), ('factor', 0.072), ('cu', 0.067), ('preset', 0.067), ('fa', 0.066), ('shifts', 0.066), ('sequence', 0.065), ('detector', 0.064), ('fourier', 0.064), ('em', 0.064), ('anitha', 0.06), ('hands', 0.06), ('lcw', 0.06), ('ph', 0.059), ('eg', 0.058), ('sg', 0.057), ('facial', 0.057), ('walking', 0.057), ('person', 0.055), ('fast', 0.054), ('cluster', 0.053), ('inference', 0.052), ('projection', 0.051), ('learns', 0.051), ('coordinate', 0.051), ('ar', 0.05), ('motion', 0.05), ('object', 0.049), ('coordinates', 0.049), ('ep', 0.048), ('enormous', 0.048), ('loading', 0.048), ('photos', 0.048), ('component', 0.048), ('factors', 0.047), ('inferred', 0.047), ('toronto', 0.047), ('legs', 0.045), ('mahalanobis', 0.045), ('clusters', 0.044), ('images', 0.044), ('mixture', 0.043), ('noise', 0.043), ('correlation', 0.042), ('simulated', 0.042), ('deformations', 0.042), ('mod', 0.042), ('examples', 0.042), ('learn', 0.041), ('principal', 0.041), ('element', 0.041), ('dimensional', 0.041), ('cb', 0.04), ('analyzer', 0.04), ('normalized', 0.04), ('generative', 0.039), ('hd', 0.038), ('personal', 0.038), ('poses', 0.038), ('variances', 0.038), ('walk', 0.037), ('dimensionality', 0.036), ('pixel', 0.036), ('transformed', 0.035), ('variations', 0.035), ('column', 0.034), ('position', 0.034), ('recognition', 0.034), ('subspaces', 0.034), ('determinant', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999917 87 nips-2002-Fast Transformation-Invariant Factor Analysis
Author: Anitha Kannan, Nebojsa Jojic, Brendan J. Frey
Abstract: Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced the mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction. However, due to the enormous computational requirements of the EM algorithm for learning the model, O(N^2) where N is the dimensionality of a data sample, MTCA was not practical for most applications. In this paper, we demonstrate how fast Fourier transforms can reduce the computation to the order of N log N. With this speedup, we show the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition, and object detection in images.
2 0.15347065 39 nips-2002-Bayesian Image Super-Resolution
Author: Michael E. Tipping, Christopher M. Bishop
Abstract: The extraction of a single high-quality image from a set of lowresolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters. 1
3 0.14362626 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
Author: Christopher Williams, Michalis K. Titsias
Abstract: We consider data which are images containing views of multiple objects. Our task is to learn about each of the objects present in the images. This task can be approached as a factorial learning problem, where each image must be explained by instantiating a model for each of the objects present with the correct instantiation parameters. A major problem with learning a factorial model is that as the number of objects increases, there is a combinatorial explosion of the number of configurations that need to be considered. We develop a method to extract object models sequentially from the data by making use of a robust statistical method, thus avoiding the combinatorial explosion, and present results showing successful extraction of objects from real images.
4 0.13648331 36 nips-2002-Automatic Alignment of Local Representations
Author: Yee W. Teh, Sam T. Roweis
Abstract: We present an automatic alignment procedure which maps the disparate internal representations learned by several local dimensionality reduction experts into a single, coherent global coordinate system for the original data space. Our algorithm can be applied to any set of experts, each of which produces a low-dimensional local representation of a highdimensional input. Unlike recent efforts to coordinate such models by modifying their objective functions [1, 2], our algorithm is invoked after training and applies an efficient eigensolver to post-process the trained models. The post-processing has no local optima and the size of the system it must solve scales with the number of local models rather than the number of original data points, making it more efficient than model-free algorithms such as Isomap [3] or LLE [4]. 1 Introduction: Local vs. Global Dimensionality Reduction Beyond density modelling, an important goal of unsupervised learning is to discover compact, informative representations of high-dimensional data. If the data lie on a smooth low dimensional manifold, then an excellent encoding is the coordinates internal to that manifold. The process of determining such coordinates is dimensionality reduction. Linear dimensionality reduction methods such as principal component analysis and factor analysis are easy to train but cannot capture the structure of curved manifolds. Mixtures of these simple unsupervised models [5, 6, 7, 8] have been used to perform local dimensionality reduction, and can provide good density models for curved manifolds, but unfortunately such mixtures cannot do dimensionality reduction. They do not describe a single, coherent low-dimensional coordinate system for the data since there is no pressure for the local coordinates of each component to agree. Roweis et al [1] recently proposed a model which performs global coordination of local coordinate systems in a mixture of factor analyzers (MFA). Their model is trained by maximizing the likelihood of the data, with an additional variational penalty term to encourage the internal coordinates of the factor analyzers to agree. While their model can trade off modelling the data and having consistent local coordinate systems, it requires a user given trade-off parameter, training is quite inefficient (although [2] describes an improved training algorithm for a more constrained model), and it has quite serious local minima problems (methods like LLE [4] or Isomap [3] have to be used for initialization). In this paper we describe a novel, automatic way to align the hidden representations used by each component of a mixture of dimensionality reducers into a single global representation of the data throughout space. Given an already trained mixture, the alignment is achieved by applying an eigensolver to a matrix constructed from the internal representations of the mixture components. Our method is efficient, simple to implement, and has no local optima in its optimization nor any learning rates or annealing schedules. 2 The Locally Linear Coordination Algorithm H 9¥ EI¡ CD66B9 ©9B 766 % G F 5 #
5 0.12539044 16 nips-2002-A Prototype for Automatic Recognition of Spontaneous Facial Actions
Author: M.S. Bartlett, G.C. Littlewort, T.J. Sejnowski, J.R. Movellan
Abstract: We present ongoing work on a project for automatic recognition of spontaneous facial actions. Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3D tracking and warping followed by machine learning techniques directly applied to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
6 0.11267658 25 nips-2002-An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
7 0.11028598 74 nips-2002-Dynamic Structure Super-Resolution
8 0.10324077 10 nips-2002-A Model for Learning Variance Components of Natural Images
9 0.088331133 18 nips-2002-Adaptation and Unsupervised Learning
10 0.086919852 2 nips-2002-A Bilinear Model for Sparse Coding
11 0.081977762 202 nips-2002-Unsupervised Color Constancy
12 0.081827484 98 nips-2002-Going Metric: Denoising Pairwise Data
13 0.080604069 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
14 0.07784117 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
15 0.077226102 49 nips-2002-Charting a Manifold
16 0.075686112 27 nips-2002-An Impossibility Theorem for Clustering
17 0.071899064 57 nips-2002-Concurrent Object Recognition and Segmentation by Graph Partitioning
18 0.071093827 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
19 0.070805848 172 nips-2002-Recovering Articulated Model Topology from Observed Rigid Motion
20 0.070782162 53 nips-2002-Clustering with the Fisher Score
topicId topicWeight
[(0, -0.212), (1, -0.016), (2, -0.019), (3, 0.224), (4, -0.08), (5, 0.056), (6, 0.047), (7, -0.061), (8, 0.02), (9, 0.124), (10, 0.079), (11, 0.008), (12, -0.017), (13, -0.009), (14, -0.036), (15, 0.059), (16, -0.053), (17, 0.066), (18, 0.067), (19, 0.065), (20, -0.042), (21, -0.023), (22, 0.028), (23, 0.048), (24, -0.042), (25, 0.19), (26, -0.031), (27, -0.15), (28, 0.039), (29, 0.139), (30, 0.003), (31, 0.01), (32, 0.075), (33, 0.03), (34, -0.041), (35, 0.059), (36, -0.102), (37, -0.102), (38, 0.048), (39, -0.162), (40, -0.079), (41, -0.01), (42, 0.066), (43, -0.025), (44, -0.023), (45, 0.079), (46, 0.048), (47, -0.003), (48, 0.112), (49, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.94611859 87 nips-2002-Fast Transformation-Invariant Factor Analysis
Author: Anitha Kannan, Nebojsa Jojic, Brendan J. Frey
Abstract: Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction. However, due to enormous computational requirements of the EM algorithm for learning the model, O( ) where is the dimensionality of a data sample, MTCA was not practical for most applications. In this paper, we demonstrate how fast Fourier transforms can reduce the computation to the order of log . With this speedup, we show the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition, and object detection in images. ¡ ¤ ¤ ¤ ¤
2 0.57825089 16 nips-2002-A Prototype for Automatic Recognition of Spontaneous Facial Actions
Author: M.S. Bartlett, G.C. Littlewort, T.J. Sejnowski, J.R. Movellan
Abstract: We present ongoing work on a project for automatic recognition of spontaneous facial actions. Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3D tracking and warping followed by machine learning techniques directly applied to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
3 0.55016834 39 nips-2002-Bayesian Image Super-Resolution
Author: Michael E. Tipping, Christopher M. Bishop
Abstract: The extraction of a single high-quality image from a set of lowresolution images is an important problem which arises in fields such as remote sensing, surveillance, medical imaging and the extraction of still images from video. Typical approaches are based on the use of cross-correlation to register the images followed by the inversion of the transformation from the unknown high resolution image to the observed low resolution images, using regularization to resolve the ill-posed nature of the inversion process. In this paper we develop a Bayesian treatment of the super-resolution problem in which the likelihood function for the image registration parameters is based on a marginalization over the unknown high-resolution image. This approach allows us to estimate the unknown point spread function, and is rendered tractable through the introduction of a Gaussian process prior over images. Results indicate a significant improvement over techniques based on MAP (maximum a-posteriori) point optimization of the high resolution image and associated registration parameters. 1
4 0.53690559 36 nips-2002-Automatic Alignment of Local Representations
Author: Yee W. Teh, Sam T. Roweis
Abstract: We present an automatic alignment procedure which maps the disparate internal representations learned by several local dimensionality reduction experts into a single, coherent global coordinate system for the original data space. Our algorithm can be applied to any set of experts, each of which produces a low-dimensional local representation of a highdimensional input. Unlike recent efforts to coordinate such models by modifying their objective functions [1, 2], our algorithm is invoked after training and applies an efficient eigensolver to post-process the trained models. The post-processing has no local optima and the size of the system it must solve scales with the number of local models rather than the number of original data points, making it more efficient than model-free algorithms such as Isomap [3] or LLE [4]. 1 Introduction: Local vs. Global Dimensionality Reduction Beyond density modelling, an important goal of unsupervised learning is to discover compact, informative representations of high-dimensional data. If the data lie on a smooth low dimensional manifold, then an excellent encoding is the coordinates internal to that manifold. The process of determining such coordinates is dimensionality reduction. Linear dimensionality reduction methods such as principal component analysis and factor analysis are easy to train but cannot capture the structure of curved manifolds. Mixtures of these simple unsupervised models [5, 6, 7, 8] have been used to perform local dimensionality reduction, and can provide good density models for curved manifolds, but unfortunately such mixtures cannot do dimensionality reduction. They do not describe a single, coherent low-dimensional coordinate system for the data since there is no pressure for the local coordinates of each component to agree. Roweis et al [1] recently proposed a model which performs global coordination of local coordinate systems in a mixture of factor analyzers (MFA). Their model is trained by maximizing the likelihood of the data, with an additional variational penalty term to encourage the internal coordinates of the factor analyzers to agree. While their model can trade off modelling the data and having consistent local coordinate systems, it requires a user given trade-off parameter, training is quite inefficient (although [2] describes an improved training algorithm for a more constrained model), and it has quite serious local minima problems (methods like LLE [4] or Isomap [3] have to be used for initialization). In this paper we describe a novel, automatic way to align the hidden representations used by each component of a mixture of dimensionality reducers into a single global representation of the data throughout space. Given an already trained mixture, the alignment is achieved by applying an eigensolver to a matrix constructed from the internal representations of the mixture components. Our method is efficient, simple to implement, and has no local optima in its optimization nor any learning rates or annealing schedules. 2 The Locally Linear Coordination Algorithm H 9¥ EI¡ CD66B9 ©9B 766 % G F 5 #
5 0.524113 74 nips-2002-Dynamic Structure Super-Resolution
Author: Amos J. Storkey
Abstract: The problem of super-resolution involves generating feasible higher resolution images, which are pleasing to the eye and realistic, from a given low resolution image. This might be attempted by using simple filters for smoothing out the high resolution blocks or through applications where substantial prior information is used to imply the textures and shapes which will occur in the images. In this paper we describe an approach which lies between the two extremes. It is a generic unsupervised method which is usable in all domains, but goes beyond simple smoothing methods in what it achieves. We use a dynamic tree-like architecture to model the high resolution data. Approximate conditioning on the low resolution image is achieved through a mean field approach. 1
6 0.52391535 150 nips-2002-Multiple Cause Vector Quantization
7 0.51595658 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
8 0.47692153 18 nips-2002-Adaptation and Unsupervised Learning
9 0.47161931 25 nips-2002-An Asynchronous Hidden Markov Model for Audio-Visual Speech Recognition
10 0.45898893 63 nips-2002-Critical Lines in Symmetry of Mixture Models and its Application to Component Splitting
11 0.43957561 131 nips-2002-Learning to Classify Galaxy Shapes Using the EM Algorithm
12 0.41254658 190 nips-2002-Stochastic Neighbor Embedding
13 0.3867223 49 nips-2002-Charting a Manifold
14 0.37548339 138 nips-2002-Manifold Parzen Windows
15 0.37480438 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
16 0.37315685 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
17 0.36867324 142 nips-2002-Maximum Likelihood and the Information Bottleneck
18 0.36349463 2 nips-2002-A Bilinear Model for Sparse Coding
19 0.35897183 137 nips-2002-Location Estimation with a Differential Update Network
20 0.35596469 31 nips-2002-Application of Variational Bayesian Approach to Speech Recognition
topicId topicWeight
[(1, 0.331), (11, 0.016), (23, 0.014), (42, 0.085), (54, 0.143), (55, 0.032), (67, 0.02), (68, 0.029), (74, 0.091), (87, 0.019), (92, 0.027), (98, 0.109)]
simIndex simValue paperId paperTitle
same-paper 1 0.81865734 87 nips-2002-Fast Transformation-Invariant Factor Analysis
Author: Anitha Kannan, Nebojsa Jojic, Brendan J. Frey
Abstract: Dimensionality reduction techniques such as principal component analysis and factor analysis are used to discover a linear mapping between high dimensional data samples and points in a lower dimensional subspace. In [6], Jojic and Frey introduced the mixture of transformation-invariant component analyzers (MTCA) that can account for global transformations such as translations and rotations, perform clustering and learn local appearance deformations by dimensionality reduction. However, due to the enormous computational requirements of the EM algorithm for learning the model, O(N^2) where N is the dimensionality of a data sample, MTCA was not practical for most applications. In this paper, we demonstrate how fast Fourier transforms can reduce the computation to the order of N log N. With this speedup, we show the effectiveness of MTCA in various applications - tracking, video textures, clustering video sequences, object recognition, and object detection in images.
2 0.80658424 51 nips-2002-Classifying Patterns of Visual Motion - a Neuromorphic Approach
Author: Jakob Heinzle, Alan Stocker
Abstract: We report a system that classifies and can learn to classify patterns of visual motion on-line. The complete system is described by the dynamics of its physical network architectures. The combination of the following properties makes the system novel: Firstly, the front-end of the system consists of an aVLSI optical flow chip that collectively computes 2-D global visual motion in real-time [1]. Secondly, the complexity of the classification task is significantly reduced by mapping the continuous motion trajectories to sequences of ’motion events’. And thirdly, all the network structures are simple and with the exception of the optical flow chip based on a Winner-Take-All (WTA) architecture. We demonstrate the application of the proposed generic system for a contactless man-machine interface that allows to write letters by visual motion. Regarding the low complexity of the system, its robustness and the already existing front-end, a complete aVLSI system-on-chip implementation is realistic, allowing various applications in mobile electronic devices.
3 0.69080698 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
Author: Eric P. Xing, Michael I. Jordan, Stuart Russell, Andrew Y. Ng
Abstract: Many algorithms rely critically on being given a good metric over their inputs. For instance, data can often be clustered in many "plausible" ways, and if a clustering algorithm such as K-means initially fails to find one that is meaningful to a user, the only recourse may be for the user to manually tweak the metric until sufficiently good clusters are found. For these and other applications requiring good metrics, it is desirable that we provide a more systematic way for users to indicate what they consider "similar." For instance, we may ask them to provide examples. In this paper, we present an algorithm that, given examples of similar (and, if desired, dissimilar) pairs of points, learns a distance metric that respects these relationships. Our method is based on posing metric learning as a convex optimization problem, which allows us to give efficient, local-optima-free algorithms. We also demonstrate empirically that the learned metrics can be used to significantly improve clustering performance.
4 0.57158327 16 nips-2002-A Prototype for Automatic Recognition of Spontaneous Facial Actions
Author: M.S. Bartlett, G.C. Littlewort, T.J. Sejnowski, J.R. Movellan
Abstract: We present ongoing work on a project for automatic recognition of spontaneous facial actions. Spontaneous facial expressions differ substantially from posed expressions, similar to how continuous, spontaneous speech differs from isolated words produced on command. Previous methods for automatic facial expression recognition assumed images were collected in controlled environments in which the subjects deliberately faced the camera. Since people often nod or turn their heads, automatic recognition of spontaneous facial behavior requires methods for handling out-of-image-plane head rotations. Here we explore an approach based on 3-D warping of images into canonical views. We evaluated the performance of the approach as a front-end for a spontaneous expression recognition system using support vector machines and hidden Markov models. This system employed general purpose learning mechanisms that can be applied to recognition of any facial movement. The system was tested for recognition of a set of facial actions defined by the Facial Action Coding System (FACS). We showed that 3D tracking and warping followed by machine learning techniques directly applied to the warped images, is a viable and promising technology for automatic facial expression recognition. One exciting aspect of the approach presented here is that information about movement dynamics emerged out of filters which were derived from the statistics of images.
5 0.56944501 172 nips-2002-Recovering Articulated Model Topology from Observed Rigid Motion
Author: Leonid Taycher, John Iii, Trevor Darrell
Abstract: Accurate representation of articulated motion is a challenging problem for machine perception. Several successful tracking algorithms have been developed that model human body as an articulated tree. We propose a learning-based method for creating such articulated models from observations of multiple rigid motions. This paper is concerned with recovering topology of the articulated model, when the rigid motion of constituent segments is known. Our approach is based on finding the Maximum Likelihood tree shaped factorization of the joint probability density function (PDF) of rigid segment motions. The topology of graphical model formed from this factorization corresponds to topology of the underlying articulated body. We demonstrate the performance of our algorithm on both synthetic and real motion capture data.
6 0.56712979 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
7 0.56323719 137 nips-2002-Location Estimation with a Differential Update Network
8 0.55902517 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
9 0.55834103 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
10 0.55396545 53 nips-2002-Clustering with the Fisher Score
11 0.55373782 124 nips-2002-Learning Graphical Models with Mercer Kernels
12 0.55111253 3 nips-2002-A Convergent Form of Approximate Policy Iteration
13 0.54975992 39 nips-2002-Bayesian Image Super-Resolution
14 0.54936618 68 nips-2002-Discriminative Densities from Maximum Contrast Estimation
15 0.54858041 46 nips-2002-Boosting Density Estimation
16 0.54841018 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
17 0.54836935 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
18 0.54811597 88 nips-2002-Feature Selection and Classification on Matrix Data: From Large Margins to Small Covering Numbers
19 0.54514813 169 nips-2002-Real-Time Particle Filters
20 0.54445499 10 nips-2002-A Model for Learning Variance Components of Natural Images