nips nips2004 nips2004-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: John Winn, Andrew Blake
Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1
Reference: text
sentIndex sentText sentNum sentScore
1 This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. [sent-7, score-0.537]
2 Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. [sent-8, score-0.433]
3 We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. [sent-10, score-0.552]
4 1 Introduction Generative models provide a powerful and intuitive way to analyse images or video sequences. [sent-11, score-0.176]
5 Finally, it is possible to sample from generative models, for example, for the purposes of image or video editing. [sent-14, score-0.266]
6 One popular type of generative model represents images as a composition of layers [1] where each layer corresponds to the appearance and shape of an individual object or the background. [sent-15, score-0.736]
7 If the generative model is expressed probabilistically, Bayesian learning and inference techniques can then be applied to reverse the imaging process and infer the shape and appearance of individual objects in an unsupervised fashion [2]. [sent-16, score-0.498]
8 In a layered model, inference involves localising the pose of the layers in each image, which is hard because of the large space of possible object transformations that needs to be explored. [sent-18, score-0.486]
9 Previously, this has been dealt with by imposing restrictions on the space of object transformations, such as allowing only similarity transformations [3]. [sent-19, score-0.236]
10 Alternatively, if the images are known to belong to a video sequence, tracking constraints can be used to focus the search on a small area of transformation space consistent with a dynamic model of object motion [4]. [sent-20, score-0.449]
11 However, even in a video sequence, this technique relies on the object remaining in frame and moving relatively slowly. [sent-21, score-0.481]
12 In this paper, we extend the work of [3] and present an approach to object localisation which allows objects to undergo planar affine transformations and works well both in the frames of a video sequence and in unordered sets of images. [sent-22, score-0.547]
13 A two-layer generative model is defined and inference performed using a factorised variational approximation, including a global search over a discretised transform space followed by a local optimisation using conjugate gradients. [sent-23, score-0.6]
14 Finally, we extend our generative model to allow the object appearance in one image to depend on its appearance in the previous one. [sent-25, score-0.892]
15 Tracking appearance in this way gives improved performance for objects whose appearance changes slowly over time (e. [sent-26, score-0.607]
16 If the images are not frames of a video, or the object is out-of-frame or occluded in the previous image, then the system automatically reverts to using a learned foreground appearance model. [sent-29, score-0.851]
17 2 The generative image model This section describes the generative image model, which is illustrated in the Bayesian network of Figure 1. [sent-30, score-0.324]
18 This model consists of two layers, a foreground layer containing a single object and a background layer. [sent-31, score-0.534]
19 The background layer is assumed to be stationary and so its appearance vector b is set to be the same size as the image. [sent-36, score-0.406]
20 A mask m_i has binary elements that indicate which pixels belong to the foreground object. Figure 1: The Bayesian network for the generative image model (node labels: b, f, T, m, x, with a plate over the N images). [sent-37, score-0.519]
21 Common to all images are the background b, foreground object appearance f and mask prior π. [sent-39, score-1.148]
22 An affine transform T gives the position and pose of the object in each image. [sent-40, score-0.27]
23 The binary mask m defines the area of support of the foreground object and has a prior given by a transformed π. [sent-41, score-0.864]
24 The observed image x is generated by adding noise (with precisions β) separately to the transformed foreground appearance and the background, and composing them together using the mask. [sent-42, score-0.729]
25 For illustration, the images underneath each node of the graph represent the inferred value of that node given a data set of hand images. [sent-43, score-0.16]
26 A priori, the appearance and mask of the object are not known. [sent-44, score-0.771]
27 The mask is set to be slightly larger than the image to allow the foreground object to overlap the edge of the image. [sent-46, score-0.824]
28 The foreground layer is represented by an appearance image vector f and a prior over its mask π, both of which are to be inferred from the image set. [sent-47, score-1.218]
29 The elements of π are real numbers in the range [0, 1] which indicate the probability that the corresponding mask pixels are on, as suggested in [5]. [sent-48, score-0.375]
30 The object appearance and mask prior are defined in a canonical, normalised pose; the actual position and pose of the object in the ith image is given by an affine transformation Ti . [sent-49, score-1.254]
31 With our images in vector form, we can consider a transformation T to be a sparse matrix where the jth row defines the linear interpolation of pixels that gives the jth pixel in the transformed image. [sent-50, score-0.282]
32 For example, a translation of an integer number of pixels is represented as a matrix whose entries T_jk are 1 if the translation of location k in the source image is location j in the destination image, and 0 otherwise. [sent-51, score-0.25]
33 Hence, the transformed foreground appearance is given by Tf and the transformed mask prior by Tπ. [sent-52, score-1.023]
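To make this sparse-matrix view of T concrete, here is a minimal Python/NumPy sketch (ours, not the authors' Matlab implementation) that builds the translation matrix just described for a vectorised H-by-W image; the function name and the sizes used are illustrative.

import numpy as np
import scipy.sparse as sp

def translation_matrix(H, W, dy, dx):
    # T[j, k] = 1 when source pixel k maps to destination pixel j under the shift
    rows, cols = [], []
    for y in range(H):
        for x in range(W):
            sy, sx = y - dy, x - dx          # source location for destination (y, x)
            if 0 <= sy < H and 0 <= sx < W:
                rows.append(y * W + x)       # destination index j
                cols.append(sy * W + sx)     # source index k
    return sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(H * W, H * W))

T = translation_matrix(24, 32, dy=3, dx=-5)
# Tf = T @ f and Tpi = T @ pi then give the transformed appearance and mask prior.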
34 Given the transformed mask prior, the conditional distribution for the kth mask pixel is P(m_k = 1 | π, T) = (Tπ)_k. (1) [sent-53, score-0.775]
35 The observed image x is generated by a composition of the transformed foreground appearance and the background plus some noise. [sent-54, score-0.729]
36 The conditional distribution for the kth image pixel is given by P(x_k | b, f, m, T, β) = N(x_k | (Tf)_k, β_f^{-1})^{m_k} N(x_k | b_k, β_b^{-1})^{1-m_k}, (2) where β = (β_f, β_b) are the noise precisions for the foreground layer and the background layer respectively. [sent-55, score-0.6]
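Putting equations (1) and (2) together, the generative process can be sampled as in the following sketch; this assumes vectorised images and a transform matrix T built as above, and all helper names are ours rather than the paper's.

import numpy as np

def sample_image(b, f, pi, T, beta_f, beta_b, rng):
    Tpi = np.clip(T @ pi, 0.0, 1.0)                      # transformed mask prior, Eq. (1)
    Tf = T @ f                                           # transformed foreground appearance
    m = (rng.random(Tpi.shape) < Tpi).astype(float)      # binary mask, m_k ~ Bernoulli((T pi)_k)
    fg = Tf + rng.normal(0.0, beta_f ** -0.5, Tf.shape)  # foreground plus precision beta_f noise
    bg = b + rng.normal(0.0, beta_b ** -0.5, b.shape)    # background plus precision beta_b noise
    return m * fg + (1.0 - m) * bg                       # composite observed image, Eq. (2)

# usage: x = sample_image(b, f, pi, T, 50.0, 50.0, np.random.default_rng(0))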
37 3 Factorised variational inference Given the above model and a set of images {x_1, . . . , x_N}, the inference task is to learn a posterior distribution over all other variables including the background, the foreground appearance and mask prior, the transformation and mask for each image and the noise precisions. [sent-58, score-0.292] [sent-61, score-1.419]
39 Exact computation of this posterior is intractable; instead, we turn to the approximate inference technique of variational inference [6]. [sent-63, score-0.324]
40 Variational inference involves defining a factorised variational distribution Q and then optimising Q to minimise the Kullback-Leibler divergence between Q and the true posterior distribution. [sent-64, score-0.441]
41 In this paper, we choose our variational distribution to be factorised with respect to each element of b, f, m and π and also with respect to β_f and β_b. [sent-66, score-0.26]
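Written out (our reconstruction of the factorisation just described, not a displayed formula from the paper), this choice gives Q = Q(β_f) Q(β_b) Π_j Q(b_j) Π_j Q(f_j) Π_j Q(π_j) Π_{i=1}^{N} [ Q(T_i) Π_k Q(m_{ik}) ], so every pixel-level variable has its own factor, while each image's transformation factor Q(T_i) is itself factorised further in Section 4 below.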
42 The optimisation of the Q distribution is achieved by firstly initialising the parameters of all factors and then iteratively updating each factor in turn so as to minimise the KL divergence whilst keeping all other factors fixed. [sent-70, score-0.252]
43 Where a message is shown as leaving the N plate, the destination node receives a set of N messages, one from each copy of the nodes within the plate. [sent-77, score-0.156]
44 Where a message is shown entering the N plate, the message is sent to all copies of the destination node. [sent-78, score-0.282]
45 All expectations are with respect to the variational distribution Q. [sent-79, score-0.17]
46 Using VMP makes it very much simpler and quicker to extend, modify, combine or compare probabilistic models; it gives the same results as applying factorised variational inference by hand and places no additional constraints on the form of the model. [sent-83, score-0.362]
47 In VMP, messages consisting of vectors of real numbers are sent to each node from its parents and children in the graph. [sent-84, score-0.172]
48 By expressing each variational factor as an exponential family distribution, the ‘natural parameter vector’ [8] of that distribution can be optimised using (3) by adding messages received at the corresponding node. [sent-86, score-0.33]
49 For example, if the prior over b is N(b | µ, γ^{-1}), the natural parameter vector of the factor Q(b) = N(b | µ', γ'^{-1}) is updated from the messages received at b as (µ'γ', -(1/2)γ') = (µγ, -(1/2)γ) + Σ_{i=1}^{N} ( ⟨β_b⟩⟨1 - m_i⟩ ∗ x_i, -(1/2)⟨β_b⟩⟨1 - m_i⟩ ), where ∗ denotes the element-wise product and ⟨·⟩ an expectation under Q. [sent-87, score-0.19] [sent-88, score-0.174]
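In code, this update amounts to accumulating the per-image messages onto the prior's natural parameters; a hedged NumPy sketch of the reconstructed update above (array names are ours, and the expectations ⟨m_i⟩ and ⟨β_b⟩ are assumed to be held as arrays/scalars):

import numpy as np

def update_Q_b(mu0, gamma0, images, expected_masks, E_beta_b):
    # natural parameters of a Gaussian N(b | mu, 1/gamma) are (mu*gamma, -gamma/2)
    eta1 = mu0 * gamma0
    eta2 = -0.5 * gamma0 * np.ones_like(mu0)
    for x_i, Em_i in zip(images, expected_masks):
        w = E_beta_b * (1.0 - Em_i)       # per-pixel message weight from image i
        eta1 = eta1 + w * x_i
        eta2 = eta2 - 0.5 * w
    gamma = -2.0 * eta2                   # recover posterior precision per pixel
    return eta1 / gamma, gamma            # posterior mean and precision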
51 Where a set of similar messages are sent corresponding to the pixels of an image, it is convenient to think instead of a single message where each element is itself an image. [sent-95, score-0.285]
52 4 Learning the object transformation Following [3], we decompose the layer transformation into a product of a translation Txy, a rotation-and-scale transformation Trs and a residual affine transformation Ta, and define a variational distribution that is separable over each. [sent-97, score-0.588]
53 The variational distribution over the combined transform T is given by Q(T) = Q(Txy) Q(Trs) Q(Ta). (6) [sent-102, score-0.233]
54 The factors over the discretised translation and rotation-scale transforms are then updated as log Q(Txy) = ⟨m⟩ . (Txy Trs Ta ⟨log π⟩) + ⟨1 - m⟩ . (Txy Trs Ta ⟨log(1 - π)⟩) + ⟨β_f⟩ (⟨m⟩ ∗ x) . (Txy Trs Ta ⟨f⟩) - (1/2)⟨β_f⟩ ⟨m⟩ . ⟨(Txy Trs Ta f)^2⟩ + z_xy (7) and log Q(Trs) = T_xy^{-1}⟨m⟩ . (Trs Ta ⟨log π⟩) + T_xy^{-1}⟨1 - m⟩ . (Trs Ta ⟨log(1 - π)⟩) + ⟨β_f⟩ T_xy^{-1}(⟨m⟩ ∗ x) . (Trs Ta ⟨f⟩) - (1/2)⟨β_f⟩ T_xy^{-1}⟨m⟩ . ⟨(Trs Ta f)^2⟩ + z_rs (8), where z_xy and z_rs are constants which can be found by normalisation. [sent-107, score-0.157] [sent-108, score-0.581] [sent-110, score-0.184]
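Every term in the reconstructed update (7) has the form a . (Txy v) for fixed images a and v, and evaluating that dot product for every integer translation at once is a cross-correlation, which can be computed with FFTs. A sketch under that reading of the global search (ours; 2-D arrays stand in for vectorised images, Trs and Ta are assumed already applied to the appearance statistics, and circular boundaries would be handled by padding in practice):

import numpy as np

def xcorr(a, v):
    # c[t] = sum_k a[k] * v[k - t] for every circular shift t, computed via FFTs
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.conj(np.fft.fft2(v))))

def log_Q_txy(Em, x, Elog_pi, Elog_1mpi, Ef, Ef2, E_beta_f):
    s = xcorr(Em, Elog_pi)                 # <m> . (Txy <log pi>)
    s += xcorr(1.0 - Em, Elog_1mpi)        # <1-m> . (Txy <log(1-pi)>)
    s += E_beta_f * xcorr(Em * x, Ef)      # data term
    s -= 0.5 * E_beta_f * xcorr(Em, Ef2)   # quadratic penalty, Ef2 = <f^2> warped
    return s - s.max()                     # unnormalised log Q over all shifts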
57 Finally, we define the variational distribution over Ta to be a delta function, Q(Ta) = δ(Ta - T̂a). (9) [sent-115, score-0.17]
58 Unlike all the other variational factors, this cannot be optimised analytically. [sent-116, score-0.224]
59 The delta function position T̂a is instead set by a local conjugate gradient optimisation of the bound, whose Ta-dependent terms are F(Ta) = T_rs^{-1} T_xy^{-1}⟨m⟩ . (Ta ⟨log π⟩) + T_rs^{-1} T_xy^{-1}⟨1 - m⟩ . (Ta ⟨log(1 - π)⟩) + ⟨β_f⟩ T_rs^{-1} T_xy^{-1}(⟨m⟩ ∗ x) . (Ta ⟨f⟩) - (1/2)⟨β_f⟩ T_rs^{-1} T_xy^{-1}⟨m⟩ . ⟨(Ta f)^2⟩. (10) [sent-119, score-0.368] [sent-120, score-0.368]
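A sketch of this local step: the reconstructed objective (10) is maximised over six affine parameters with a generic gradient-based optimiser standing in for the paper's conjugate gradients; the parameterisation, the warp and all argument names below are illustrative stand-ins.

import numpy as np
from scipy.optimize import minimize
from scipy.ndimage import affine_transform

def neg_bound(theta, m_n, mx_n, log_pi, log_1mpi, f, f2, beta_f):
    # theta parameterises Ta as a perturbation of the identity affine transform;
    # m_n and mx_n are <m> and <m>*x with Txy and Trs already undone
    A = np.array([[1 + theta[0], theta[1]],
                  [theta[2], 1 + theta[3]]])
    warp = lambda img: affine_transform(img, A, offset=theta[4:6], order=1)
    val = (m_n * warp(log_pi)).sum() + ((1 - m_n) * warp(log_1mpi)).sum()
    val += beta_f * (mx_n * warp(f)).sum()
    val -= 0.5 * beta_f * (m_n * warp(f2)).sum()
    return -val                            # minimise the negated bound

# res = minimize(neg_bound, np.zeros(6),
#                args=(m_n, mx_n, lp, l1p, f, f2, bf), method='CG')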
61 This delta-function assumption appeared valid for the image sequences used in our experiments, even when the transformation of the foreground layer was not well approximated by a similarity transform alone. [sent-123, score-0.528]
62 The pose of the learned appearance and mask prior is undefined and so applying a transform to f and π and the inverse of the transform to each Ti results in an unchanged joint distribution. [sent-125, score-0.839]
63 When applying a variational technique, such non-identifiability leads to many more local minima in the KL divergence. [sent-126, score-0.195]
64 We partially resolve this issue by adding a constraint to this model that the expected mask π is centred, so that its centre of gravity is in the middle of the latent image. [sent-127, score-0.383]
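A small sketch of the centring constraint (illustrative code, ours): the expected mask prior is shifted so its centre of gravity sits at the middle of the latent image; to leave the joint distribution unchanged, the matching inverse shift would be folded into f and each Ti as described above.

import numpy as np

def recentre(pi_img):
    H, W = pi_img.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = pi_img.sum()
    cy = (pi_img * ys).sum() / total       # centre of gravity, row
    cx = (pi_img * xs).sum() / total       # centre of gravity, column
    dy, dx = round(H / 2 - cy), round(W / 2 - cx)
    return np.roll(np.roll(pi_img, dy, axis=0), dx, axis=1)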
65 Figure 3: Tracking a hand undergoing extreme affine transformation (panels: background; object appearance & mask; example frames #1 and #2 with their foreground segmentations and normalised frames). [sent-129, score-1.367]
66 The first column shows the learned background and masked object appearance. [sent-130, score-0.212]
67 The second and third columns contain two frames from the sequence along with the foreground segmentation for each. [sent-131, score-0.348]
68 The final column shows each frame transformed by the inverse of the inferred object transform. [sent-132, score-0.481]
69 In each image the red outline surrounds the area where the transformed mask prior π is greater than 0.5. [sent-133, score-0.574]
70 For example, the inferred mask m_i for each frame is very informative about the location of the object in that frame. [sent-137, score-0.738]
71 Instead, we use a conservative, hand-constructed bound on Txy based on the assumption that, during inference, the most probable mask under Q(m_i) consists of a (noisy) subset of the true mask pixels. [sent-139, score-0.698]
72 Suppose the true mask contains M non-zero pixels with second moment of area I_M and the current most probable mask contains V non-zero pixels (V ≤ M) with second moment of area I_V. [sent-140, score-0.824]
73 A bound on c, the position of the centre of the inferred mask relative to the centre of the true mask, is given by diag(cc^T) ≤ (M - V) diag(I_M/V - I_V/M). [sent-141, score-0.48]
74 The bound is deliberately constructed to be conservative; its purpose is to discard settings of Txy that have negligible probability under the model and so avoid local minima in the variational optimisation. [sent-144, score-0.194]
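A sketch of how this bound can be evaluated (illustrative code, ours, following the statement of the bound above): the pixel counts and second moments of area of the true and inferred masks give a per-axis limit on the squared centre offset, and candidate translations outside that limit are discarded.

import numpy as np

def mask_stats(mask):
    ys, xs = np.nonzero(mask)
    c = np.array([ys.mean(), xs.mean()])
    d = np.stack([ys - c[0], xs - c[1]])
    return len(ys), c, d @ d.T             # pixel count, centre, 2x2 second moment

def centre_offset_limit(M, I_M, V, I_V):
    # diag(c c^T) <= (M - V) * diag(I_M / V - I_V / M), one limit per image axis
    return (M - V) * np.diag(I_M / V - I_V / M)

# a candidate translation is kept only when the centre offset c it implies obeys
# (c ** 2 <= centre_offset_limit(M, I_M, V, I_V)).all()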
75 The use of this bound on Txy is intended as a very simple example of incorporating bottom-up information to improve inference within a generative model. [sent-147, score-0.172]
76 Incorporating such proposals or bounds into a variational inference framework both speeds convergence and helps avoid local minima. [sent-149, score-0.247]
77 The first sequence is of a hand rotating both parallel to the image plane and around its own axis, whilst also translating in three dimensions. [sent-151, score-0.163]
78 Our Matlab implementation took about a minute per frame to analyse the sequence. Figure 4: Affine tracking of a semi-transparent object (panels: appearance & mask; two example frames with their foreground segmentations). [sent-153, score-0.955]
79 Figure 5: Tracking an object with changing appearance (panels: appearance & mask; first and last frames with their foreground segmentations). [sent-154, score-0.878]
80 A person is tracked throughout a sequence despite their appearance changing dramatically between the first and last frames. [sent-155, score-0.392]
81 The blue outline shows the inferred mask m which differs slightly from π due to the object changing shape. [sent-156, score-0.61]
82 Figure 3 shows the expected values of the background and foreground layers under the optimised variational distribution, along with foreground segmentation results for two frames of the sequence. [sent-158, score-0.901]
83 The right hand column gives another indication of the accuracy of the inferred transformations by applying the inverse transformation to the entire frame and showing that the hand then has a consistent normalised position and size. [sent-159, score-0.459]
84 In a video of the hand showing the tracked outline,¹ the outline appears to move smoothly and follow the hand with a high degree of accuracy, despite the system not using any temporal constraints. [sent-160, score-0.188]
85 Although the cyclist and her shadow are tracked correctly, the learned appearance is slightly inaccurate as the system is unable to capture the perspective foreshortening of the bicycle. [sent-162, score-0.373]
86 6 Tracking objects with changing appearance The model described so far makes the assumption that the appearance of the object does not change significantly from frame to frame. [sent-164, score-0.98]
87 If the set of images are actually frames from a video, we can model objects whose appearance changes slowly by allowing the model to use the object appearance in the previous frame as the basis for its appearance in the current frame. [sent-165, score-1.356]
88 However, we may not know if the images are video frames and, even if we do, the object may be occluded or out-of-frame in the previous image. [sent-166, score-0.429]
89 We can cope with this uncertainty by inferring automatically whether to use the previous frame or the learned appearance f . [sent-167, score-0.464]
90 The model is extended by introducing a binary variable s_i for each frame and defining a new appearance variable g_i = s_i f + (1 - s_i) T_{i-1}^{-1} x_{i-1}. [sent-169, score-0.577]
91 Hence g_i either equals the foreground appearance f (if s_i = 1) or the transform-normalised previous frame (if s_i = 0). [sent-170, score-0.792]
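The switch itself is a one-liner; a sketch (names ours), with T_prev_inv the inverse transform T_{i-1}^{-1} applied to the vectorised previous frame:

def appearance_g(f, x_prev, T_prev_inv, s_i):
    # g_i = s_i * f + (1 - s_i) * T_{i-1}^{-1} x_{i-1}
    return s_i * f + (1 - s_i) * (T_prev_inv @ x_prev)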
92 The extended model is able to track an object even when its appearance changes significantly throughout the image sequence (see Figure 5). [sent-173, score-0.569]
93 Using the tracked appearance allows the foreground segmentation of each frame to be accurate even though the object is poorly modelled by the inferred appearance image. (¹ Videos of results are available from http://johnwinn.) [sent-175, score-0.337] [sent-177, score-0.941]
95 If we introduce an abrupt change into the sequence, for example by reversing the second half of the sequence, s_i is found to be ≈ 1 for the frame following the change. [sent-178, score-0.182]
96 In other words, the system has detected that it should not use the previous frame at this point, but should revert to using the latent appearance image f. [sent-179, score-0.582]
97 7 Discussion We have proposed a method for localising an object undergoing affine transformations whilst simultaneously learning its shape and appearance. [sent-180, score-0.396]
98 The power of this method has been demonstrated by tracking moving objects in several real videos, including cases where the appearance of the object changes significantly from start to end. [sent-181, score-0.579]
99 The system makes no assumptions about the speed of motion of the object, requires no special initialisation and is robust to the object being temporarily occluded or moving out of frame. [sent-182, score-0.235]
100 A natural extension to this work is to allow multiple layers, with each layer having its own latent shape and appearance and set of affine transformations. [sent-183, score-0.412]
wordName wordTfidf (topN-words)
[('txy', 0.433), ('trs', 0.361), ('mask', 0.323), ('appearance', 0.282), ('foreground', 0.244), ('ta', 0.241), ('frame', 0.182), ('variational', 0.17), ('object', 0.166), ('vmp', 0.126), ('xy', 0.12), ('messages', 0.106), ('video', 0.104), ('af', 0.096), ('image', 0.091), ('factorised', 0.09), ('message', 0.085), ('layer', 0.078), ('inference', 0.077), ('tf', 0.076), ('frames', 0.074), ('discretised', 0.072), ('whilst', 0.072), ('generative', 0.071), ('transformations', 0.07), ('layers', 0.069), ('inferred', 0.067), ('transformed', 0.066), ('rs', 0.064), ('transform', 0.063), ('normalised', 0.063), ('tracking', 0.059), ('optimisation', 0.057), ('tracked', 0.055), ('jojic', 0.054), ('minimise', 0.054), ('optimised', 0.054), ('transformation', 0.052), ('pixels', 0.052), ('hi', 0.05), ('plate', 0.047), ('winn', 0.047), ('beta', 0.047), ('destination', 0.047), ('background', 0.046), ('images', 0.045), ('objects', 0.043), ('sent', 0.042), ('prior', 0.042), ('pose', 0.041), ('conservative', 0.04), ('occluded', 0.04), ('kth', 0.04), ('translations', 0.038), ('log', 0.037), ('cyclist', 0.036), ('localising', 0.036), ('undergoing', 0.034), ('mk', 0.034), ('mi', 0.034), ('bayesian', 0.033), ('centre', 0.033), ('iv', 0.033), ('centred', 0.031), ('ffts', 0.031), ('localisation', 0.031), ('sequence', 0.03), ('transforms', 0.03), ('translation', 0.03), ('moving', 0.029), ('im', 0.029), ('cues', 0.029), ('outline', 0.029), ('si', 0.029), ('undergo', 0.029), ('probable', 0.028), ('ith', 0.028), ('posterior', 0.027), ('layered', 0.027), ('analyse', 0.027), ('latent', 0.027), ('carried', 0.026), ('gi', 0.026), ('changing', 0.025), ('videos', 0.025), ('shifting', 0.025), ('frey', 0.025), ('shape', 0.025), ('applying', 0.025), ('kl', 0.025), ('node', 0.024), ('ne', 0.024), ('bound', 0.024), ('factors', 0.023), ('copies', 0.023), ('xk', 0.023), ('divergence', 0.023), ('area', 0.023), ('pixel', 0.023), ('jth', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 73 nips-2004-Generative Affine Localisation and Tracking
Author: John Winn, Andrew Blake
Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1
2 0.20044436 83 nips-2004-Incremental Learning for Visual Tracking
Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang
Abstract: Most existing tracking algorithms construct a representation of a target object prior to the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1
3 0.18281205 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang
Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
4 0.16340959 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
5 0.13279672 40 nips-2004-Common-Frame Model for Object Recognition
Author: Pierre Moreels, Pietro Perona
Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.
6 0.10466582 44 nips-2004-Conditional Random Fields for Object Recognition
7 0.10178491 99 nips-2004-Learning Hyper-Features for Visual Identification
8 0.084004 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
9 0.082352147 112 nips-2004-Maximising Sensitivity in a Spiking Network
10 0.080341019 174 nips-2004-Spike Sorting: Bayesian Clustering of Non-Stationary Data
11 0.078530185 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
12 0.077824056 116 nips-2004-Message Errors in Belief Propagation
13 0.07251963 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
14 0.072322182 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
15 0.063212968 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
16 0.059279449 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
17 0.059251484 161 nips-2004-Self-Tuning Spectral Clustering
18 0.057569485 192 nips-2004-The power of feature clustering: An application to object detection
19 0.057450444 89 nips-2004-Joint MRI Bias Removal Using Entropy Minimization Across Images
20 0.055559639 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
topicId topicWeight
[(0, -0.178), (1, -0.011), (2, -0.03), (3, -0.242), (4, 0.2), (5, -0.032), (6, -0.002), (7, -0.101), (8, -0.086), (9, 0.008), (10, -0.012), (11, 0.068), (12, 0.17), (13, 0.147), (14, 0.002), (15, -0.058), (16, 0.036), (17, 0.017), (18, -0.047), (19, 0.014), (20, -0.053), (21, -0.087), (22, 0.069), (23, -0.022), (24, 0.001), (25, 0.044), (26, -0.011), (27, 0.008), (28, 0.152), (29, -0.026), (30, 0.037), (31, 0.025), (32, -0.032), (33, 0.1), (34, -0.066), (35, 0.088), (36, 0.009), (37, 0.144), (38, -0.006), (39, -0.094), (40, 0.059), (41, 0.006), (42, 0.016), (43, 0.046), (44, -0.046), (45, -0.027), (46, -0.002), (47, -0.093), (48, 0.091), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.95693582 73 nips-2004-Generative Affine Localisation and Tracking
Author: John Winn, Andrew Blake
Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1
2 0.81606114 83 nips-2004-Incremental Learning for Visual Tracking
Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang
Abstract: Most existing tracking algorithms construct a representation of a target object prior to the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1
3 0.72816569 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang
Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
4 0.62658477 40 nips-2004-Common-Frame Model for Object Recognition
Author: Pierre Moreels, Pietro Perona
Abstract: A generative probabilistic model for objects in images is presented. An object consists of a constellation of features. Feature appearance and pose are modeled probabilistically. Scene images are generated by drawing a set of objects from a given database, with random clutter sprinkled on the remaining image surface. Occlusion is allowed. We study the case where features from the same object share a common reference frame. Moreover, parameters for shape and appearance densities are shared across features. This is to be contrasted with previous work on probabilistic ‘constellation’ models where features depend on each other, and each feature and model have different pose and appearance statistics [1, 2]. These two differences allow us to build models containing hundreds of features, as well as to train each model from a single example. Our model may also be thought of as a probabilistic revisitation of Lowe’s model [3, 4]. We propose an efficient entropy-minimization inference algorithm that constructs the best interpretation of a scene as a collection of objects and clutter. We test our ideas with experiments on two image databases. We compare with Lowe’s algorithm and demonstrate better performance, in particular in presence of large amounts of background clutter.
5 0.51599681 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
6 0.48336262 99 nips-2004-Learning Hyper-Features for Visual Identification
7 0.44597313 191 nips-2004-The Variational Ising Classifier (VIC) Algorithm for Coherently Contaminated Data
8 0.39866009 44 nips-2004-Conditional Random Fields for Object Recognition
9 0.35220143 47 nips-2004-Contextual Models for Object Detection Using Boosted Random Fields
10 0.35074222 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
11 0.3281087 192 nips-2004-The power of feature clustering: An application to object detection
12 0.3093521 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
13 0.30766708 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
14 0.30540574 205 nips-2004-Who's In the Picture
15 0.29796836 29 nips-2004-Beat Tracking the Graphical Model Way
16 0.29322717 91 nips-2004-Joint Tracking of Pose, Expression, and Texture using Conditionally Gaussian Filters
17 0.28609267 125 nips-2004-Multiple Relational Embedding
18 0.2799288 198 nips-2004-Unsupervised Variational Bayesian Learning of Nonlinear Models
19 0.26256222 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
20 0.26134256 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
topicId topicWeight
[(4, 0.254), (13, 0.078), (15, 0.175), (26, 0.051), (31, 0.02), (33, 0.174), (35, 0.014), (39, 0.026), (50, 0.036), (51, 0.019), (56, 0.014), (71, 0.03), (88, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.84302938 73 nips-2004-Generative Affine Localisation and Tracking
Author: John Winn, Andrew Blake
Abstract: We present an extension to the Jojic and Frey (2001) layered sprite model which allows for layers to undergo affine transformations. This extension allows for affine object pose to be inferred whilst simultaneously learning the object shape and appearance. Learning is carried out by applying an augmented variational inference algorithm which includes a global search over a discretised transform space followed by a local optimisation. To aid correct convergence, we use bottom-up cues to restrict the space of possible affine transformations. We present results on a number of video sequences and show how the model can be extended to track an object whose appearance changes throughout the sequence. 1
2 0.71978581 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang
Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
3 0.71651477 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
Author: Chakra Chennubhotla, Allan D. Jepson
Abstract: We show how to build hierarchical, reduced-rank representation for large stochastic matrices and use this representation to design an efficient algorithm for computing the largest eigenvalues, and the corresponding eigenvectors. In particular, the eigen problem is first solved at the coarsest level of the representation. The approximate eigen solution is then interpolated over successive levels of the hierarchy. A small number of power iterations are employed at each stage to correct the eigen solution. The typical speedups obtained by a Matlab implementation of our fast eigensolver over a standard sparse matrix eigensolver [13] are at least a factor of ten for large image sizes. The hierarchical representation has proven to be effective in a min-cut based segmentation algorithm that we proposed recently [8].
The Nystrom method has also been recently applied in the kernel learning methods for fast Gaussian process classification and regression [25]. Other sampling-based approaches include the work reported in [1, 2, 11]. Our starting point is the transition matrix generated from affinity weights and we show how building a representational hierarchy follows naturally from considering the stochastic matrix. A closely related work is the paper by Lin on reduced rank approximations of transition matrices [14]. We differ in how we approximate the transition matrices, in particular our objective function is computationally less expensive to solve. In particular, one of our goals in reducing transition matrices is to develop a fast, specialized eigen solver for spectral clustering. Fast eigensolving is also the goal in ACE [12], where successive levels in the hierarchy can potentially have negative affinities. A graph coarsening process for clustering was also pursued in [21, 3]. 3 Markov Chain Terminology We first provide a brief overview of the Markov chain terminology here (for more details see [17, 15, 6]). We consider an undirected graph G = (V, E) with vertices vi , for i = {1, . . . , n}, and edges ei,j with non-negative weights ai,j . Here the weight ai,j represents the affinity between vertices vi and vj . The affinities are represented by a non-negative, symmetric n × n matrix A having weights ai,j as elements. The degree of a node j is n n defined to be: dj = i=1 ai,j = j=1 aj,i , where we define D = diag(d1 , . . . , dn ). A Markov chain is defined using these affinities by setting a transition probability matrix M = AD −1 , where the columns of M each sum to 1. The transition probability matrix defines the random walk of a particle on the graph G. The random walk need never be explicitly carried out; instead, it can be analytically expressed using the leading order eigenvectors, and eigenvalues, of the Markov transition matrix. Because the stochastic matrices need not be symmetric in general, a direct eigen decomposition step is not preferred for reasons of instability. This problem is easily circumvented by considering a normalized affinity matrix: L = D −1/2 AD−1/2 , which is related to the stochastic matrix by a similarity transformation: L = D −1/2 M D1/2 . Because L is symmetric, it can be diagonalized: L = U ΛU T , where U = [u1 , u2 , · · · , un ] is an orthogonal set of eigenvectors and Λ is a diagonal matrix of eigenvalues [λ1 , λ2 , · · · , λn ] sorted in decreasing order. The eigenvectors have unit length uk = 1 and from the form of A and D it can be shown that the eigenvalues λi ∈ (−1, 1], with at least one eigenvalue equal to one. Without loss of generality, we take λ1 = 1. Because L and M are similar we can perform an eigen decomposition of the Markov transition matrix as: M = D1/2 LD−1/2 = D1/2 U Λ U T D−1/2 . Thus an eigenvector u of L corresponds to an eigenvector D 1/2 u of M with the same eigenvalue λ. The Markovian relaxation process after β iterations, namely M β , can be represented as: M β = D1/2 U Λβ U T D−1/2 . Therefore, a particle undertaking a random walk with an initial distribution p 0 acquires after β steps a distribution p β given by: p β = M β p 0 . Assuming the graph is connected, as β → ∞, the Markov chain approaches a unique n stationary distribution given by π = diag(D)/ i=1 di , and thus, M ∞ = π1T , where 1 is a n-dim column vector of all ones. 
Observe that π is an eigenvector of M as it is easy to show that M π = π and the corresponding eigenvalue is 1. Next, we show how to generate hierarchical, successively low-ranked approximations for the transition matrix M . 4 Building a Hierarchy of Transition Matrices The goal is to generate a very fast approximation, while simultaneously achieving sufficient accuracy. For notational ease, we think of M as a fine-scale representation and M as some coarse-scale approximation to be derived here. By coarsening M further, we can generate successive levels of the representation hierarchy. We use the stationary distribution π to construct a corresponding coarse-scale stationary distribution δ. As we just discussed a critical property of the fine scale Markov matrix M is that it is similar to the symmetric matrix L and we wish to preserve this property at every level of the representation hierarchy. 4.1 Deriving Coarse-Scale Stationary Distribution We begin by expressing the stationary distribution π as a probabilistic mixture of latent distributions. In matrix notation, we have (1) π = K δ, where δ is an unknown mixture coefficient vector of length m, K is an n × m non-negative n kernel matrix whose columns are latent distributions that each sum to 1: i=1 Ki,j = 1 and m n. It is easy to derive a maximum likelihood approximation of δ using an EM type algorithm [16]. The main step is to find a stationary point δ, λ for the Lagrangian: m n i=1 m Ki,j δj + λ πi ln E≡− j=1 δj − 1 . (2) j=1 An implicit step in this EM procedure is to compute the the ownership probability r i,j of the j th kernel (or node) at the coarse scale for the ith node on the fine scale and is given by ri,j = δj Ki,j . m k=1 δk Ki,k (3) The EM procedure allows for an update of both δ and the latent distributions in the kernel matrix K (see §8.3.1 in [6]). For initialization, δ is taken to be uniform over the coarse-scale states. But in choosing kernels K, we provide a good initialization for the EM procedure. Specifically, the Markov matrix M is diffused using a small number of iterations to get M β . The diffusion causes random walks from neighboring nodes to be less distinguishable. This in turn helps us select a small number of columns of M β in a fast and greedy way to be the kernel matrix K. We defer the exact details on kernel selection to a later section (§4.3). 4.2 Deriving the Coarse-Scale Transition Matrix In order to define M , the coarse-scale transition matrix, we break it down into three steps. First, the Markov chain propagation at the coarse scale can be defined as: q k+1 = M q k , (4) where q is the coarse scale probability distribution after k steps of the random walk. Second, we expand q k into the fine scale using the kernels K resulting in a fine scale probability distribution p k : p k = Kq k . (5) k Finally, we lift p k back into the coarse scale by using the ownership probability of the j th kernel for the ith node on the fine grid: n qjk+1 = ri,j pik i=1 (6) Substituting for Eqs.(3) and (5) in Eq. 6 gives n m qjk+1 = i=1 n Ki,t qtk = ri,j t=1 i=1 δj Ki,j m k=1 δk Ki,k m Ki,t qtk . (7) t=1 We can write the preceding equation in a matrix form: q k+1 = diag( δ ) K T diag K δ −1 Kq k . (8) Comparing this with Eq. 4, we can derive the transition matrix M as: M = diag( δ ) K T diag K δ −1 (9) K. It is easy to see that δ = M δ, so δ is the stationary distribution for M . 
Following the definition of M , and its stationary distribution δ, we can generate a symmetric coarse scale affinity matrix A given by A = M diag(δ) = diag( δ ) K T diag K δ −1 Kdiag(δ) , (10) where we substitute for the expression M from Eq. 9. The coarse-scale affinity matrix A is then normalized to get: L = D−1/2 AD−1/2 ; D = diag(d1 , d2 , · · · , dm ), (11) where dj is the degree of node j in the coarse-scale graph represented by the matrix A (see §3 for degree definition). Thus, the coarse scale Markov matrix M is precisely similar to a symmetric matrix L. 4.3 Selecting Kernels For demonstration purpose, we present the kernel selection details on the image of an eye shown below. To begin with, a random walk is defined where each pixel in the test image is associated with a vertex of the graph G. The edges in G are defined by the standard 8-neighbourhood of each pixel. For the demonstrations in this paper, the edge weight ai,j between neighbouring pixels xi and xj is given by a function of the difference in the 2 corresponding intensities I(xi ) and I(xj ): ai,j = exp(−(I(xi ) − I(xj ))2 /2σa ), where σa is set according to the median absolute difference |I(xi ) − I(xj )| between neighbours measured over the entire image. The affinity matrix A with the edge weights is then used to generate a Markov transition matrix M . The kernel selection process we use is fast and greedy. First, the fine scale Markov matrix M is diffused to M β using β = 4. The Markov matrix M is sparse as we make the affinity matrix A sparse. Every column in the diffused matrix M β is a potential kernel. To facilitate the selection process, the second step is to rank order the columns of M β based on a probability value in the stationary distribution π. Third, the kernels (i.e. columns of M β ) are picked in such a way that for a kernel Ki all of the neighbours of pixel i which are within the half-height of the the maximum value in the kernel Ki are suppressed from the selection process. Finally, the kernel selection is continued until every pixel in the image is within a half-height of the peak value of at least one kernel. If M is a full matrix, to avoid the expense of computing M β explicitly, random kernel centers can be selected, and only the corresponding columns of M β need be computed. We show results from a three-scale hierarchy on the eye image (below). The image has 25 × 20 pixels but is shown here enlarged for clarity. At the first coarse scale 83 kernels are picked. The kernels each correspond to a different column in the fine scale transition matrix and the pixels giving rise to these kernels are shown numbered on the image. Using these kernels as an initialization, the EM procedure derives a coarse-scale stationary distribution δ 21 14 26 4 (Eq. 2), while simultaneously updating the kernel ma12 27 2 19 trix. Using the newly updated kernel matrix K and the 5 8 13 23 30 18 6 9 derived stationary distribution δ a transition matrix M 28 20 15 32 10 22 is generated (Eq. 9). The coarse scale Markov matrix 24 17 7 is then diffused to M β , again using β = 4. The kernel Coarse Scale 1 Coarse Scale 2 selection algorithm is reapplied, this time picking 32 kernels for the second coarse scale. Larger values of β cause the coarser level to have fewer elements. But the exact number of elements depends on the form of the kernels themselves. 
For the random experiments that we describe later in §6 we found β = 2 in the first iteration and 4 thereafter causes the number of kernels to be reduced by a factor of roughly 1/3 to 1/4 at each level. 72 28 35 44 51 64 82 4 12 31 56 19 77 36 45 52 65 13 57 23 37 5 40 53 63 73 14 29 6 66 38 74 47 24 7 30 41 54 71 78 58 15 8 20 39 48 59 67 25 68 79 21 16 2 11 26 42 49 55 60 75 32 83 43 9 76 50 17 27 61 33 69 80 3 46 18 70 81 34 10 62 22 1 25 11 1 3 16 31 29 At coarser levels of the hierarchy, we expect the kernels to get less sparse and so will the affinity and the transition matrices. In order to promote sparsity at successive levels of the hierarchy we sparsify A by zeroing out elements associated with “small” transition probabilities in M . However, in the experiments described later in §6, we observe this sparsification step to be not critical. To summarize, we use the stationary distribution π at the fine-scale to derive a transition matrix M , and its stationary distribution δ, at the coarse-scale. The coarse scale transition in turn helps to derive an affinity matrix A and its normalized version L. It is obvious that this procedure can be repeated recursively. We describe next how to use this representation hierarchy for building a fast eigensolver. 5 Fast EigenSolver Our goal in generating a hierarchical representation of a transition matrix is to develop a fast, specialized eigen solver for spectral clustering. To this end, we perform a full eigen decomposition of the normalized affinity matrix only at the coarsest level. As discussed in the previous section, the affinity matrix at the coarsest level is not likely to be sparse, hence it will need a full (as opposed to a sparse) version of an eigen solver. However it is typically the case that e ≤ m n (even in the case of the three-scale hierarchy that we just considered) and hence we expect this step to be the least expensive computationally. The resulting eigenvectors are interpolated to the next lower level of the hierarchy by a process which will be described next. Because the eigen interpolation process between every adjacent pair of scales in the hierarchy is similar, we will assume we have access to the leading eigenvectors U (size: m × e) for the normalized affinity matrix L (size: m × m) and describe how to generate the leading eigenvectors U (size: n × e), and the leading eigenvalues S (size: e × 1), for the fine-scale normalized affinity matrix L (size: n × n). There are several steps to the eigen interpolation process and in the discussion that follows we refer to the lines in the pseudo-code presented below. First, the coarse-scale eigenvectors U can be interpolated using the kernel matrix K to generate U = K U , an approximation for the fine-scale eigenvectors (line 9). Second, interpolation alone is unlikely to set the directions of U exactly aligned with U L , the vectors one would obtain by a direct eigen decomposition of the fine-scale normalized affinity matrix L. We therefore update the directions in U by applying a small number of power iterations with L, as given in lines 13-15. e e function (U, S) = CoarseToFine(L, K, U , S) 1: INPUT 2: L, K ⇐ {L is n × n and K is n × m where m n} e e e e 3: U /S ⇐ {leading coarse-scale eigenvectors/eigenvalues of L. U is of size m × e, e ≤ m} 4: OUTPUT 5: U, S ⇐ {leading fine-scale eigenvectors/eigenvalues of L. 
U is n × e and S is e × 1.} x 10 0.4 3 0.96 0.94 0.92 0.9 0.35 2.5 Relative Error Absolute Relative Error 0.98 Eigen Value |δλ|λ−1 −3 Eigen Spectrum 1 2 1.5 1 5 10 15 20 Eigen Index (a) 25 30 0.2 0.15 0.1 0.5 0.88 0.3 0.25 0.05 5 10 15 20 Eigen Index (b) 25 30 5 10 15 20 Eigen Index 25 30 (c) Figure 1: Hierarchical eigensolver results. (a) comparing ground truth eigenvalues S L (red circles) with multi-scale eigensolver spectrum S (blue line) (b) Relative absolute error between eigenvalues: |S−SL | (c) Eigenvector mismatch: 1 − diag |U T UL | , between SL eigenvectors U derived by the multi-scale eigensolver and the ground truth U L . Observe the slight mismatch in the last few eigenvectors, but excellent agreement in the leading eigenvectors (see text). 6: CONSTANTS: TOL = 1e-4; POWER ITERS = 50 7: “ ” e 8: TPI = min POWER ITERS, log(e × eps/TOL)/ log(min(S)) {eps: machine accuracy} e 9: U = K U {interpolation from coarse to fine} 10: while not converged do 11: Uold = U {n × e matrix, e n} 12: for i = 1 to TPI do 13: U ⇐ LU 14: end for 15: U ⇐ Gram-Schmidt(U ) {orthogonalize U } 16: Le = U T LU {L may be sparse, but Le need not be.} 17: Ue Se UeT = svd(Le ) {eigenanalysis of Le , which is of size e × e.} 18: U ⇐ U Ue {update the leading eigenvectors of L} 19: S = diag(Se ) {grab the leading eigenvalues of L} T 20: innerProd = 1 − diag( Uold U ) {1 is a e × 1 vector of all ones} 21: converged = max[abs(innerProd)] < TOL 22: end while The number of power iterations TPI can be bounded as discussed next. Suppose v = U c where U is a matrix of true eigenvectors and c is a coefficient vector for an arbitrary vector v. After TPI power iterations v becomes v = U diag(S TPI )c, where S has the exact eigenvalues. In order for the component of a vector v in the direction Ue (the eth column of U ) not to be swamped by other components, we can limit it’s decay after TPI iterations as TPI follows: (S(e)/S(1)) >= e×eps/TOL, where S(e) is the exact eth eigenvalue, S(1) = 1, eps is the machine precision, TOL is requested accuracy. Because we do not have access to the exact value S(e) at the beginning of the interpolation procedure, we estimate it from the coarse eigenvalues S. This leads to a bound on the power iterations TPI, as derived on the line 9 above. Third, the interpolation process and the power iterations need not preserve orthogonality in the eigenvectors in U . We fix this by Gram-Schmidt orthogonalization procedure (line 16). Finally, there is a still a problem with power iterations that needs to be resolved, in that it is very hard to separate nearby eigenvalues. In particular, for the convergence of the power iterations the ratio that matters is between the (e + 1) st and eth eigenvalues. So the idea we pursue is to use the power iterations only to separate the reduced space of eigenvectors (of dimension e) from the orthogonal subspace (of dimension n − e). We then use a full SVD on the reduced space to update the leading eigenvectors U , and eigenvalues S, for the fine-scale (lines 17-20). This idea is similar to computing the Ritz values and Ritz vectors in a Rayleigh-Ritz method. 6 Interpolation Results Our multi-scale decomposition code is in Matlab. For the direct eigen decomposition, we have used the Matlab program svds.m which invokes the compiled ARPACKC routine [13], with a default convergence tolerance of 1e-10. In Fig. 
6 Interpolation Results

Our multi-scale decomposition code is written in Matlab. For the direct eigendecomposition we use the Matlab program svds.m, which invokes the compiled ARPACK routines [13], with a default convergence tolerance of 1e-10.

[Figure 1: three panels, each plotted against the eigen index (1–30): (a) the eigenvalue spectrum; (b) the absolute relative eigenvalue error (scale ×10⁻³); (c) the eigenvector mismatch.]
Figure 1: Hierarchical eigensolver results. (a) Ground-truth eigenvalues S_L (red circles) compared with the multi-scale eigensolver spectrum S (blue line). (b) Relative absolute error between the eigenvalues, |S − S_L|/S_L. (c) Eigenvector mismatch, 1 − diag(|U^T U_L|), between the eigenvectors U derived by the multi-scale eigensolver and the ground truth U_L. Observe the slight mismatch in the last few eigenvectors, but the excellent agreement in the leading eigenvectors (see text).

In Fig. 1a we compare the spectrum S obtained from a three-scale decomposition on the eye image (blue line) with the ground truth, namely the spectrum S_L resulting from a direct eigendecomposition of the fine-scale normalized affinity matrix L (red circles). There is excellent agreement in the leading eigenvalues. To quantify this, we show the absolute relative error between the spectra, |S − S_L|/S_L, in Fig. 1b; the spectra agree closely except for the last few eigenvalues. For a quantitative comparison between the eigenvectors, we plot in Fig. 1c the measure 1 − diag(|U^T U_L|), where U is the matrix of eigenvectors obtained by the multi-scale approximation, U_L is the ground truth resulting from a direct eigendecomposition of the fine-scale affinity matrix L, and 1 is a vector of all ones. This plot demonstrates a close match between the two methods in the leading eigenvector directions, within the tolerance threshold of 1e-4 chosen for the multi-scale method. The error is high only for the last few eigenvectors, which suggests that the power iterations have not cleanly separated them from the remaining directions. The strategy we therefore suggest is to pad the required number of leading eigenvectors by about 20% before invoking the multi-scale procedure. The number of hierarchical stages must, of course, be chosen so that the transition matrix at the coarsest scale can accommodate this slight increase in subspace dimension. For lack of space we omit further results (see Ch. 8 in [6]).

Next we measure the time the hierarchical eigensolver takes to compute the leading eigenbasis for various input sizes, in comparison with the svds.m procedure [13]. We form images of different sizes by Gaussian smoothing of i.i.d. noise, using a Gaussian with a standard deviation of 3 pixels. The edges in the graph G are defined by the standard 8-neighbourhood of each pixel, and the edge weights between neighbouring pixels are given by a function of the difference in the corresponding intensities (see §4.3); the affinity matrix A built from these edge weights is then used to generate a Markov transition matrix M (this construction is sketched in the code example at the end of this section). The fast eigensolver is run on ten different instances of the input image at each size, and the average of these times is reported here. For a fair comparison between the two procedures we set the convergence tolerance for svds.m to 1e-4, the same value used for the fast eigensolver; we found the hierarchical representation derived at this tolerance to be sufficiently accurate for the novel min-cut based segmentation results that we reported in [8]. The subspace dimensionality is fixed at 51, for which we expect (and indeed observe) the leading 40 eigenpairs derived from the multi-scale procedure to be accurate; hence, when invoking svds.m we compute only the leading 41 eigenpairs. In the table shown below, the first column gives the number of nodes in the graph, the second and third columns report the time in seconds taken by the svds.m procedure and by the Matlab implementation of the multi-scale eigensolver respectively, and the fourth column reports the speedup of the multi-scale eigensolver over svds.m, measured on a standard desktop (Intel P4, 2.5 GHz, 1 GB RAM). Lowering the tolerance threshold for svds.m made it faster by about 20–30%; despite this, the multi-scale algorithm clearly outperforms the svds.m procedure.
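For concreteness, here is a minimal Matlab sketch of this test-graph construction. The Gaussian form of the edge weights and the scale sigma are illustrative assumptions (the exact weighting function of §4.3 is not reproduced here), as is the image size.

    % Minimal sketch of the test-graph construction; the Gaussian edge
    % weights and the scale sigma are illustrative assumptions.
    sz = 64; sigma = 0.1;
    g = exp(-(-5:5).^2 / (2 * 3^2)); g = g / sum(g);   % 1-D Gaussian, std 3 pixels
    I = conv2(g, g, rand(sz), 'same');                 % smoothed i.i.d. noise image
    idx = reshape(1:sz^2, sz, sz);
    offsets = [0 1; 1 0; 1 1; 1 -1];   % half of the 8-neighbourhood offsets
    rows = []; cols = []; vals = [];
    for k = 1:size(offsets, 1)
        dr = offsets(k, 1); dc = offsets(k, 2);
        r = max(1, 1 - dr):min(sz, sz - dr);
        c = max(1, 1 - dc):min(sz, sz - dc);
        p = idx(r, c); q = idx(r + dr, c + dc);
        w = exp(-(I(p) - I(q)).^2 / (2 * sigma^2));    % weight from intensity difference
        rows = [rows; p(:); q(:)];
        cols = [cols; q(:); p(:)];
        vals = [vals; w(:); w(:)];
    end
    A = sparse(rows, cols, vals, sz^2, sz^2);          % symmetric affinity matrix
    d = full(sum(A, 1))';
    M = A * spdiags(1 ./ d, 0, sz^2, sz^2);            % column-stochastic transition matrix

Only four of the eight neighbour offsets are enumerated; adding each edge in both directions yields the full symmetric 8-neighbourhood.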
The most expensive step in the multi-scale algorithm is the power iteration required in the last stage, that is, interpolating eigenvectors from the first coarse scale down to the required fine scale. The complexity is of the order of n × e, where e is the subspace dimensionality and n is the size of the graph. Indeed, from the table we can see that the multi-scale procedure takes time roughly proportional to n. Deviations from the linear trend are observed at specific values of n, which we believe are due to variations in the difficulty of the specific eigenvalue problem (e.g., nearly multiple eigenvalues).

n        svds.m     Multi-Scale   Speedup
32^2     1.6        1.5           1.1
63^2     10.8       4.9           2.2
64^2     20.5       5.5           3.7
65^2     12.6       5.1           2.5
100^2    44.2       13.1          3.4
127^2    91.1       20.4          4.5
128^2    230.9      35.2          6.6
129^2    96.9       20.9          4.6
160^2    179.3      34.4          5.2
255^2    819.2      90.3          9.1
256^2    2170.8     188.7         11.5
257^2    871.7      93.3          9.3
511^2    7977.2     458.8         17.4
512^2    20269      739.3         27.4
513^2    7887.2     461.9         17.1
600^2    10841.4    644.2         16.8
700^2    15048.8    1162.4        12.9
800^2    —          1936.6        —

The hierarchical representation has also proven effective in a min-cut based segmentation algorithm that we proposed recently [8]. There we explored the use of random walks and the associated spectral embedding techniques for the automatic generation of suitable proposal (source and sink) regions for a min-cut based algorithm; the multi-scale algorithm was used to generate the 40 leading eigenvectors of large transition matrices (e.g., of size 20K × 20K). In terms of future work, it will be useful to compare our approach with other approximate methods for the SVD, such as [23].

Acknowledgements: We thank S. Roweis, F. Estrada and M. Sakr for valuable comments.

References
[1] D. Achlioptas and F. McSherry. Fast Computation of Low-Rank Approximations. STOC, 2001.
[2] D. Achlioptas et al. Sampling Techniques for Kernel Methods. NIPS, 2001.
[3] S. Barnard and H. Simon. Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems. PPSC, pp. 627–632.
[4] M. Belkin et al. Laplacian Eigenmaps and Spectral Techniques for Embedding. NIPS, 2001.
[5] M. Brand et al. A unifying theorem for spectral embedding and clustering. AI & STATS, 2002.
[6] C. Chennubhotla. Spectral Methods for Multi-scale Feature Extraction and Spectral Clustering. Ph.D. Thesis, Department of Computer Science, University of Toronto, Canada, 2004. http://www.cs.toronto.edu/~chakra/thesis.pdf
[7] C. Chennubhotla and A. Jepson. Half-Lives of EigenFlows for Spectral Clustering. NIPS, 2002.
[8] F. Estrada, A. Jepson and C. Chennubhotla. Spectral Embedding and Min-Cut for Image Segmentation. Manuscript under review, 2004.
[9] C. Fowlkes et al. Efficient spatiotemporal grouping using the Nystrom method. CVPR, 2001.
[10] S. Belongie et al. Spectral Partitioning with Indefinite Kernels using the Nystrom approximation. ECCV, 2002.
[11] A. Frieze et al. Fast Monte-Carlo Algorithms for finding low-rank approximations. FOCS, 1998.
[12] Y. Koren et al. ACE: A Fast Multiscale Eigenvectors Computation for Drawing Huge Graphs. IEEE Symp. on InfoVis, 2002, pp. 137–144.
[13] R. B. Lehoucq, D. C. Sorensen and C. Yang. ARPACK User Guide: Solution of Large Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods. SIAM, 1998.
[14] J. J. Lin. Reduced Rank Approximations of Transition Matrices. AI & STATS, 2002.
[15] L. Lovász. Random Walks on Graphs: A Survey. Combinatorics, 1996, pp. 353–398.
[16] G. J. McLachlan et al. Mixture Models: Inference and Applications to Clustering. 1988.
[17] M. Meila and J. Shi. A random walks view of spectral segmentation. AI & STATS, 2001.
[18] A. Ng, M. Jordan and Y. Weiss. On Spectral Clustering: analysis and an algorithm. NIPS, 2001.
[19] A. Pothen. Graph partitioning algorithms with applications to scientific computing. In Parallel Numerical Algorithms, D. E. Keyes et al. (eds.), Kluwer Academic Press, 1996.
[20] G. L. Scott et al. Feature grouping by relocalization of eigenvectors of the proximity matrix. BMVC, pp. 103–108, 1990.
[21] E. Sharon et al. Fast Multiscale Image Segmentation. CVPR, I:70–77, 2000.
[22] J. Shi and J. Malik. Normalized cuts and image segmentation. PAMI, August 2000.
[23] H. Simon et al. Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications. SIAM J. of Sci. Comp., 21(6):2257–2274, 2000.
[24] N. Tishby et al. Data clustering by Markovian Relaxation. NIPS, 2001.
[25] C. Williams et al. Using the Nystrom method to speed up the kernel machines. NIPS, 2001.
4 0.71473271 178 nips-2004-Support Vector Classification with Input Data Uncertainty
Author: Jinbo Bi, Tong Zhang
Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
5 0.71089804 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
6 0.71059614 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
7 0.7094152 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
8 0.70671296 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
9 0.7048896 168 nips-2004-Semigroup Kernels on Finite Sets
10 0.70444745 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
11 0.70357031 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
12 0.70337385 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
13 0.70305073 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill
14 0.70302725 131 nips-2004-Non-Local Manifold Tangent Learning
15 0.702447 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach
16 0.70117581 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
17 0.70095271 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
18 0.70090842 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
19 0.70079356 68 nips-2004-Face Detection --- Efficient and Rank Deficient
20 0.70029014 61 nips-2004-Efficient Out-of-Sample Extension of Dominant-Set Clusters