nips nips2004 nips2004-55 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self–occlusions created by the imaging process lead to complex interdependencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. [sent-10, score-0.317]
2 The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. [sent-11, score-0.589]
3 We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. [sent-13, score-0.41]
4 While kinematic constraints naturally have a local structure, self–occlusions created by the imaging process lead to complex interdependencies in color and edge–based likelihood functions. [sent-14, score-0.55]
5 However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. [sent-15, score-0.505]
6 We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. [sent-16, score-1.093]
7 We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. [sent-17, score-0.221]
8 1 Introduction Accurate visual detection and tracking of three–dimensional articulated objects is a challenging problem with applications in human–computer interfaces, motion capture, and scene understanding [1]. [sent-18, score-0.296]
9 In this paper, we develop a probabilistic method for tracking a geometric hand model from monocular image sequences. [sent-19, score-0.346]
10 Because articulated hand models have many (roughly 26) degrees of freedom, exact representation of the posterior distribution over model configurations is intractable. [sent-20, score-0.239]
11 This has motivated many researchers to consider nonparametric representations, including particle filters [4, 5] and deterministic multiscale discretizations [6]. [sent-22, score-0.177]
12 However, the hand’s high dimensionality can cause these trackers to suffer catastrophic failures, requiring the use of models which limit the hand’s motion [4] or sophisticated prior models of hand configurations and dynamics [5, 6]. [sent-23, score-0.224]
13 An alternative way to address the high dimensionality of articulated tracking problems is to describe the posterior distribution’s statistical structure using a graphical model. [sent-24, score-0.301]
14 Figure 1: Projected edges (left block) and silhouettes (right block) for a configuration of the 3D structural hand model matching the given image. [sent-25, score-0.305]
15 Graphical models have been used to track view–based human body representations [7], contour models of restricted hand configurations [8], view–based 2.5D “cardboard” models of hands and people [9], and a full 3D kinematic human body model [10]. [sent-27, score-0.235] [sent-28, score-0.491]
17 Instead, these trackers are based on recently proposed extensions of particle filters to general graphs: mean field Monte Carlo in [9], and nonparametric belief propagation (NBP) [11, 12] in [10]. [sent-30, score-0.342]
18 To derive a graphical model for the tracking problem, we consider a redundant local representation in which each hand component is described by its own three– dimensional position and orientation. [sent-32, score-0.453]
19 We show that the model’s kinematic constraints, including self–intersection constraints not captured by joint angle representations, take a simple form in this local representation. [sent-33, score-0.502]
20 We also provide a local decomposition of the likelihood function which properly handles occlusion in a distributed fashion, a significant improvement over our earlier tracking results [13]. [sent-34, score-0.698]
21 2 Geometric Hand Modeling. Structurally, the hand is composed of sixteen approximately rigid components: three phalanges or links for each finger and thumb, as well as the palm [1]. [sent-36, score-0.368]
22 As proposed by [2, 3], we model each rigid body by one or more truncated quadrics (ellipsoids, cones, and cylinders) of fixed size. [sent-37, score-0.299]
23 These geometric primitives are well matched to the true geometry of the hand, and allow tracking from arbitrary orientations (in contrast to 2.5D models). [sent-38, score-0.235]
24 Figure 1 shows the edges and silhouettes corresponding to a sample hand model configuration. [sent-40, score-0.219]
25 2.1 Kinematic Representation and Constraints. The kinematic constraints between different hand model components are well described by revolute joints [1]. [sent-43, score-0.666]
26 Figure 2(a) shows a graph describing this kinematic structure, in which nodes correspond to rigid bodies and edges to joints. [sent-44, score-0.701]
27 The two joints connecting the phalanges of each finger and thumb have a single rotational degree of freedom, while the joints connecting the base of each finger to the palm have two degrees of freedom (corresponding to grasping and spreading motions). [sent-45, score-0.436]
28 Forward kinematic transformations may be used to determine the finger positions corresponding to a given set of joint angles. [sent-47, score-0.412]
29 While most model–based hand trackers use this joint angle parameterization, we instead explore a redundant representation in which the ith rigid body is described by its position qi and orientation ri (a unit quaternion). [sent-48, score-0.601]
30 Let xi = (qi, ri) denote this local description of each component, and let x = {x1, . . . , x16} denote the full set of component configurations. [sent-49, score-0.234]
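To make this redundant representation concrete, the following sketch (hypothetical Python, not the authors' code; the class name RigidPose is ours) stores each component's pose as a 3D position plus a unit quaternion:

import numpy as np

class RigidPose:
    """Local state x_i = (q_i, r_i): 3D position plus unit-quaternion orientation."""
    def __init__(self, position, quaternion):
        self.q = np.asarray(position, dtype=float)      # q_i in R^3
        r = np.asarray(quaternion, dtype=float)
        self.r = r / np.linalg.norm(r)                  # keep r_i on the unit sphere

# The full configuration x = {x_1, ..., x_16} is a list of sixteen such poses.
hand = [RigidPose(np.zeros(3), [1.0, 0.0, 0.0, 0.0]) for _ in range(16)]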
31 Clearly, there are dependencies among the elements of x implied by the kinematic constraints. Figure 2(a–d): Graphs describing the hand model’s constraints. [sent-53, score-0.524]
32 Let EK be the set of all pairs of rigid bodies which are connected by joints, or equivalently the edges in the kinematic graph of Fig. [sent-59, score-0.701]
33 For each joint (i, j) ∈ EK, define an indicator function ψ^K_{i,j}(xi, xj) which is equal to one if the pair (xi, xj) are valid rigid body configurations associated with some setting of the angles of joint (i, j), and zero otherwise. [sent-61, score-0.643]
34 This constraint is complex in a joint angle parameterization, but simple in our local representation: the position and orientation of every pair of rigid bodies must be such that their component quadric surfaces do not intersect. [sent-66, score-0.417]
35 First, we only explicitly constrain those pairs of rigid bodies which are most likely to intersect, corresponding to the edges ES of the graph in Fig. [sent-68, score-0.326]
36 Furthermore, because the relative orientations of each finger’s quadrics are implicitly constrained by the kinematic prior pK (x), we may detect most intersections based on the distance between object centroids. [sent-70, score-0.466]
37 The structural prior is then given by p_S(x) ∝ ∏_{(i,j)∈E_S} ψ^S_{i,j}(xi, xj), where ψ^S_{i,j}(xi, xj) = 1 if ||qi − qj|| > δ_{i,j} and 0 otherwise (2), and where δ_{i,j} is determined from the quadrics composing rigid bodies i and j. [sent-71, score-0.664]
38 Empirically, we find that this constraint helps prevent different fingers from tracking the same image data. [sent-72, score-0.191]
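As a rough illustration of eq. (2), the following sketch (hypothetical Python; the edge set edges_S and the thresholds delta are assumed inputs precomputed from the quadric geometry) implements the centroid-distance test:

import numpy as np

def structural_prior(positions, edges_S, delta):
    """Unnormalized p_S(x) from eq. (2): 1 if every constrained pair of component
    centroids is separated by more than its threshold delta[(i, j)], else 0."""
    for (i, j) in edges_S:
        if np.linalg.norm(positions[i] - positions[j]) <= delta[(i, j)]:
            return 0.0   # bodies i and j are close enough that they may intersect
    return 1.0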
39 In order to track hand motion, we must model the hand’s dynamics. [sent-73, score-0.149]
40 Let x_i^t denote the position and orientation of the ith hand component at time t, and let x^t = {x_1^t, . . . , x_16^t}. [sent-74, score-0.419]
41 Our dynamics are modeled as a factorized Gaussian random walk (Fig. 2(c)): p_T(x^t | x^{t−1}) = ∏_{i=1}^{16} N(x_i^t − x_i^{t−1}; 0, Λ_i) (3). Although this temporal model is factorized, the kinematic constraints at the following time step implicitly couple the corresponding random walks. [sent-79, score-0.661]
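Sampling from the Gaussian random walk of eq. (3) is straightforward; a minimal sketch, assuming each component state has been flattened to a real vector (orientation components would additionally need renormalization onto the quaternion sphere):

import numpy as np

def sample_next_configuration(x_prev, Lambdas, rng=np.random.default_rng(0)):
    """Draw x_i^t ~ N(x_i^{t-1}, Lambda_i) independently for each of the 16 components."""
    return [rng.multivariate_normal(mean=x, cov=L) for x, L in zip(x_prev, Lambdas)]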
42 Images without people were used to create a histogram model pbkgd of non–skin pixels. [sent-82, score-0.171]
43 Let Ω(x) denote the silhouette of projected hand configuration x. [sent-83, score-0.153]
44 3.1 Distributed Occlusion Reasoning. In configurations where there is no self–occlusion, p_C(y | x) decomposes as a product of local likelihood terms involving the projections Ω(xi) of individual hand components [13]. [sent-86, score-0.183]
45 To allow a similar decomposition (and hence distributed inference) when there is occlusion, we augment the configuration xi of each node with a set of binary hidden variables zi = {zi(u) }u∈Υ . [sent-87, score-0.463]
46 Letting zi(u) = 0 if pixel u in the projection of rigid body i is occluded by any other body, and 1 otherwise, the color likelihood (eq. [sent-88, score-0.409]
47 (4)) may be rewritten as p_C(y | x, z) = ∏_{i=1}^{16} ∏_{u∈Ω(xi)} (p_skin(u) / p_bkgd(u))^{z_i(u)} = ∏_{i=1}^{16} p_C(y | xi, zi) (5). Assuming they are set consistently with the hand configuration x, the hidden occlusion variables z ensure that the likelihood of each pixel in Ω(x) is counted exactly once. [sent-89, score-1.248]
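A minimal sketch of the occlusion-masked color likelihood of eq. (5), computed in log form for numerical stability; log_skin and log_bkgd are assumed per-pixel log-histogram lookups, and silhouette is the pixel set Ω(x_i):

def log_color_likelihood(silhouette, z_mask, log_skin, log_bkgd):
    """log p_C(y | x_i, z_i): each visible pixel (z_i(u) = 1) contributes
    log p_skin(u) - log p_bkgd(u); occluded pixels contribute nothing."""
    total = 0.0
    for u in silhouette:            # u is a (row, col) pixel index
        if z_mask[u] == 1:
            total += log_skin[u] - log_bkgd[u]
    return total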
48 We may enforce consistency of the occlusion variables using the following function: η(xj, z_i(u); xi) = 0 if xj occludes xi, u ∈ Ω(xj), and z_i(u) = 1; η(xj, z_i(u); xi) = 1 otherwise (6). Note that because our rigid bodies are convex and nonintersecting, they can never take mutually occluding configurations. [sent-90, score-1.2]
49 The constraint η(xj , zi(u) ; xi ) is zero precisely when pixel u in the projection of xi should be occluded by xj , but zi(u) is in the unoccluded state. [sent-91, score-0.568]
50 The following potential encodes all of the occlusion relationships between nodes i and j: ψ^O_{i,j}(xi, zi, xj, zj) = ∏_{u∈Υ} η(xj, z_i(u); xi) η(xi, z_j(u); xj) (7). These occlusion constraints exist between all pairs of nodes. [sent-92, score-1.692]
51 In practice, we enforce them only along the edges E_O of the graph (Fig. 2(d)) most prone to occlusion: p_O(x, z) ∝ ∏_{(i,j)∈E_O} ψ^O_{i,j}(xi, zi, xj, zj) (8). Figure 3 shows a factor graph for the occlusion relationships between xi and its neighbors, as well as the observation potential p_C(y | xi, zi). [sent-94, score-1.591]
52 The occlusion potential η(xj , zi(u) ; xi ) has a very weak dependence on xi , depending only on whether xi is behind xj relative to the camera. [sent-95, score-1.103]
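The consistency function of eq. (6) reduces to a depth test per pixel; a sketch where occludes and in_silhouette are assumed geometric predicates (not defined in this excerpt):

def eta(x_j, z_i_u, x_i, u, occludes, in_silhouette):
    """eta(x_j, z_i(u); x_i) from eq. (6): zero exactly when x_j lies in front of
    x_i at pixel u, u falls inside Omega(x_j), and z_i(u) still claims visibility."""
    if occludes(x_j, x_i, u) and in_silhouette(x_j, u) and z_i_u == 1:
        return 0.0
    return 1.0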
53 3.2 Modeling Edge Filter Responses. Figure 3: Factor graph showing p(y | xi, zi), and the occlusion constraints placed on xi by xj and xk. [sent-97, score-1.419]
54 Using boundaries labeled in training images, we estimated a histogram pon of the response of a derivative of Gaussian filter steered to the edge’s orientation [8, 10]. [sent-101, score-0.137]
55 Then, again assuming pixel independence, image y has edge likelihood p_E(y | x, z) ∝ ∏_{u∈Π(x)} p_on(u) / p_off(u) = ∏_{i=1}^{16} ∏_{u∈Π(xi)} (p_on(u) / p_off(u))^{z_i(u)} = ∏_{i=1}^{16} p_E(y | xi, zi) (9), where we have used the same occlusion variables z to allow a local decomposition. [sent-104, score-1.145]
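The steered derivative-of-Gaussian response underlying p_on and p_off can be computed from two fixed filters; a sketch using SciPy (the steering identity below is exact for first derivatives):

import numpy as np
from scipy.ndimage import gaussian_filter

def steered_edge_response(image, theta, sigma=1.0):
    """First-derivative-of-Gaussian response steered to orientation theta:
    f_theta = cos(theta) * f_x + sin(theta) * f_y."""
    fx = gaussian_filter(image, sigma, order=(0, 1))   # derivative along x (columns)
    fy = gaussian_filter(image, sigma, order=(1, 0))   # derivative along y (rows)
    return np.cos(theta) * fx + np.sin(theta) * fy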
56 When τ video frames are observed, the overall posterior distribution is given by p(x | y) ∝ ∏_{t=1}^{τ} p(x^t | y^t) p_T(x^t | x^{t−1}) (11). Excluding the potentials involving occlusion variables, which we discuss in detail in Sec. [sent-106, score-0.57]
57 (11) is an example of a pairwise Markov random field: p(x | y) ∝ ∏_{(i,j)∈E} ψ_{i,j}(xi, xj) ∏_{i∈V} ψ_i(xi, y) (12). Hand tracking can thus be posed as inference in a graphical model, a problem we propose to solve using belief propagation (BP) [15]. [sent-109, score-0.493]
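For intuition, here is the textbook discrete-state BP message update for the pairwise model of eq. (12); the tracker itself replaces these dense tables with the sample-based messages described next:

import numpy as np

def bp_message(psi_ij, psi_i, incoming):
    """m_{ij}(x_j) = sum_{x_i} psi_ij(x_i, x_j) psi_i(x_i, y) prod_{k != j} m_{ki}(x_i).
    psi_ij: (n_i, n_j) table; psi_i: (n_i,) observation vector;
    incoming: messages m_{ki} from neighbors k of i other than j."""
    h = psi_i.copy()
    for m in incoming:
        h = h * m                    # product of incoming messages
    msg = psi_ij.T @ h               # marginalize out x_i
    return msg / msg.sum()           # normalize for numerical stability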
58 4.1 Nonparametric Representations. For the hand tracking problem, the rigid body configurations xi are six–dimensional continuous variables, making accurate discretization intractable. [sent-113, score-0.699]
59 Instead, we employ nonparametric, particle–based approximations to these messages using the nonparametric belief propagation (NBP) algorithm [11, 12]. [sent-114, score-0.343]
60 Both types of messages are needed for hand tracking, as we discuss below. [sent-116, score-0.214]
61 For the general form of these updates, see [11]; the following sections focus on the details of the hand tracking implementation. [sent-118, score-0.281]
62 The hand tracking application is complicated by the fact that the orientation component ri of xi = (qi , ri ) is an element of the rotation group SO(3). [sent-119, score-0.593]
63 4.2 Marginal Computation. BP’s estimate of the belief p̂(xi | y) is equal to the product of the incoming messages from neighboring nodes with the local observation potential (see eq. [sent-123, score-0.204]
64 First, M samples are drawn from the product of the incoming kinematic and temporal messages, which are Gaussian mixtures. [sent-126, score-0.413]
65 Following normalization of the rotational component, each sample is assigned a weight equal to the product of the color and edge likelihoods with any structural messages. [sent-128, score-0.211]
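A naive sketch of this sampling step for two one-dimensional Gaussian-mixture messages (the exact product over several messages has exponentially many components, which is why NBP uses efficient samplers [11, 12]; this brute-force pairwise version is for illustration only):

import numpy as np

def sample_message_product(w1, mu1, v1, w2, mu2, v2, M, rng=np.random.default_rng(0)):
    """Draw M samples from the product of two 1D Gaussian mixtures
    (weights w, means mu, variances v)."""
    W, MU, V = [], [], []
    for a in range(len(w1)):
        for b in range(len(w2)):
            v = 1.0 / (1.0 / v1[a] + 1.0 / v2[b])            # product variance
            m = v * (mu1[a] / v1[a] + mu2[b] / v2[b])        # product mean
            z = np.exp(-0.5 * (mu1[a] - mu2[b]) ** 2 / (v1[a] + v2[b])) \
                / np.sqrt(2.0 * np.pi * (v1[a] + v2[b]))     # component overlap
            W.append(w1[a] * w2[b] * z); MU.append(m); V.append(v)
    W = np.asarray(W); W /= W.sum()
    idx = rng.choice(len(W), size=M, p=W)
    return rng.normal(np.asarray(MU)[idx], np.sqrt(np.asarray(V)[idx]))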
66 To derive BP updates for the occlusion masks zi, we first cluster (xi, zi) for each hand component so that p(x^t, z^t | y^t) has a pairwise form (as in eq. [sent-130, score-1.083]
67 In principle, NBP could manage occlusion constraints by sampling candidate occlusion masks zi along with rigid body configurations xi . [sent-132, score-1.622]
68 However, due to the exponentially large number of possible occlusion masks, we employ a more efficient analytic approximation. [sent-133, score-0.441]
69 Consider the BP message sent from xj to (zi , xi ), calculated by applying eq. [sent-134, score-0.474]
70 (13) to the occlusion potential ∏_u η(xj, z_i(u); xi). [sent-135, score-0.614]
71 We assume that p̂(xj | y) is well separated from any candidate xi, a situation typically ensured by the kinematic and structural constraints. [sent-136, score-0.634]
72 The occlusion constraint’s weak dependence on xi (see Fig. [sent-137, score-0.614]
73 If xi lies in front of typical xj configurations, the BP message µj,i(u) (zi(u) ) is uninformative. [sent-139, score-0.442]
74 If xi is occluded, the message approximately equals µ_{j,i(u)}(z_i(u) = 0) = 1 and µ_{j,i(u)}(z_i(u) = 1) = 1 − Pr[u ∈ Ω(xj)] (15), where we have neglected correlations among pixel occlusion states, and where the probability is computed with respect to p̂(xj | y). [sent-140, score-0.779]
75 By taking the product of these messages µ_{k,i(u)}(z_i(u)) from all potential occluders xk and normalizing, we may determine an approximation to the marginal occlusion probability ν_i(u) ≜ Pr[z_i(u) = 0]. [sent-141, score-0.602]
76 The edge likelihood pE (y | xi ) averages over zi similarly. [sent-143, score-0.464]
77 The NBP estimate of p̂(xi | y) is determined by sampling configurations of xi as before, and reweighting them using these occlusion–sensitive likelihood functions. [sent-144, score-0.205]
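A sketch of this analytic marginalization for the color term (our reconstruction, since the intermediate equation is not reproduced in this excerpt): an occluded pixel contributes a factor of one, so each pixel's expected factor is a ν-weighted mixture.

import numpy as np

def occlusion_probability(cover_probs):
    """Approximate nu_i(u) = Pr[z_i(u) = 0] as the probability that at least one
    potential occluder x_k covers pixel u, assuming independence across occluders."""
    visible = 1.0
    for p in cover_probs:            # p = Pr[u in Omega(x_k)] under the belief for x_k
        visible *= (1.0 - p)
    return 1.0 - visible

def marginalized_color_likelihood(silhouette, nu, skin, bkgd):
    """Average eq. (5) over z_i pixel by pixel:
    E[(p_skin/p_bkgd)^z] = nu(u) * 1 + (1 - nu(u)) * p_skin(u)/p_bkgd(u)."""
    lik = 1.0
    for u in silhouette:
        lik *= nu[u] + (1.0 - nu[u]) * skin[u] / bkgd[u]
    return lik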
78 Figure 4: Refinement of a coarse initialization following one and two NBP iterations, both without (left) and with (right) occlusion reasoning. [sent-147, score-0.568]
79 Note that messages sent along kinematic, structural, and temporal edges depend only on the belief p̂(xi | y) following marginalization over occlusion variables zi. [sent-150, score-0.872]
80 Details and pseudocode for the message propagation step are provided in [13]. [sent-151, score-0.189]
81 For kinematic constraints, we sample uniformly among permissible joint angles, and then use forward kinematics to propagate samples from p̂^{n−1}(xi | y) / m^{n−1}_{ji}(xi) to hypothesized configurations of xj. [sent-152, score-0.587]
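A hypothetical sketch of this kinematic message step; forward_kinematics and the permissible angle range are assumed helpers, and the division removes the previous iteration's message as in the NBP update:

import numpy as np

def kinematic_message_samples(samples, weights, prev_msg_vals,
                              angle_lo, angle_hi, forward_kinematics,
                              rng=np.random.default_rng(0)):
    """Propagate belief samples for x_i into hypothesized configurations of x_j."""
    w = np.asarray(weights) / np.maximum(prev_msg_vals, 1e-12)   # divide out m_{ji}^{n-1}
    w /= w.sum()
    idx = rng.choice(len(samples), size=len(samples), p=w)       # importance resampling
    hypotheses = []
    for i in idx:
        theta = rng.uniform(angle_lo, angle_hi)                  # permissible joint angle
        hypotheses.append(forward_kinematics(samples[i], theta)) # map x_i -> x_j
    return hypotheses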
82 Following [12], temporal messages are determined by adjusting the bandwidths of the current marginal estimate p (xi | y) to match the temporal covariance Λi . [sent-153, score-0.237]
83 Because the structural potentials (eq. (2)) equal one for all state configurations outside some ball, the ideal structural messages are not finitely integrable. [sent-155, score-0.181]
84 We therefore approximate the structural message m_{ij}(xj) as an analytic function equal to the weights of all kernels in p̂(xi | y) outside a ball centered at qj, the position of xj. [sent-156, score-0.384]
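Evaluated at a candidate x_j, this approximation is just a kernel-weight sum; a minimal sketch where centers and weights describe the kernels of the current belief over x_i:

import numpy as np

def structural_message(q_j, centers, weights, delta_ij):
    """Approximate m_{ij}(x_j): total weight of belief kernels for x_i whose
    centers lie outside the exclusion ball of radius delta_ij around q_j."""
    dists = np.linalg.norm(np.asarray(centers) - np.asarray(q_j), axis=1)
    return float(np.asarray(weights)[dists > delta_ij].sum())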
85 5 Simulations. We now present a set of computational examples which investigate the performance of our distributed occlusion reasoning; see [13] for additional simulations. [sent-157, score-0.472]
86 When occlusion constraints are neglected, the NBP estimates associate the ring and middle fingers with the same image pixels, and miss the true middle finger location. [sent-160, score-0.528]
87 With proper occlusion reasoning, however, the correct hand configuration is identified. [sent-161, score-0.56]
88 Video sequences demonstrating the NBP hand tracker are available online; selected frames from two of these sequences are shown in Fig. [sent-163, score-0.164]
89 5, in which filtered estimates are computed by a single “forward” sequence of temporal message updates. [sent-164, score-0.164]
90 The first sequence shows successful tracking through a partial occlusion of the ring finger by the middle finger, while the second shows a grasping motion in which the fingers occlude each other. [sent-166, score-0.686]
91 For both of these sequences, rough tracking (not shown) is possible without occlusion reasoning, since all fingers are the same color and the background is unambiguous. [sent-167, score-0.656]
92 However, we find that stability improves when occlusion reasoning is used to properly discount obscured edges and silhouettes. [sent-168, score-0.554]
93 In contrast, we have shown that an NBP tracker may be built around the local structure of the true kinematic constraints. [sent-175, score-0.452]
94 Practically, our formulation avoids the need to explicitly approximate kinematic constraints, and allows us to build a functional tracker without the need for precise, labelled training data. [sent-177, score-0.42]
95 Figure 5: Four frames from two different video sequences: a hand rotation containing finger occlusion (top), and a grasping motion (bottom). [sent-178, score-0.677]
96 We have described the graphical structure underlying a kinematic model of the hand, and used this model to build a tracking algorithm using nonparametric BP. [sent-180, score-0.693]
97 By appropriately augmenting the model’s state, we are able to perform occlusion reasoning in a distributed fashion. [sent-181, score-0.528]
98 The modular state representation and robust, local computations of NBP offer a solution particularly well suited to visual tracking of articulated objects. [sent-182, score-0.285]
99 Attractive people: Assembling loose–limbed models using nonparametric belief propagation. [sent-258, score-0.185]
100 Constructing free energy approximations and generalized belief propagation algorithms. [sent-302, score-0.14]
wordName wordTfidf (topN-words)
[('occlusion', 0.441), ('kinematic', 0.375), ('nbp', 0.358), ('zi', 0.227), ('xi', 0.173), ('tracking', 0.162), ('rigid', 0.159), ('xj', 0.143), ('nger', 0.142), ('message', 0.126), ('hand', 0.119), ('nonparametric', 0.108), ('pbkgd', 0.107), ('gurations', 0.098), ('messages', 0.095), ('xt', 0.095), ('pc', 0.091), ('articulated', 0.091), ('ngers', 0.089), ('structural', 0.086), ('body', 0.086), ('bodies', 0.079), ('pskin', 0.078), ('joints', 0.078), ('belief', 0.077), ('bp', 0.076), ('po', 0.071), ('marginal', 0.066), ('con', 0.065), ('propagation', 0.063), ('trackers', 0.062), ('constraints', 0.058), ('edges', 0.057), ('reasoning', 0.056), ('guration', 0.056), ('palm', 0.054), ('pon', 0.054), ('quadrics', 0.054), ('sudderth', 0.053), ('color', 0.053), ('mn', 0.051), ('orientation', 0.049), ('graphical', 0.048), ('ek', 0.047), ('pe', 0.047), ('eo', 0.047), ('sigal', 0.047), ('thumb', 0.047), ('freeman', 0.046), ('tracker', 0.045), ('motion', 0.043), ('skin', 0.043), ('silhouettes', 0.043), ('ihler', 0.043), ('pk', 0.042), ('cvpr', 0.042), ('self', 0.041), ('rotational', 0.04), ('grasping', 0.04), ('occluded', 0.04), ('pixel', 0.039), ('angles', 0.038), ('temporal', 0.038), ('orientations', 0.037), ('masks', 0.037), ('multiscale', 0.037), ('joint', 0.037), ('pn', 0.037), ('geometric', 0.036), ('cardboard', 0.036), ('phalanges', 0.036), ('revolute', 0.036), ('video', 0.034), ('freedom', 0.034), ('histogram', 0.034), ('projected', 0.034), ('zj', 0.033), ('local', 0.032), ('likelihood', 0.032), ('edge', 0.032), ('component', 0.032), ('variables', 0.032), ('ji', 0.032), ('particle', 0.032), ('sent', 0.032), ('stenger', 0.031), ('mandel', 0.031), ('dimensional', 0.031), ('qi', 0.031), ('distributed', 0.031), ('graph', 0.031), ('iccv', 0.03), ('implied', 0.03), ('people', 0.03), ('track', 0.03), ('position', 0.029), ('volume', 0.029), ('image', 0.029), ('ri', 0.029), ('degrees', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self–occlusions created by the imaging process lead to complex interdependencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
2 0.17763476 116 nips-2004-Message Errors in Belief Propagation
Author: Alexander T. Ihler, John W. Fisher, Alan S. Willsky
Abstract: Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether from quantization or other simplified message representations or from stochastic approximation methods. Introducing such errors into the BP message computations has the potential to adversely affect the solution obtained. We analyze this effect with respect to a particular measure of message error, and show bounds on the accumulation of errors in the system. This leads both to convergence conditions and error bounds in traditional and approximate BP message passing. 1
3 0.13311227 83 nips-2004-Incremental Learning for Visual Tracking
Author: Jongwoo Lim, David A. Ross, Ruei-sung Lin, Ming-Hsuan Yang
Abstract: Most existing tracking algorithms construct a representation of a target object prior to the tracking task starts, and utilize invariant features to handle appearance variation of the target caused by lighting, pose, and view angle change. In this paper, we present an efficient and effective online algorithm that incrementally learns and adapts a low dimensional eigenspace representation to reflect appearance changes of the target, thereby facilitating the tracking task. Furthermore, our incremental method correctly updates the sample mean and the eigenbasis, whereas existing incremental subspace update methods ignore the fact the sample mean varies over time. The tracking problem is formulated as a state inference problem within a Markov Chain Monte Carlo framework and a particle filter is incorporated for propagating sample distributions over time. Numerous experiments demonstrate the effectiveness of the proposed tracking algorithm in indoor and outdoor environments where the target objects undergo large pose and lighting changes. 1
4 0.12675707 178 nips-2004-Support Vector Classification with Input Data Uncertainty
Author: Jinbo Bi, Tong Zhang
Abstract: This paper investigates a new learning model in which the input data is corrupted with noise. We present a general statistical framework to tackle this problem. Based on the statistical reasoning, we propose a novel formulation of support vector classification, which allows uncertainty in input data. We derive an intuitive geometric interpretation of the proposed formulation, and develop algorithms to efficiently solve it. Empirical results are included to show that the newly formed method is superior to the standard SVM for problems with noisy input. 1
5 0.12300111 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
6 0.11582123 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
7 0.098085068 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
8 0.089849763 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
9 0.08768779 67 nips-2004-Exponentiated Gradient Algorithms for Large-margin Structured Classification
10 0.083890796 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
11 0.083604112 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
12 0.082436264 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
13 0.079781301 44 nips-2004-Conditional Random Fields for Object Recognition
14 0.076014794 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
15 0.072753176 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
16 0.07251963 73 nips-2004-Generative Affine Localisation and Tracking
17 0.072349757 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
18 0.069593899 76 nips-2004-Hierarchical Bayesian Inference in Networks of Spiking Neurons
19 0.068851694 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
20 0.068222627 187 nips-2004-The Entire Regularization Path for the Support Vector Machine
topicId topicWeight
[(0, -0.214), (1, 0.007), (2, 0.01), (3, -0.084), (4, 0.182), (5, -0.052), (6, -0.017), (7, 0.043), (8, -0.186), (9, -0.117), (10, 0.138), (11, 0.009), (12, 0.074), (13, 0.01), (14, 0.089), (15, 0.062), (16, -0.07), (17, 0.01), (18, 0.028), (19, 0.191), (20, -0.067), (21, -0.02), (22, 0.085), (23, 0.015), (24, 0.022), (25, -0.102), (26, 0.002), (27, 0.12), (28, -0.024), (29, 0.04), (30, -0.002), (31, -0.007), (32, -0.082), (33, -0.024), (34, 0.058), (35, 0.067), (36, -0.019), (37, 0.079), (38, 0.008), (39, 0.015), (40, 0.094), (41, -0.016), (42, 0.14), (43, -0.13), (44, -0.017), (45, 0.206), (46, 0.028), (47, 0.089), (48, -0.011), (49, -0.061)]
simIndex simValue paperId paperTitle
same-paper 1 0.94482565 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self–occlusions created by the imaging process lead to complex interdependencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
2 0.55606109 116 nips-2004-Message Errors in Belief Propagation
Author: Alexander T. Ihler, John W. Fisher, Alan S. Willsky
Abstract: Belief propagation (BP) is an increasingly popular method of performing approximate inference on arbitrary graphical models. At times, even further approximations are required, whether from quantization or other simplified message representations or from stochastic approximation methods. Introducing such errors into the BP message computations has the potential to adversely affect the solution obtained. We analyze this effect with respect to a particular measure of message error, and show bounds on the accumulation of errors in the system. This leads both to convergence conditions and error bounds in traditional and approximate BP message passing. 1
3 0.53220117 167 nips-2004-Semi-supervised Learning with Penalized Probabilistic Clustering
Author: Zhengdong Lu, Todd K. Leen
Abstract: While clustering is usually an unsupervised operation, there are circumstances in which we believe (with varying degrees of certainty) that items A and B should be assigned to the same cluster, while items A and C should not. We would like such pairwise relations to influence cluster assignments of out-of-sample data in a manner consistent with the prior knowledge expressed in the training set. Our starting point is probabilistic clustering based on Gaussian mixture models (GMM) of the data distribution. We express clustering preferences in the prior distribution over assignments of data points to clusters. This prior penalizes cluster assignments according to the degree with which they violate the preferences. We fit the model parameters with EM. Experiments on a variety of data sets show that PPC can consistently improve clustering results.
4 0.5210405 29 nips-2004-Beat Tracking the Graphical Model Way
Author: Dustin Lang, Nando D. Freitas
Abstract: We present a graphical model for beat tracking in recorded music. Using a probabilistic graphical model allows us to incorporate local information and global smoothness constraints in a principled manner. We evaluate our model on a set of varied and difficult examples, and achieve impressive results. By using a fast dual-tree algorithm for graphical model inference, our system runs in less time than the duration of the music being processed. 1
5 0.49450707 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We introduce a computationally efficient method to estimate the validity of the BP method as a function of graph topology, the connectivity strength, frustration and network size. We present numerical results that demonstrate the correctness of our estimates for the uniform random model and for a real-world network (“C. Elegans”). Although the method is restricted to pair-wise interactions, no local evidence (zero “biases”) and binary variables, we believe that its predictions correctly capture the limitations of BP for inference and MAP estimation on arbitrary graphical models. Using this approach, we find that BP always performs better than MF. Especially for large networks with broad degree distributions (such as scale-free networks) BP turns out to significantly outperform MF. 1
6 0.44968587 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
7 0.43981859 83 nips-2004-Incremental Learning for Visual Tracking
8 0.42887306 178 nips-2004-Support Vector Classification with Input Data Uncertainty
9 0.42305249 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
10 0.41729245 41 nips-2004-Comparing Beliefs, Surveys, and Random Walks
11 0.40692887 63 nips-2004-Expectation Consistent Free Energies for Approximate Inference
12 0.3935689 86 nips-2004-Instance-Specific Bayesian Model Averaging for Classification
13 0.39250112 73 nips-2004-Generative Affine Localisation and Tracking
14 0.38096157 186 nips-2004-The Correlated Correspondence Algorithm for Unsupervised Registration of Nonrigid Surfaces
15 0.37990576 151 nips-2004-Rate- and Phase-coded Autoassociative Memory
16 0.37144986 122 nips-2004-Modelling Uncertainty in the Game of Go
17 0.36706901 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data
18 0.3648499 17 nips-2004-Adaptive Manifold Learning
19 0.36450225 21 nips-2004-An Information Maximization Model of Eye Movements
20 0.35856882 43 nips-2004-Conditional Models of Identity Uncertainty with Application to Noun Coreference
topicId topicWeight
[(3, 0.243), (13, 0.073), (15, 0.173), (17, 0.013), (26, 0.042), (31, 0.023), (33, 0.181), (35, 0.023), (39, 0.028), (50, 0.035), (71, 0.026), (88, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.86501104 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation
Author: Erik B. Sudderth, Michael I. Mandel, William T. Freeman, Alan S. Willsky
Abstract: We describe a three–dimensional geometric hand model suitable for visual tracking applications. The kinematic constraints implied by the model’s joints have a probabilistic structure which is well described by a graphical model. Inference in this model is complicated by the hand’s many degrees of freedom, as well as multimodal likelihoods caused by ambiguous image measurements. We use nonparametric belief propagation (NBP) to develop a tracking algorithm which exploits the graph’s structure to control complexity, while avoiding costly discretization. While kinematic constraints naturally have a local structure, self–occlusions created by the imaging process lead to complex interdependencies in color and edge–based likelihood functions. However, we show that local structure may be recovered by introducing binary hidden variables describing the occlusion state of each pixel. We augment the NBP algorithm to infer these occlusion variables in a distributed fashion, and then analytically marginalize over them to produce hand position estimates which properly account for occlusion events. We provide simulations showing that NBP may be used to refine inaccurate model initializations, as well as track hand motion through extended image sequences. 1
2 0.86500388 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
Author: Felix A. Wichmann, Arnulf B. Graf, Heinrich H. Bülthoff, Eero P. Simoncelli, Bernhard Schölkopf
Abstract: We study gender discrimination of human faces using a combination of psychophysical classification and discrimination experiments together with methods from machine learning. We reduce the dimensionality of a set of face images using principal component analysis, and then train a set of linear classifiers on this reduced representation (linear support vector machines (SVMs), relevance vector machines (RVMs), Fisher linear discriminant (FLD), and prototype (prot) classifiers) using human classification data. Because we combine a linear preprocessor with linear classifiers, the entire system acts as a linear classifier, allowing us to visualise the decision-image corresponding to the normal vector of the separating hyperplanes (SH) of each classifier. We predict that the female-tomaleness transition along the normal vector for classifiers closely mimicking human classification (SVM and RVM [1]) should be faster than the transition along any other direction. A psychophysical discrimination experiment using the decision images as stimuli is consistent with this prediction. 1
3 0.84543532 177 nips-2004-Supervised Graph Inference
Author: Jean-philippe Vert, Yoshihiro Yamanishi
Abstract: We formulate the problem of graph inference where part of the graph is known as a supervised learning problem, and propose an algorithm to solve it. The method involves the learning of a mapping of the vertices to a Euclidean space where the graph is easy to infer, and can be formulated as an optimization problem in a reproducing kernel Hilbert space. We report encouraging results on the problem of metabolic network reconstruction from genomic data. 1
4 0.73571295 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications
Author: Ruei-sung Lin, David A. Ross, Jongwoo Lim, Ming-Hsuan Yang
Abstract: This paper presents an adaptive discriminative generative model that generalizes the conventional Fisher Linear Discriminant algorithm and renders a proper probabilistic interpretation. Within the context of object tracking, we aim to find a discriminative generative model that best separates the target from the background. We present a computationally efficient algorithm to constantly update this discriminative model as time progresses. While most tracking algorithms operate on the premise that the object appearance or ambient lighting condition does not significantly change as time progresses, our method adapts a discriminative generative model to reflect appearance variation of the target and background, thereby facilitating the tracking task in ever-changing environments. Numerous experiments show that our method is able to learn a discriminative generative model for tracking target objects undergoing large pose and lighting changes.
5 0.73507917 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
Author: Chakra Chennubhotla, Allan D. Jepson
Abstract: We show how to build hierarchical, reduced-rank representation for large stochastic matrices and use this representation to design an efficient algorithm for computing the largest eigenvalues, and the corresponding eigenvectors. In particular, the eigen problem is first solved at the coarsest level of the representation. The approximate eigen solution is then interpolated over successive levels of the hierarchy. A small number of power iterations are employed at each stage to correct the eigen solution. The typical speedups obtained by a Matlab implementation of our fast eigensolver over a standard sparse matrix eigensolver [13] are at least a factor of ten for large image sizes. The hierarchical representation has proven to be effective in a min-cut based segmentation algorithm that we proposed recently [8]. 1 Spectral Methods Graph-theoretic spectral methods have gained popularity in a variety of application domains: segmenting images [22]; embedding in low-dimensional spaces [4, 5, 8]; and clustering parallel scientific computation tasks [19]. Spectral methods enable the study of properties global to a dataset, using only local (pairwise) similarity or affinity measurements between the data points. The global properties that emerge are best understood in terms of a random walk formulation on the graph. For example, the graph can be partitioned into clusters by analyzing the perturbations to the stationary distribution of a Markovian relaxation process defined in terms of the affinity weights [17, 18, 24, 7]. The Markovian relaxation process need never be explicitly carried out; instead, it can be analytically expressed using the leading order eigenvectors, and eigenvalues, of the Markov transition matrix. In this paper we consider the practical application of spectral methods to large datasets. In particular, the eigen decomposition can be very expensive, on the order of O(n 3 ), where n is the number of nodes in the graph. While it is possible to compute analytically the first eigenvector (see §3 below), the remaining subspace of vectors (necessary for say clustering) has to be explicitly computed. A typical approach to dealing with this difficulty is to first sparsify the links in the graph [22] and then apply an efficient eigensolver [13, 23, 3]. In comparison, we propose in this paper a specialized eigensolver suitable for large stochastic matrices with known stationary distributions. In particular, we exploit the spectral properties of the Markov transition matrix to generate hierarchical, successively lower-ranked approximations to the full transition matrix. The eigen problem is solved directly at the coarsest level of representation. The approximate eigen solution is then interpolated over successive levels of the hierarchy, using a small number of power iterations to correct the solution at each stage. 2 Previous Work One approach to speeding up the eigen decomposition is to use the fact that the columns of the affinity matrix are typically correlated. The idea then is to pick a small number of representative columns to perform eigen decomposition via SVD. For example, in the Nystrom approximation procedure, originally proposed for integral eigenvalue problems, the idea is to randomly pick a small set of m columns; generate the corresponding affinity matrix; solve the eigenproblem and finally extend the solution to the complete graph [9, 10]. 
The Nystrom method has also been recently applied in the kernel learning methods for fast Gaussian process classification and regression [25]. Other sampling-based approaches include the work reported in [1, 2, 11]. Our starting point is the transition matrix generated from affinity weights and we show how building a representational hierarchy follows naturally from considering the stochastic matrix. A closely related work is the paper by Lin on reduced rank approximations of transition matrices [14]. We differ in how we approximate the transition matrices, in particular our objective function is computationally less expensive to solve. In particular, one of our goals in reducing transition matrices is to develop a fast, specialized eigen solver for spectral clustering. Fast eigensolving is also the goal in ACE [12], where successive levels in the hierarchy can potentially have negative affinities. A graph coarsening process for clustering was also pursued in [21, 3]. 3 Markov Chain Terminology We first provide a brief overview of the Markov chain terminology here (for more details see [17, 15, 6]). We consider an undirected graph G = (V, E) with vertices vi , for i = {1, . . . , n}, and edges ei,j with non-negative weights ai,j . Here the weight ai,j represents the affinity between vertices vi and vj . The affinities are represented by a non-negative, symmetric n × n matrix A having weights ai,j as elements. The degree of a node j is n n defined to be: dj = i=1 ai,j = j=1 aj,i , where we define D = diag(d1 , . . . , dn ). A Markov chain is defined using these affinities by setting a transition probability matrix M = AD −1 , where the columns of M each sum to 1. The transition probability matrix defines the random walk of a particle on the graph G. The random walk need never be explicitly carried out; instead, it can be analytically expressed using the leading order eigenvectors, and eigenvalues, of the Markov transition matrix. Because the stochastic matrices need not be symmetric in general, a direct eigen decomposition step is not preferred for reasons of instability. This problem is easily circumvented by considering a normalized affinity matrix: L = D −1/2 AD−1/2 , which is related to the stochastic matrix by a similarity transformation: L = D −1/2 M D1/2 . Because L is symmetric, it can be diagonalized: L = U ΛU T , where U = [u1 , u2 , · · · , un ] is an orthogonal set of eigenvectors and Λ is a diagonal matrix of eigenvalues [λ1 , λ2 , · · · , λn ] sorted in decreasing order. The eigenvectors have unit length uk = 1 and from the form of A and D it can be shown that the eigenvalues λi ∈ (−1, 1], with at least one eigenvalue equal to one. Without loss of generality, we take λ1 = 1. Because L and M are similar we can perform an eigen decomposition of the Markov transition matrix as: M = D1/2 LD−1/2 = D1/2 U Λ U T D−1/2 . Thus an eigenvector u of L corresponds to an eigenvector D 1/2 u of M with the same eigenvalue λ. The Markovian relaxation process after β iterations, namely M β , can be represented as: M β = D1/2 U Λβ U T D−1/2 . Therefore, a particle undertaking a random walk with an initial distribution p 0 acquires after β steps a distribution p β given by: p β = M β p 0 . Assuming the graph is connected, as β → ∞, the Markov chain approaches a unique n stationary distribution given by π = diag(D)/ i=1 di , and thus, M ∞ = π1T , where 1 is a n-dim column vector of all ones. 
Observe that π is an eigenvector of M as it is easy to show that M π = π and the corresponding eigenvalue is 1. Next, we show how to generate hierarchical, successively low-ranked approximations for the transition matrix M . 4 Building a Hierarchy of Transition Matrices The goal is to generate a very fast approximation, while simultaneously achieving sufficient accuracy. For notational ease, we think of M as a fine-scale representation and M as some coarse-scale approximation to be derived here. By coarsening M further, we can generate successive levels of the representation hierarchy. We use the stationary distribution π to construct a corresponding coarse-scale stationary distribution δ. As we just discussed a critical property of the fine scale Markov matrix M is that it is similar to the symmetric matrix L and we wish to preserve this property at every level of the representation hierarchy. 4.1 Deriving Coarse-Scale Stationary Distribution We begin by expressing the stationary distribution π as a probabilistic mixture of latent distributions. In matrix notation, we have (1) π = K δ, where δ is an unknown mixture coefficient vector of length m, K is an n × m non-negative n kernel matrix whose columns are latent distributions that each sum to 1: i=1 Ki,j = 1 and m n. It is easy to derive a maximum likelihood approximation of δ using an EM type algorithm [16]. The main step is to find a stationary point δ, λ for the Lagrangian: m n i=1 m Ki,j δj + λ πi ln E≡− j=1 δj − 1 . (2) j=1 An implicit step in this EM procedure is to compute the the ownership probability r i,j of the j th kernel (or node) at the coarse scale for the ith node on the fine scale and is given by ri,j = δj Ki,j . m k=1 δk Ki,k (3) The EM procedure allows for an update of both δ and the latent distributions in the kernel matrix K (see §8.3.1 in [6]). For initialization, δ is taken to be uniform over the coarse-scale states. But in choosing kernels K, we provide a good initialization for the EM procedure. Specifically, the Markov matrix M is diffused using a small number of iterations to get M β . The diffusion causes random walks from neighboring nodes to be less distinguishable. This in turn helps us select a small number of columns of M β in a fast and greedy way to be the kernel matrix K. We defer the exact details on kernel selection to a later section (§4.3). 4.2 Deriving the Coarse-Scale Transition Matrix In order to define M , the coarse-scale transition matrix, we break it down into three steps. First, the Markov chain propagation at the coarse scale can be defined as: q k+1 = M q k , (4) where q is the coarse scale probability distribution after k steps of the random walk. Second, we expand q k into the fine scale using the kernels K resulting in a fine scale probability distribution p k : p k = Kq k . (5) k Finally, we lift p k back into the coarse scale by using the ownership probability of the j th kernel for the ith node on the fine grid: n qjk+1 = ri,j pik i=1 (6) Substituting for Eqs.(3) and (5) in Eq. 6 gives n m qjk+1 = i=1 n Ki,t qtk = ri,j t=1 i=1 δj Ki,j m k=1 δk Ki,k m Ki,t qtk . (7) t=1 We can write the preceding equation in a matrix form: q k+1 = diag( δ ) K T diag K δ −1 Kq k . (8) Comparing this with Eq. 4, we can derive the transition matrix M as: M = diag( δ ) K T diag K δ −1 (9) K. It is easy to see that δ = M δ, so δ is the stationary distribution for M . 
Following the definition of M , and its stationary distribution δ, we can generate a symmetric coarse scale affinity matrix A given by A = M diag(δ) = diag( δ ) K T diag K δ −1 Kdiag(δ) , (10) where we substitute for the expression M from Eq. 9. The coarse-scale affinity matrix A is then normalized to get: L = D−1/2 AD−1/2 ; D = diag(d1 , d2 , · · · , dm ), (11) where dj is the degree of node j in the coarse-scale graph represented by the matrix A (see §3 for degree definition). Thus, the coarse scale Markov matrix M is precisely similar to a symmetric matrix L. 4.3 Selecting Kernels For demonstration purpose, we present the kernel selection details on the image of an eye shown below. To begin with, a random walk is defined where each pixel in the test image is associated with a vertex of the graph G. The edges in G are defined by the standard 8-neighbourhood of each pixel. For the demonstrations in this paper, the edge weight ai,j between neighbouring pixels xi and xj is given by a function of the difference in the 2 corresponding intensities I(xi ) and I(xj ): ai,j = exp(−(I(xi ) − I(xj ))2 /2σa ), where σa is set according to the median absolute difference |I(xi ) − I(xj )| between neighbours measured over the entire image. The affinity matrix A with the edge weights is then used to generate a Markov transition matrix M . The kernel selection process we use is fast and greedy. First, the fine scale Markov matrix M is diffused to M β using β = 4. The Markov matrix M is sparse as we make the affinity matrix A sparse. Every column in the diffused matrix M β is a potential kernel. To facilitate the selection process, the second step is to rank order the columns of M β based on a probability value in the stationary distribution π. Third, the kernels (i.e. columns of M β ) are picked in such a way that for a kernel Ki all of the neighbours of pixel i which are within the half-height of the the maximum value in the kernel Ki are suppressed from the selection process. Finally, the kernel selection is continued until every pixel in the image is within a half-height of the peak value of at least one kernel. If M is a full matrix, to avoid the expense of computing M β explicitly, random kernel centers can be selected, and only the corresponding columns of M β need be computed. We show results from a three-scale hierarchy on the eye image (below). The image has 25 × 20 pixels but is shown here enlarged for clarity. At the first coarse scale 83 kernels are picked. The kernels each correspond to a different column in the fine scale transition matrix and the pixels giving rise to these kernels are shown numbered on the image. Using these kernels as an initialization, the EM procedure derives a coarse-scale stationary distribution δ 21 14 26 4 (Eq. 2), while simultaneously updating the kernel ma12 27 2 19 trix. Using the newly updated kernel matrix K and the 5 8 13 23 30 18 6 9 derived stationary distribution δ a transition matrix M 28 20 15 32 10 22 is generated (Eq. 9). The coarse scale Markov matrix 24 17 7 is then diffused to M β , again using β = 4. The kernel Coarse Scale 1 Coarse Scale 2 selection algorithm is reapplied, this time picking 32 kernels for the second coarse scale. Larger values of β cause the coarser level to have fewer elements. But the exact number of elements depends on the form of the kernels themselves. 
For the random experiments that we describe later in §6 we found β = 2 in the first iteration and 4 thereafter causes the number of kernels to be reduced by a factor of roughly 1/3 to 1/4 at each level. 72 28 35 44 51 64 82 4 12 31 56 19 77 36 45 52 65 13 57 23 37 5 40 53 63 73 14 29 6 66 38 74 47 24 7 30 41 54 71 78 58 15 8 20 39 48 59 67 25 68 79 21 16 2 11 26 42 49 55 60 75 32 83 43 9 76 50 17 27 61 33 69 80 3 46 18 70 81 34 10 62 22 1 25 11 1 3 16 31 29 At coarser levels of the hierarchy, we expect the kernels to get less sparse and so will the affinity and the transition matrices. In order to promote sparsity at successive levels of the hierarchy we sparsify A by zeroing out elements associated with “small” transition probabilities in M . However, in the experiments described later in §6, we observe this sparsification step to be not critical. To summarize, we use the stationary distribution π at the fine-scale to derive a transition matrix M , and its stationary distribution δ, at the coarse-scale. The coarse scale transition in turn helps to derive an affinity matrix A and its normalized version L. It is obvious that this procedure can be repeated recursively. We describe next how to use this representation hierarchy for building a fast eigensolver. 5 Fast EigenSolver Our goal in generating a hierarchical representation of a transition matrix is to develop a fast, specialized eigen solver for spectral clustering. To this end, we perform a full eigen decomposition of the normalized affinity matrix only at the coarsest level. As discussed in the previous section, the affinity matrix at the coarsest level is not likely to be sparse, hence it will need a full (as opposed to a sparse) version of an eigen solver. However it is typically the case that e ≤ m n (even in the case of the three-scale hierarchy that we just considered) and hence we expect this step to be the least expensive computationally. The resulting eigenvectors are interpolated to the next lower level of the hierarchy by a process which will be described next. Because the eigen interpolation process between every adjacent pair of scales in the hierarchy is similar, we will assume we have access to the leading eigenvectors U (size: m × e) for the normalized affinity matrix L (size: m × m) and describe how to generate the leading eigenvectors U (size: n × e), and the leading eigenvalues S (size: e × 1), for the fine-scale normalized affinity matrix L (size: n × n). There are several steps to the eigen interpolation process and in the discussion that follows we refer to the lines in the pseudo-code presented below. First, the coarse-scale eigenvectors U can be interpolated using the kernel matrix K to generate U = K U , an approximation for the fine-scale eigenvectors (line 9). Second, interpolation alone is unlikely to set the directions of U exactly aligned with U L , the vectors one would obtain by a direct eigen decomposition of the fine-scale normalized affinity matrix L. We therefore update the directions in U by applying a small number of power iterations with L, as given in lines 13-15. e e function (U, S) = CoarseToFine(L, K, U , S) 1: INPUT 2: L, K ⇐ {L is n × n and K is n × m where m n} e e e e 3: U /S ⇐ {leading coarse-scale eigenvectors/eigenvalues of L. U is of size m × e, e ≤ m} 4: OUTPUT 5: U, S ⇐ {leading fine-scale eigenvectors/eigenvalues of L. 
The resulting eigenvectors are interpolated to the next lower level of the hierarchy. Because the eigen-interpolation process between every adjacent pair of scales is the same, we assume access to the leading eigenvectors Ũ (size m × e) of the coarse-scale normalized affinity matrix L̃ (size m × m), and describe how to generate the leading eigenvectors U (size n × e) and leading eigenvalues S (size e × 1) of the fine-scale normalized affinity matrix L (size n × n). The interpolation involves several steps, and in the discussion that follows we refer to the line numbers of the pseudocode below. First, the coarse-scale eigenvectors Ũ are interpolated using the kernel matrix K to give U = KŨ, an approximation to the fine-scale eigenvectors (line 9). Second, interpolation alone is unlikely to align the directions of U exactly with U_L, the vectors one would obtain by a direct eigendecomposition of the fine-scale normalized affinity matrix L. We therefore update the directions in U by applying a small number of power iterations with L (lines 12-14).

    function (U, S) = CoarseToFine(L, K, Ũ, S̃)
    1:  INPUT
    2:  L, K  {L is n × n and K is n × m, where m ≪ n}
    3:  Ũ, S̃  {leading coarse-scale eigenvectors/eigenvalues of L̃; Ũ is m × e, e ≤ m}
    4:  OUTPUT
    5:  U, S  {leading fine-scale eigenvectors/eigenvalues of L; U is n × e, S is e × 1}
    6:  CONSTANTS: TOL = 1e-4; POWER_ITERS = 50
    7:
    8:  TPI = min(POWER_ITERS, log(e · eps / TOL) / log(min(S̃)))  {eps: machine accuracy}
    9:  U = K Ũ  {interpolation from coarse to fine}
    10: while not converged do
    11:   U_old = U  {n × e matrix, e ≪ n}
    12:   for i = 1 to TPI do
    13:     U ⇐ L U
    14:   end for
    15:   U ⇐ Gram-Schmidt(U)  {orthogonalize U}
    16:   L_e = U^T L U  {L may be sparse, but L_e need not be}
    17:   U_e S_e U_e^T = svd(L_e)  {eigenanalysis of L_e, which is of size e × e}
    18:   U ⇐ U U_e  {update the leading eigenvectors of L}
    19:   S = diag(S_e)  {grab the leading eigenvalues of L}
    20:   innerProd = 1 − diag(U_old^T U)  {1 is an e × 1 vector of all ones}
    21:   converged = max[abs(innerProd)] < TOL
    22: end while

[Figure 1: Hierarchical eigensolver results. (a) Ground-truth eigenvalues S_L (red circles) compared with the multi-scale eigensolver spectrum S (blue line). (b) Relative absolute error between the eigenvalues, |S − S_L| / S_L. (c) Eigenvector mismatch, 1 − diag(|U^T U_L|), between the eigenvectors U derived by the multi-scale eigensolver and the ground truth U_L. Observe the slight mismatch in the last few eigenvectors, but excellent agreement in the leading eigenvectors (see text).]

The number of power iterations TPI can be bounded as follows. Suppose v = Uc, where U is the matrix of true eigenvectors and c is the coefficient vector of an arbitrary vector v. After TPI power iterations, v becomes U diag(S^TPI) c, where S contains the exact eigenvalues. For the component of v in the direction U_e (the e-th column of U) not to be swamped by the other components, we can limit its decay after TPI iterations: (S(e)/S(1))^TPI ≥ e · eps / TOL, where S(e) is the exact e-th eigenvalue, S(1) = 1, eps is the machine precision, and TOL is the requested accuracy. Because we do not have access to the exact value S(e) at the start of the interpolation procedure, we estimate it from the coarse eigenvalues S̃; this yields the bound on TPI computed on line 8 above. Third, neither the interpolation nor the power iterations need preserve the orthogonality of the eigenvectors in U; we fix this with a Gram-Schmidt orthogonalization (line 15). Finally, there is still a problem with the power iterations: they find it very hard to separate nearby eigenvalues. In particular, the ratio that matters for their convergence is the one between the (e + 1)-st and e-th eigenvalues. The idea we pursue is therefore to use the power iterations only to separate the reduced space of eigenvectors (of dimension e) from the orthogonal subspace (of dimension n − e), and then to use a full SVD on the reduced space to update the leading eigenvectors U and eigenvalues S for the fine scale (lines 16-19). This idea is similar to computing the Ritz values and Ritz vectors in a Rayleigh-Ritz method. A MATLAB transcription of the pseudocode is sketched below.
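The following MATLAB transcription of CoarseToFine is ours, written as an illustrative sketch rather than released code. Two small liberties are flagged in the comments: economy QR stands in for explicit Gram-Schmidt (both orthonormalize the columns), and a symmetric eigendecomposition replaces the SVD of the e × e projected matrix.

    function [U, S] = coarse_to_fine(L, K, Ut, St)
    % L: n-by-n fine-scale normalized affinity (possibly sparse);
    % K: n-by-m kernel matrix; Ut (m-by-e), St (e-by-1): coarse eigenpairs.
      TOL = 1e-4;  POWER_ITERS = 50;
      e = size(Ut, 2);
      % Line 8: bound on power iterations, estimated from coarse eigenvalues.
      TPI = max(1, min(POWER_ITERS, ceil(log(e * eps / TOL) / log(min(St)))));
      U = K * Ut;                          % line 9: interpolate coarse to fine
      converged = false;
      while ~converged
        Uold = U;
        for i = 1:TPI                      % lines 12-14: power iterations with L
          U = L * U;
        end
        [U, ~] = qr(U, 0);                 % line 15: orthonormalize (Gram-Schmidt)
        Le = U' * L * U;                   % line 16: e-by-e Rayleigh-Ritz projection
        [Ue, Se] = eig(full(Le + Le') / 2);% line 17: eigenanalysis of L_e
        [S, ix] = sort(diag(Se), 'descend');
        U = U * Ue(:, ix);                 % lines 18-19: Ritz vectors and values
        % Lines 20-21: stop when each direction has stabilized; the inner
        % abs() guards against harmless sign flips introduced by eig.
        converged = max(abs(1 - abs(diag(Uold' * U)))) < TOL;
      end
    end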
6 Interpolation Results

Our multi-scale decomposition code is written in Matlab. For the direct eigendecomposition we use the Matlab routine svds.m, which invokes the compiled ARPACK routines [13], with its default convergence tolerance of 1e-10. In Fig. 1a we compare the spectrum S obtained from a three-scale decomposition of the eye image (blue line) with the ground truth, the spectrum S_L resulting from a direct eigendecomposition of the fine-scale normalized affinity matrix L (red circles). There is excellent agreement in the leading eigenvalues. To illustrate this, Fig. 1b shows the absolute relative error between the spectra, |S − S_L| / S_L; the spectra agree closely except for the last few eigenvalues. For a quantitative comparison of the eigenvectors, Fig. 1c plots the measure 1 − diag(|U^T U_L|), where U is the matrix of eigenvectors obtained by the multi-scale approximation, U_L is the ground truth from a direct eigendecomposition of the fine-scale affinity matrix L, and 1 is a vector of all ones. The plot demonstrates a close match in the leading eigenvector directions, within the tolerance threshold of 1e-4 chosen for the multi-scale method. The error is higher for the last few eigenvectors, which suggests that the power iterations have not clearly separated them from the other directions. The strategy we suggest, therefore, is to pad the required number of leading eigenvectors by about 20% before invoking the multi-scale procedure; the number of hierarchical stages must then be chosen so that the transition matrix at the coarsest scale can accommodate this slight increase in subspace dimension. For lack of space we omit further results (see Ch. 8 in [6]).

Next we measure the time the hierarchical eigensolver takes to compute the leading eigenbasis for various input sizes, compared with the svds.m procedure [13]. We form images of different sizes by Gaussian smoothing of i.i.d. noise, with a Gaussian of standard deviation 3 pixels. The edges in the graph G are defined by the standard 8-neighbourhood of each pixel, and the edge weights between neighbouring pixels are given by the intensity-difference function of §4.3. The affinity matrix A with these edge weights is then used to generate a Markov transition matrix M. The fast eigensolver is run on ten different instances of the input image at each size, and the average of these times is reported here. For a fair comparison between the two procedures, we set the convergence tolerance for svds.m to 1e-4, the same value used for the fast eigensolver; we found the hierarchical representation derived at this tolerance sufficiently accurate for the min-cut based segmentation results we reported in [8]. The subspace dimensionality is fixed at 51, at which we expect (and indeed observe) the leading 40 eigenpairs derived from the multi-scale procedure to be accurate; accordingly, when invoking svds.m we compute only the leading 41 eigenpairs. In the table below, the first column gives the number of nodes in the graph, the second and third columns report the time in seconds taken by svds.m and by the Matlab implementation of the multi-scale eigensolver, respectively, and the fourth column reports the speedup of the multi-scale eigensolver over svds.m on a standard desktop (Intel P4, 2.5 GHz, 1 GB RAM). Lowering the tolerance threshold for svds.m made it faster by about 20-30%; despite this, the multi-scale algorithm clearly outperforms the svds.m procedure. A sketch of this timing setup is given below.
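As an illustration, the following MATLAB harness (ours) sketches the timing setup for a single input size. hier_eigs is a stand-in name for the complete multi-scale pipeline (coarsening followed by the CoarseToFine interpolation) and is not defined in the sketches above; image_affinity is the sketch from §4.3; the exact option fields accepted by svds.m vary across Matlab versions.

    % Hypothetical timing harness: one Gaussian-smoothed noise image.
    sz = 128;                                        % image is sz-by-sz, n = sz^2
    g  = exp(-(-9:9).^2 / (2 * 3^2));  g = g / sum(g);   % 1-D Gaussian, sigma = 3
    I  = conv2(g', g, randn(sz), 'same');            % smoothed i.i.d. noise
    A  = image_affinity(I);
    d  = full(sum(A, 1))';
    Dh = spdiags(1 ./ sqrt(d), 0, sz^2, sz^2);
    L  = Dh * A * Dh;                                % normalized affinity (Eq. 11)

    opts.tol = 1e-4;                                 % match the multi-scale tolerance
    tic;  [Ud, Sd] = svds(L, 41, 'L', opts);  t_direct = toc;   % leading 41 pairs
    tic;  [Um, Sm] = hier_eigs(L, 51);        t_multi  = toc;   % 51-dim subspace
    fprintf('n = %d: svds %.1f s, multi-scale %.1f s, speedup %.1fx\n', ...
            sz^2, t_direct, t_multi, t_direct / t_multi);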
The most expensive step in the multi-scale algorithm is the power iteration required at the last stage, interpolating the eigenvectors from the first coarse scale to the fine scale; its complexity is of order n × e, where e is the subspace dimensionality and n is the size of the graph. Indeed, the table shows that the multi-scale procedure takes time roughly proportional to n. Deviations from the linear trend are observed at specific values of n, which we believe are due to variations in the difficulty of the specific eigenvalue problem (e.g. nearly multiple eigenvalues).

    Table: running times in seconds; "n/a" denotes no reported time.

        n       svds.m    Multi-Scale   Speedup
      32^2        1.6          1.5         1.1
      63^2       10.8          4.9         2.2
      64^2       20.5          5.5         3.7
      65^2       12.6          5.1         2.5
     100^2       44.2         13.1         3.4
     127^2       91.1         20.4         4.5
     128^2      230.9         35.2         6.6
     129^2       96.9         20.9         4.6
     160^2      179.3         34.4         5.2
     255^2      819.2         90.3         9.1
     256^2     2170.8        188.7        11.5
     257^2      871.7         93.3         9.3
     511^2     7977.2        458.8        17.4
     512^2    20269.0        739.3        27.4
     513^2     7887.2        461.9        17.1
     600^2    10841.4        644.2        16.8
     700^2    15048.8       1162.4        12.9
     800^2      n/a          1936.6        n/a

The hierarchical representation has proven effective in a min-cut based segmentation algorithm that we proposed recently [8]. There we explored the use of random walks and the associated spectral embedding techniques for the automatic generation of suitable proposal (source and sink) regions for a min-cut based algorithm; the multi-scale algorithm was used to generate the 40 leading eigenvectors of large transition matrices (e.g. of size 20K × 20K). In terms of future work, it will be useful to compare our approach with other approximate methods for the SVD, such as [23].

Acknowledgements: We thank S. Roweis, F. Estrada and M. Sakr for valuable comments.

References
[1] D. Achlioptas and F. McSherry. Fast Computation of Low-Rank Approximations. STOC, 2001.
[2] D. Achlioptas et al. Sampling Techniques for Kernel Methods. NIPS, 2001.
[3] S. Barnard and H. Simon. Fast Multilevel Implementation of Recursive Spectral Bisection for Partitioning Unstructured Problems. PPSC, pp. 627-632.
[4] M. Belkin et al. Laplacian Eigenmaps and Spectral Techniques for Embedding. NIPS, 2001.
[5] M. Brand et al. A Unifying Theorem for Spectral Embedding and Clustering. AI & STATS, 2002.
[6] C. Chennubhotla. Spectral Methods for Multi-scale Feature Extraction and Spectral Clustering. Ph.D. thesis, Department of Computer Science, University of Toronto, Canada, 2004. http://www.cs.toronto.edu/~chakra/thesis.pdf
[7] C. Chennubhotla and A. Jepson. Half-Lives of EigenFlows for Spectral Clustering. NIPS, 2002.
[8] F. Estrada, A. Jepson and C. Chennubhotla. Spectral Embedding and Min-Cut for Image Segmentation. Manuscript under review, 2004.
[9] C. Fowlkes et al. Efficient Spatiotemporal Grouping Using the Nystrom Method. CVPR, 2001.
[10] S. Belongie et al. Spectral Partitioning with Indefinite Kernels Using the Nystrom Approximation. ECCV, 2002.
[11] A. Frieze et al. Fast Monte-Carlo Algorithms for Finding Low-Rank Approximations. FOCS, 1998.
[12] Y. Koren et al. ACE: A Fast Multiscale Eigenvectors Computation for Drawing Huge Graphs. IEEE Symp. on InfoVis, 2002, pp. 137-144.
[13] R. B. Lehoucq, D. C. Sorensen and C. Yang. ARPACK User Guide: Solution of Large Scale Eigenvalue Problems by Implicitly Restarted Arnoldi Methods. SIAM, 1998.
[14] J. J. Lin. Reduced Rank Approximations of Transition Matrices. AI & STATS, 2002.
[15] L. Lovász. Random Walks on Graphs: A Survey. Combinatorics, 1996, pp. 353-398.
[16] G. J. McLachlan et al. Mixture Models: Inference and Applications to Clustering. 1988.
[17] M. Meila and J. Shi. A Random Walks View of Spectral Segmentation. AI & STATS, 2001.
[18] A. Ng, M. Jordan and Y. Weiss. On Spectral Clustering: Analysis and an Algorithm. NIPS, 2001.
[19] A. Pothen. Graph Partitioning Algorithms with Applications to Scientific Computing. In Parallel Numerical Algorithms, D. E. Keyes et al. (eds.), Kluwer Academic Press, 1996.
[20] G. L. Scott et al. Feature Grouping by Relocalization of Eigenvectors of the Proximity Matrix. BMVC, pp. 103-108, 1990.
[21] E. Sharon et al. Fast Multiscale Image Segmentation. CVPR, I:70-77, 2000.
[22] J. Shi and J. Malik. Normalized Cuts and Image Segmentation. PAMI, August 2000.
[23] H. Simon et al. Low-Rank Matrix Approximation Using the Lanczos Bidiagonalization Process with Applications. SIAM J. of Sci. Comp., 21(6):2257-2274, 2000.
[24] N. Tishby et al. Data Clustering by Markovian Relaxation. NIPS, 2001.
[25] C. Williams et al. Using the Nystrom Method to Speed Up the Kernel Machines. NIPS, 2001.