nips nips2013 nips2013-256 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. [sent-6, score-0.601]
2 Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. [sent-8, score-0.177]
3 We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. [sent-10, score-0.633]
4 Other examples of latent variable models include probabilistic canonical correlation analysis (CCA) [1] and Gaussian process latent variable models [15]. [sent-16, score-0.228]
5 Another important example of manifold data is in shape analysis, where the definition of the shape of an object should not depend on its position, orientation, or scale. [sent-22, score-0.462]
6 The result is a manifold representation of shape, or shape space. [sent-24, score-0.328]
7 Linear operations violate the natural constraints of manifold data. [sent-25, score-0.194]
8 For example, a linear average of data on a sphere results in a vector that does not have unit length. [sent-27, score-0.169]
9 As shown recently [5], using the kernel trick with a Gaussian kernel maps data onto a Hilbert sphere, and utilizing Riemannian distances on this sphere rather than Euclidean distances improves clustering and classification performance. [sent-28, score-0.249]
10 Other examples of manifold data include geometric transformations, such as rotations and affine transforms, symmetric positive-definite tensors [9, 24], Grassmannian manifolds (the set of m-dimensional linear subspaces of Rn ), and Stiefel manifolds (the set of orthonormal m-frames in Rn ) [23]. [sent-29, score-0.737]
11 It is important to note the distinction between manifold data, where the manifold representation is known a priori, and manifold learning and nonlinear component analysis [15, 20], where the data lies in Euclidean space on some unknown, lower-dimensional manifold that must be learned. [sent-36, score-0.816]
12 Principal geodesic analysis (PGA) [10] generalizes PCA to nonlinear manifolds. [sent-37, score-0.406]
13 It describes the geometric variability of manifold data by finding lower-dimensional geodesic subspaces that minimize the residual sum-of-squared geodesic distances to the data. [sent-38, score-1.096]
14 Related work on manifold component analysis has introduced variants of PGA. [sent-40, score-0.234]
15 This includes relaxing the constraint that geodesics pass through the mean of the data [11] and, for spherical data, replacing geodesic subspaces with nested spheres of arbitrary radius [13]. [sent-41, score-0.677]
16 That is, they find subspaces that minimize the sum-of-squared geodesic distances to the data. [sent-44, score-0.468]
17 Much like the original formulation of PCA, current component analysis methods on manifolds lack a probabilistic interpretation. [sent-45, score-0.267]
18 However, due to the lack of an explicit formulation for the normalizing constant, our estimation is limited to symmetric spaces, which include many common manifolds such as Euclidean space, spheres, Kendall shape spaces, Grassmann/Stiefel manifolds, and more. [sent-48, score-0.441]
19 Recall that a Riemannian manifold is a differentiable manifold M equipped with a metric g, which is a smoothly varying inner product on the tangent spaces of M . [sent-51, score-0.59]
20 Given two vector fields v, w on M, the covariant derivative $\nabla_v w$ gives the change of the vector field w in the v direction. [sent-52, score-0.177]
21 The covariant derivative is a generalization of the Euclidean directional derivative to the manifold setting. [sent-53, score-0.515]
22 Given a vector field $V(t)$ defined along a curve $\gamma$, we can define the covariant derivative of V to be $\frac{DV}{dt} = \nabla_{\dot\gamma} V$. [sent-55, score-0.177]
23 A vector field V is called parallel if its covariant derivative along the curve $\gamma$ is zero. [sent-56, score-0.177]
24 A curve $\gamma$ is a geodesic if it satisfies the equation $\nabla_{\dot\gamma}\dot\gamma = 0$. [sent-57, score-0.406]
25 In other words, geodesics are curves with zero acceleration. [sent-58, score-0.176]
26 Recall that for any point p ∈ M and tangent vector v ∈ $T_pM$, the tangent space of M at p, there is a unique geodesic curve $\gamma$ with initial conditions $\gamma(0) = p$ and $\dot\gamma(0) = v$. [sent-59, score-0.706]
27 This geodesic is only guaranteed to exist locally. [sent-60, score-0.406]
28 When $\gamma$ is defined over the interval [0, 1], the Riemannian exponential map at p is defined as $\mathrm{Exp}_p(v) = \gamma(1)$. [sent-61, score-0.137]
29 In other words, the exponential map takes a position and velocity as input and returns the point at time 1 along the geodesic with these initial conditions. [sent-62, score-0.599]
30 The exponential map is locally diffeomorphic onto a neighbourhood of p. [sent-63, score-0.198]
31 Then, within such a neighbourhood V(p), the exponential map has an inverse, the Riemannian log map, $\mathrm{Log}_p : V(p) \to T_pM$. [sent-65, score-0.168]
32 3 Probabilistic Principal Geodesic Analysis Before introducing our PPGA model for manifold data, we first review PPCA. [sent-71, score-0.194]
33 This removes the rotation ambiguity of the latent factors and makes them analogous to the eigenvectors and eigenvalues of standard PCA (there is still of course an ambiguity of the ordering of the factors). [sent-74, score-0.15]
34 1 Probability Model Following [8, 17], we use a generalization of the normal distribution for a Riemannian manifold as our noise model. [sent-77, score-0.235]
35 Consider a random variable y taking values on a Riemannian manifold M, defined by the probability density function (pdf) $p(y) = \frac{1}{C(\mu,\tau)} \exp\!\left(-\frac{\tau}{2}\, d(\mu, y)^2\right)$, with $C(\mu,\tau) = \int_M \exp\!\left(-\frac{\tau}{2}\, d(\mu, y)^2\right) dy$. [sent-78, score-0.417]
36 We note that this noise model could be replaced with a different distribution, perhaps specific to the type of manifold or application, and the inference procedure presented in the next section could be modified accordingly. [sent-82, score-0.194]
37 In this model, the latent variables x are combined with the factor matrix W and scaling Λ to form a new tangent vector $z = W\Lambda x \in T_\mu M$. [sent-84, score-0.216]
38 Next, the exponential map shoots the base point µ by z to generate the location parameter of a Riemannian normal distribution, from which the data point y is drawn. [sent-85, score-0.178]
39 Note that in Euclidean space, the exponential map is an addition operation, Exp(µ, z) = µ + z. [sent-86, score-0.137]
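To make the generative process concrete, here is a minimal NumPy sketch of forward simulation on the unit sphere in the spirit of this model. The function names, passing the diagonal of Λ as a vector, and the tangent-space Gaussian used in place of an exact Riemannian normal draw are all assumptions added for illustration, not the authors' implementation.

```python
import numpy as np

def sphere_exp(p, v):
    """Exponential map on the unit sphere embedded in R^(n+1)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return p
    return np.cos(theta) * p + (np.sin(theta) / theta) * v

def simulate_ppga_sphere(mu, W, lam, tau, N, seed=0):
    """Forward-simulate the PPGA generative model on the sphere.

    mu  : base point on the sphere (unit vector)
    W   : matrix whose columns are orthonormal tangent vectors at mu
    lam : vector of the diagonal entries of Lambda
    tau : noise precision

    For each point: x ~ N(0, I_q), z = W @ (lam * x), location = Exp(mu, z);
    the Riemannian normal noise is approximated by a tangent-space Gaussian
    with precision tau shot out through the exponential map.
    """
    rng = np.random.default_rng(seed)
    dim, q = W.shape
    ys, xs = [], []
    for _ in range(N):
        x = rng.standard_normal(q)
        z = W @ (lam * x)                 # tangent vector at mu
        loc = sphere_exp(mu, z)           # location parameter on the sphere
        eps = rng.standard_normal(dim) / np.sqrt(tau)
        eps -= loc * (loc @ eps)          # project noise into the tangent space at loc
        ys.append(sphere_exp(loc, eps))
        xs.append(x)
    return np.array(ys), np.array(xs)
```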
40 Given data {y_1, . . . , y_N} on M, with associated latent variables $x_i \in R^q$ and $z_i = W\Lambda x_i$, we formulate an expectation maximization (EM) algorithm. [sent-93, score-0.224]
41 Denoting $x_{ij}$ as the jth sample for $x_i$, the Monte Carlo approximation of the Q function is given by $Q(\theta \mid \theta^k) = \sum_{i=1}^{N} E_{x_i \mid y_i; \theta^k}\!\left[\log p(x_i, y_i; \theta)\right] \approx \frac{1}{S} \sum_{j=1}^{S} \sum_{i=1}^{N} \log p(x_{ij}, y_i; \theta)$. [sent-98, score-0.14]
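As a small illustration of this approximation, the sketch below averages the complete-data log density over posterior samples; the callable `log_joint` is a placeholder for log p(x, y; θ) supplied by the caller and is not part of the paper.

```python
import numpy as np

def monte_carlo_q(log_joint, x_samples, ys, theta):
    """Monte Carlo estimate of the Q function.

    x_samples : array of shape (N, S, q), S posterior samples of each x_i
    ys        : the N observed data points
    log_joint : callable returning log p(x, y; theta)
    Returns (1/S) * sum over i, j of log p(x_ij, y_i; theta).
    """
    N, S = x_samples.shape[0], x_samples.shape[1]
    total = 0.0
    for i in range(N):
        for j in range(S):
            total += log_joint(x_samples[i, j], ys[i], theta)
    return total / S
```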
42 The computation of the gradient term $\nabla_{x_i} U(x_i)$ requires that we compute $d_v \mathrm{Exp}(p, v)$, [sent-102, score-0.169]
43 i.e., the derivative operator (Jacobian matrix) of the exponential map with respect to the initial velocity v. [sent-104, score-0.29]
44 To derive this, consider a variation of geodesics c(s, t) = Exp(p, su + tv), where u ∈ Tp M . [sent-105, score-0.248]
45 The variation c produces a “fan” of geodesics; this is illustrated for a sphere on the left side of Figure 1. [sent-106, score-0.209]
46 Finally, this gives an expression for the exponential map derivative as $d_v \mathrm{Exp}(p, v)\, u = J_v(1)$, where $J_v(t) = \frac{dc}{ds}(0, t)$ is a Jacobi field along the geodesic. [sent-108, score-0.3]
47 However, Jacobi fields can be evaluated in closed-form for the class of manifolds known as symmetric spaces. [sent-110, score-0.231]
48 For the sphere and Kendall shape space examples, we provide explicit formulas for these computations in Section 4. [sent-111, score-0.303]
49 Now, the gradient with respect to each $x_i$ is $\nabla_{x_i} U = x_i - \tau \Lambda W^T \left\{ d_{z_i}\mathrm{Exp}(\mu, z_i)^\dagger\, \mathrm{Log}(\mathrm{Exp}(\mu, z_i), y_i) \right\}$, (7) where $\dagger$ denotes the adjoint of a linear operator, i.e., $\langle d_{z_i}\mathrm{Exp}(\mu, z_i)^\dagger w,\, v \rangle = \langle w,\, d_{z_i}\mathrm{Exp}(\mu, z_i)\, v \rangle$. [sent-113, score-0.383]
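The expectation is approximated with Hamiltonian Monte Carlo samples of each $x_i$; below is a generic leapfrog HMC sketch into which the potential U and the gradient in (7) could be plugged. The step size, number of leapfrog steps, and function names are illustrative choices, not values from the paper.

```python
import numpy as np

def hmc(x0, U, grad_U, n_samples=100, n_leapfrog=20, step=0.05, seed=0):
    """Basic Hamiltonian Monte Carlo targeting exp(-U(x)).

    U      : potential energy, e.g. the negative log posterior of x_i given y_i
    grad_U : its gradient, e.g. expression (7)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.standard_normal(x.shape)          # resample momentum
        xn, pn = x.copy(), p.copy()
        # leapfrog integration of Hamiltonian dynamics
        pn -= 0.5 * step * grad_U(xn)
        for _ in range(n_leapfrog - 1):
            xn += step * pn
            pn -= step * grad_U(xn)
        xn += step * pn
        pn -= 0.5 * step * grad_U(xn)
        # Metropolis accept/reject on the Hamiltonian
        h_old = U(x) + 0.5 * p @ p
        h_new = U(xn) + 0.5 * pn @ pn
        if np.log(rng.uniform()) < h_old - h_new:
            x = xn
        samples.append(x.copy())
    return np.array(samples)
```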
50 Gradient for τ: The gradient of the Q function with respect to τ requires evaluation of the derivative of the normalizing constant in the Riemannian normal distribution (2). [sent-120, score-0.241]
51 Thus, the normalizing constant can be written as $C(\tau) = \int_M \exp\!\left(-\frac{\tau}{2}\, d(\mu, y)^2\right) dy$. [sent-122, score-0.146]
52 We can rewrite this integral in normal coordinates, which can be thought of as a polar coordinate system in the tangent space $T_\mu M$. [sent-123, score-0.223]
53 Now the integral for the normalizing constant becomes $C(\tau) = \int_{S^{n-1}} \int_0^{R(v)} \exp\!\left(-\frac{\tau}{2} r^2\right) |\det(d\phi(rv))| \, dr \, dv$, (8) where R(v) is the maximum distance for which $\phi(rv)$ is defined. [sent-128, score-0.178]
54 Choosing an orthonormal basis $u_1, \ldots, u_n$ of $T_\mu M$ with $u_1 = v$, we have $|\det(d\phi(rv))| = \prod_{k=2}^{n} \frac{1}{\sqrt{\kappa_k}}\, f_k(\sqrt{\kappa_k}\, r)$, (9) where $\kappa_k = K(u_1, u_k)$ denotes the sectional curvature and $f_k$ is defined as $f_k(x) = \sin(x)$ if $\kappa_k > 0$, $f_k(x) = \sinh(x)$ if $\kappa_k < 0$, and $f_k(x) = x$ if $\kappa_k = 0$. [sent-135, score-0.225]
55 Also, if M is simply connected, then R(v) = R does not depend on the direction v, and we can write the normalizing constant as $C(\tau) = A_{n-1} \int_0^R \exp\!\left(-\frac{\tau}{2} r^2\right) \prod_{k=2}^{n} \kappa_k^{-1/2} f_k(\sqrt{\kappa_k}\, r)\, dr$, where $A_{n-1}$ is the surface area of the $(n-1)$-dimensional hypersphere $S^{n-1}$. [sent-137, score-0.201]
56 While this formula works only for simply connected symmetric spaces, other symmetric spaces could be handled by lifting to the universal cover, which is simply connected, or by restricting the definition of the Riemannian normal pdf in (2) to have support only up to the injectivity radius. [sent-139, score-0.185]
57 The gradient term for estimating τ is $\nabla_\tau Q = \frac{1}{S} \sum_{i=1}^{N} \sum_{j=1}^{S} \left[ \frac{A_{n-1}}{C(\tau)} \int_0^R \frac{r^2}{2} \exp\!\left(-\frac{\tau}{2} r^2\right) \prod_{k=2}^{n} \kappa_k^{-1/2} f_k(\sqrt{\kappa_k}\, r)\, dr \; - \; \frac{1}{2}\, d(\mathrm{Exp}(\mu, z_{ij}), y_i)^2 \right]$. [sent-142, score-0.343]
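For the sphere all sectional curvatures equal one, so the one-dimensional integral above can be evaluated numerically. The sketch below uses a plain Riemann sum with R = π and $f_k = \sin$; the function names and discretization are illustrative, not the paper's implementation.

```python
import math
import numpy as np

def sphere_normal_constant(tau, n=2, num=4000):
    """Normalizing constant C(tau) of the Riemannian normal on S^n,
    using the symmetric-space formula with kappa_k = 1, f_k = sin, R = pi."""
    r = np.linspace(0.0, np.pi, num)
    dr = r[1] - r[0]
    integrand = np.exp(-0.5 * tau * r**2) * np.sin(r) ** (n - 1)
    area = 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)   # surface area of S^(n-1)
    return area * np.sum(integrand) * dr

def dlogC_term(tau, n=2, num=4000):
    """The integral term (A_{n-1}/C(tau)) * int (r^2/2) exp(-tau r^2/2) sin(r)^(n-1) dr,
    i.e. minus d/dtau of log C(tau), appearing in the tau-gradient above."""
    r = np.linspace(0.0, np.pi, num)
    dr = r[1] - r[0]
    w = np.exp(-0.5 * tau * r**2) * np.sin(r) ** (n - 1)
    area = 2.0 * math.pi ** (n / 2) / math.gamma(n / 2)
    C = area * np.sum(w) * dr
    dC = -area * np.sum(0.5 * r**2 * w) * dr
    return -dC / C
```

With these helpers, the per-sample contribution to the τ-gradient would be `dlogC_term(tau) - 0.5 * d**2` for geodesic distance d between Exp(µ, z_ij) and y_i.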
58 Gradient for µ: From (4) and (5), the gradient term for updating µ is $\nabla_\mu Q = \frac{1}{S} \sum_{i=1}^{N} \sum_{j=1}^{S} \tau\, d_\mu \mathrm{Exp}(\mu, z_{ij})^\dagger\, \mathrm{Log}(\mathrm{Exp}(\mu, z_{ij}), y_i)$. [sent-143, score-0.275]
59 Similar to before (6), this derivative can be derived from a variation of geodesics: c(s, t) = Exp(Exp(µ, su), tv(s)), where v(s) comes from parallel translating v along the geodesic Exp(µ, su). [sent-145, score-0.543]
60 Again, the derivative of the exponential map is given by a Jacobi field satisfying Jµ (t) = dc/ds(0, t), and we have dµ Exp(µ, v) = Jµ (1). [sent-146, score-0.234]
61 Gradient for Λ: We take the derivative with respect to each ath diagonal element $\Lambda_a$ as $\frac{\partial Q}{\partial \Lambda_a} = \frac{1}{S} \sum_{i=1}^{N} \sum_{j=1}^{S} \tau\, (W^a x_{ij}^a)^T \left\{ d_{z_{ij}}\mathrm{Exp}(\mu, z_{ij})^\dagger\, \mathrm{Log}(\mathrm{Exp}(\mu, z_{ij}), y_i) \right\}$, where $W^a$ denotes the ath column of W and $x_{ij}^a$ is the ath component of $x_{ij}$. [sent-150, score-0.437]
62 The gradient term for W is $\nabla_W Q = \frac{1}{S} \sum_{i=1}^{N} \sum_{j=1}^{S} \tau\, d_{z_{ij}}\mathrm{Exp}(\mu, z_{ij})^\dagger\, \mathrm{Log}(\mathrm{Exp}(\mu, z_{ij}), y_i)\, x_{ij}^T \Lambda$. (10) [sent-154, score-0.26]
63 To preserve the mutual orthogonality constraint on the columns of W, we represent W as a point on the Stiefel manifold $V_q(T_\mu M)$, i.e., the set of orthonormal q-frames in $T_\mu M$. [sent-155, score-0.194]
64 We project the gradient in (10) onto the tangent space TW Vq (Tµ M ), and then update W by taking a small step along the geodesic in the projected gradient direction. [sent-158, score-0.692]
65 For details on the geodesic computations for Stiefel manifolds, see [7]. [sent-159, score-0.406]
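The paper takes a geodesic step on the Stiefel manifold following [7]; as a hedged alternative, the sketch below uses the standard tangent-space projection together with a QR retraction, a common substitute for the exact geodesic update. The function names and sign convention are illustrative.

```python
import numpy as np

def stiefel_project(W, G):
    """Project an ambient gradient G onto the tangent space of the Stiefel
    manifold at W (W has orthonormal columns): G - W * sym(W^T G)."""
    WtG = W.T @ G
    return G - W @ ((WtG + WtG.T) / 2.0)

def stiefel_retract(W, step):
    """Map W + step back onto the Stiefel manifold with a QR retraction
    (used here in place of the exact geodesic step)."""
    Q, R = np.linalg.qr(W + step)
    signs = np.sign(np.sign(np.diag(R)) + 0.5)   # avoid arbitrary column sign flips
    return Q * signs

# one ascent step on W, for step size eta and ambient gradient grad_W:
# W = stiefel_retract(W, eta * stiefel_project(W, grad_W))
```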
66 The MCEM algorithm for PPGA is an iterative procedure for finding the subspace spanned by q principal components, shown in Algorithm 1. [sent-160, score-0.155]
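A skeleton of the MCEM loop, showing where the HMC E-step and the gradient M-step sit; the callables `e_step` and `grad_theta` are placeholders supplied by the caller, and the plain ascent update is a simplification (the W update would use the projected-gradient step sketched above). This reflects only the structure of the procedure, not Algorithm 1 verbatim.

```python
def mcem_ppga(ys, theta0, e_step, grad_theta, n_iters=50, step_size=0.01, S=20):
    """Monte Carlo EM skeleton for PPGA.

    e_step(y_i, theta, S)          -> S HMC samples of the latent x_i
    grad_theta(theta, samples, ys) -> dict of gradients for 'mu', 'W', 'Lambda', 'tau'
    """
    theta = dict(theta0)
    for _ in range(n_iters):
        # E-step: approximate the posterior of each x_i with HMC samples
        samples = [e_step(y, theta, S) for y in ys]
        # M-step: gradient ascent on the Monte Carlo Q function
        grads = grad_theta(theta, samples, ys)
        for name, g in grads.items():
            # note: the W update should be projected and retracted onto the
            # Stiefel manifold rather than updated by plain addition
            theta[name] = theta[name] + step_size * g
    return theta
```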
67 4 Experiments: In this section, we demonstrate the effectiveness of PPGA and our ML estimation using both simulated data on the 2D sphere and a real corpus callosum data set. [sent-169, score-0.458]
68 Before presenting the PPGA experiments, we briefly review the necessary computations for the specific types of manifolds used, including the Riemannian exponential map, log map, and Jacobi fields. [sent-170, score-0.271]
69 Simulated sphere data. Sphere geometry overview: Let p be a point on an n-dimensional sphere embedded in $R^{n+1}$, and let v be a tangent vector at p. [sent-172, score-0.358]
70 The exponential map is given by a 2D rotation of p by an angle equal to the norm of the tangent vector, i.e., $\mathrm{Exp}(p, v) = \cos\theta \cdot p + \frac{\sin\theta}{\theta} \cdot v$, where $\theta = \|v\|$. (11) [sent-174, score-0.195]
71 The log map between two points p, q on the sphere can be computed by finding the initial velocity of the rotation between the two points. [sent-177, score-0.396]
72 The adjoint derivatives of the exponential map are given by $d_p \mathrm{Exp}(p, v)^\dagger w = \cos(\|v\|)\, w^\perp + w^\top$ and $d_v \mathrm{Exp}(p, v)^\dagger w = \frac{\sin(\|v\|)}{\|v\|}\, w^\perp + w^\top$, where $w^\perp$ and $w^\top$ denote the components of w that are orthogonal and tangent to v, respectively. [sent-181, score-0.423]
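A direct NumPy transcription of the sphere exponential and log maps above (sphere_exp is repeated so the snippet is self-contained); a sketch for reference, with the antipodal case, where the log map is undefined, handled only crudely.

```python
import numpy as np

def sphere_exp(p, v, eps=1e-12):
    """Exp(p, v) = cos(|v|) p + (sin(|v|)/|v|) v on the unit sphere, Eq. (11)."""
    theta = np.linalg.norm(v)
    if theta < eps:
        return p
    return np.cos(theta) * p + (np.sin(theta) / theta) * v

def sphere_log(p, q, eps=1e-12):
    """Log(p, q): the initial velocity of the rotation taking p to q."""
    w = q - (p @ q) * p                        # part of q orthogonal to p
    nw = np.linalg.norm(w)
    if nw < eps:                               # q == p (antipodal case left undefined)
        return np.zeros_like(p)
    theta = np.arccos(np.clip(p @ q, -1.0, 1.0))
    return (theta / nw) * w

# round trip check: sphere_log(p, sphere_exp(p, v)) should recover v for small v
```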
73 An illustration of geodesics and the Jacobi fields that give rise to the exponential map derivatives is shown in Figure 1. [sent-182, score-0.313]
74 Parameter estimation on the sphere: Using our generative model for PGA (3), we forward simulated a random sample of 100 data points on the unit sphere S 2 , with known parameters θ = (µ, W, Λ, τ ), shown in Table 1. [sent-183, score-0.225]
75 Figure 1 compares the ground truth principal geodesics with the principal geodesics estimated by maximum likelihood using our algorithm. [sent-187, score-0.961]
76 The close overlap between the ground truth and estimated first principal geodesics shows that PPGA recovers the model parameters. [sent-188, score-0.561]
77 One advantage that our PPGA model has over the least-squares PGA formulation is that the mean point is estimated jointly with the principal geodesics. [sent-189, score-0.155]
78 In the standard PGA algorithm, the mean is estimated first (using geodesic least-squares), then the principal geodesics are estimated second. [sent-190, score-0.737]
79 The estimation error of the principal geodesics turned out to be larger for PGA than for our model. [sent-193, score-0.359]
80 Figure 1: Left: Jacobi fields; Right: the principal geodesic of randomly generated data on the unit sphere. [sent-226, score-0.561]
81 Next, points in this subspace are deemed equivalent if they are a rotation and scaling of each other, which can be represented as multiplication by a complex number, $\rho e^{i\theta}$, where ρ is the scaling factor and θ is the rotation angle. [sent-233, score-0.147]
82 We think of a centered shape p ∈ V as representing the complex line $L_p = \{ z \cdot p : z \in \mathbb{C} \setminus \{0\} \}$, i.e., the set of all rotations and scalings of the point set p. [sent-235, score-0.165]
83 A tangent vector at $L_p \in V$ is a complex vector, v ∈ V, such that $\langle p, v \rangle = 0$. [sent-238, score-0.181]
84 The exponential map is given by rotating (within V) the complex line $L_p$ by the initial velocity v, that is, $\mathrm{Exp}(p, v) = \cos\theta \cdot p + \frac{\sin\theta}{\theta} \cdot v$, where $\theta = \|v\|$. (13) [sent-239, score-0.278]
85 Likewise, the log map between two shapes p, q ∈ V is given by finding the initial velocity of the rotation between the two complex lines $L_p$ and $L_q$. [sent-240, score-0.258]
86 The log map is then given by $\mathrm{Log}(p, q) = \frac{\theta \cdot (q - \pi_p(q))}{\| q - \pi_p(q) \|}$, with $\theta = \arccos |\langle p, q \rangle|$, where $\pi_p(q)$ denotes the projection of q onto p. [sent-242, score-0.19]
87 The adjoint derivatives of the exponential map are given by $d_p \mathrm{Exp}(p, v)^\dagger w = \cos(\|v\|)\, w_1^\perp + \cos(2\|v\|)\, w_2^\perp + w^\top$ and $d_v \mathrm{Exp}(p, v)^\dagger w = \frac{\sin(\|v\|)}{\|v\|}\, w_1^\perp + \frac{\sin(2\|v\|)}{2\|v\|}\, w_2^\perp + w^\top$. [sent-249, score-0.287]
88 Here $w_1^\perp$ denotes the component of w parallel to $u_1$, i.e., $w_1^\perp = \langle w, u_1 \rangle u_1$, $w_2^\perp$ denotes the remaining orthogonal component of w (along $u_2$), and $w^\top$ denotes the component tangent to v. [sent-251, score-0.256]
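A sketch of these shape-space maps using complex landmark vectors; the centering/normalization helper and the explicit rotational alignment in the log map are additions for illustration that follow the standard Kendall construction, not the authors' code.

```python
import numpy as np

def to_preshape(landmarks):
    """k x 2 landmarks -> complex k-vector with translation removed and unit norm."""
    z = landmarks[:, 0] + 1j * landmarks[:, 1]
    z = z - z.mean()
    return z / np.linalg.norm(z)

def shape_exp(p, v, eps=1e-12):
    """Exp(p, v) = cos(theta) p + (sin(theta)/theta) v, with theta = |v| (Eq. 13)."""
    theta = np.linalg.norm(v)
    if theta < eps:
        return p
    return np.cos(theta) * p + (np.sin(theta) / theta) * v

def shape_log(p, q, eps=1e-12):
    """Log map between pre-shapes p, q: rotationally align q to p, then take
    theta * (q - pi_p(q)) / |q - pi_p(q)| with theta = arccos|<p, q>|."""
    inner = np.vdot(p, q)                                # Hermitian inner product <p, q>
    q = q * (inner.conjugate() / (abs(inner) + eps))     # optimal rotation of q onto p
    theta = np.arccos(np.clip(abs(np.vdot(p, q)), 0.0, 1.0))
    w = q - np.vdot(p, q) * p                            # q - pi_p(q)
    nw = np.linalg.norm(w)
    if nw < eps:
        return np.zeros_like(p)
    return (theta / nw) * w
```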
89 Shape variability of corpus callosum data: As a demonstration of PPGA on Kendall shape space, we applied it to corpus callosum shape data derived from the OASIS database. [sent-252, score-0.734]
90 The corpus callosum was segmented in a midsagittal slice using the ITK SNAP program. [sent-256, score-0.261]
91 An example of a segmented corpus callosum in an MRI is shown in Figure 2. [sent-259, score-0.261]
92 Figure 2 displays the first two modes of corpus callosum shape variation, generated as points along the estimated principal geodesics, $\mathrm{Exp}(\mu, \alpha_i w_i)$, with $\alpha_i$ ranging from $-3\lambda_i$ to $3\lambda_i$. [sent-266, score-0.522]
93 Figure 2: Left: example corpus callosum segmentation from an MRI slice. [sent-273, score-0.233]
94 Middle to right: first and second PGA modes of shape variation, shown from $-3$ to $3$ times the corresponding $\lambda_i$. [sent-274, score-0.174]
95 Nonparametric Bayesian density estimation on manifolds with applications to planar shapes. [sent-293, score-0.213]
96 Principal geodesic analysis on symmetric spaces: statistics of diffusion tensors. [sent-332, score-0.452]
97 Statistics of shape via principal geodesic analysis on Lie groups. [sent-339, score-0.695]
98 Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. [sent-344, score-0.174]
99 Manifold valued statistics, exact principal geodesic analysis and the effect of linear approximations. [sent-402, score-0.561]
100 Statistical computations on Grassmann and Stiefel manifolds for image and video-based recognition. [sent-416, score-0.185]
wordName wordTfidf (topN-words)
[('geodesic', 0.406), ('pga', 0.36), ('riemannian', 0.346), ('ppga', 0.24), ('manifold', 0.194), ('manifolds', 0.185), ('geodesics', 0.176), ('sphere', 0.169), ('jacobi', 0.159), ('callosum', 0.159), ('principal', 0.155), ('tangent', 0.15), ('shape', 0.134), ('exp', 0.098), ('derivative', 0.097), ('pca', 0.09), ('zij', 0.085), ('map', 0.082), ('ppca', 0.081), ('covariant', 0.08), ('hmc', 0.076), ('corpus', 0.074), ('euclidean', 0.07), ('hamiltonian', 0.069), ('dv', 0.066), ('latent', 0.066), ('stiefel', 0.061), ('fletcher', 0.061), ('rv', 0.061), ('mcem', 0.06), ('sectional', 0.06), ('spheres', 0.06), ('rotation', 0.058), ('velocity', 0.056), ('carlo', 0.056), ('monte', 0.056), ('exponential', 0.055), ('fk', 0.055), ('gradient', 0.055), ('sin', 0.054), ('kendall', 0.053), ('dzi', 0.053), ('jv', 0.053), ('logp', 0.053), ('spaces', 0.052), ('yi', 0.05), ('ath', 0.049), ('normalizing', 0.048), ('xi', 0.048), ('directional', 0.047), ('symmetric', 0.046), ('zi', 0.045), ('lp', 0.045), ('cos', 0.045), ('adjoint', 0.044), ('probabilistic', 0.042), ('projective', 0.042), ('normal', 0.041), ('component', 0.04), ('variation', 0.04), ('jacobian', 0.04), ('tp', 0.04), ('courty', 0.04), ('dzij', 0.04), ('expp', 0.04), ('miaomiao', 0.04), ('mri', 0.039), ('geometry', 0.039), ('maximization', 0.038), ('orthonormal', 0.038), ('truth', 0.036), ('vq', 0.035), ('diffeomorphic', 0.035), ('curvatures', 0.035), ('utah', 0.035), ('subspaces', 0.035), ('eld', 0.035), ('dr', 0.033), ('ground', 0.033), ('arccos', 0.032), ('su', 0.032), ('integral', 0.032), ('log', 0.031), ('complex', 0.031), ('salt', 0.031), ('ascent', 0.03), ('xij', 0.03), ('simulated', 0.028), ('estimation', 0.028), ('segmented', 0.028), ('geometric', 0.028), ('variable', 0.027), ('distances', 0.027), ('tipping', 0.027), ('orthogonal', 0.026), ('factors', 0.026), ('rotations', 0.026), ('onto', 0.026), ('ck', 0.025), ('rn', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 256 nips-2013-Probabilistic Principal Geodesic Analysis
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
2 0.17702644 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show that advantages of using our fixed-point algorithms. 1
3 0.14600147 63 nips-2013-Cluster Trees on Manifolds
Author: Sivaraman Balakrishnan, Srivatsan Narayanan, Alessandro Rinaldo, Aarti Singh, Larry Wasserman
Abstract: unkown-abstract
4 0.12096794 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
5 0.10519239 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
Author: Sam Patterson, Yee Whye Teh
Abstract: In this paper we investigate the use of Langevin Monte Carlo methods on the probability simplex and propose a new method, Stochastic gradient Riemannian Langevin dynamics, which is simple to implement and can be applied to large scale data. We apply this method to latent Dirichlet allocation in an online minibatch setting, and demonstrate that it achieves substantial performance improvements over the state of the art online variational Bayesian methods. 1
6 0.099257931 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
7 0.093412422 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
8 0.082037434 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
9 0.076178342 224 nips-2013-On the Sample Complexity of Subspace Learning
10 0.075296484 188 nips-2013-Memory Limited, Streaming PCA
11 0.066800892 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
12 0.066253386 232 nips-2013-Online PCA for Contaminated Data
13 0.064599782 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
14 0.06455496 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
15 0.0584356 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
16 0.055933475 91 nips-2013-Dirty Statistical Models
17 0.055000842 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
18 0.0545814 233 nips-2013-Online Robust PCA via Stochastic Optimization
19 0.054074697 75 nips-2013-Convex Two-Layer Modeling
20 0.052253038 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
topicId topicWeight
[(0, 0.162), (1, 0.068), (2, 0.022), (3, 0.029), (4, -0.002), (5, 0.089), (6, 0.016), (7, 0.043), (8, -0.035), (9, -0.032), (10, -0.032), (11, 0.064), (12, 0.013), (13, 0.079), (14, 0.048), (15, 0.054), (16, -0.007), (17, -0.058), (18, -0.036), (19, 0.067), (20, 0.009), (21, 0.066), (22, 0.025), (23, 0.021), (24, 0.059), (25, -0.049), (26, 0.105), (27, 0.032), (28, 0.011), (29, 0.034), (30, -0.028), (31, -0.037), (32, -0.011), (33, -0.01), (34, -0.001), (35, -0.02), (36, 0.029), (37, -0.011), (38, 0.047), (39, 0.028), (40, 0.089), (41, 0.005), (42, -0.122), (43, -0.035), (44, -0.075), (45, -0.021), (46, -0.088), (47, 0.042), (48, 0.036), (49, 0.078)]
simIndex simValue paperId paperTitle
same-paper 1 0.91887707 256 nips-2013-Probabilistic Principal Geodesic Analysis
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
2 0.68578982 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show that advantages of using our fixed-point algorithms. 1
3 0.57853758 167 nips-2013-Learning the Local Statistics of Optical Flow
Author: Dan Rosenbaum, Daniel Zoran, Yair Weiss
Abstract: Motivated by recent progress in natural image statistics, we use newly available datasets with ground truth optical flow to learn the local statistics of optical flow and compare the learned models to prior models assumed by computer vision researchers. We find that a Gaussian mixture model (GMM) with 64 components provides a significantly better model for local flow statistics when compared to commonly used models. We investigate the source of the GMM’s success and show it is related to an explicit representation of flow boundaries. We also learn a model that jointly models the local intensity pattern and the local optical flow. In accordance with the assumptions often made in computer vision, the model learns that flow boundaries are more likely at intensity boundaries. However, when evaluated on a large dataset, this dependency is very weak and the benefit of conditioning flow estimation on the local intensity pattern is marginal. 1
4 0.55543631 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
Author: Francesca Petralia, Joshua T. Vogelstein, David Dunson
Abstract: Nonparametric estimation of the conditional distribution of a response given highdimensional features is a challenging problem. It is important to allow not only the mean but also the variance and shape of the response density to change flexibly with features, which are massive-dimensional. We propose a multiscale dictionary learning model, which expresses the conditional response density as a convex combination of dictionary densities, with the densities used and their weights dependent on the path through a tree decomposition of the feature space. A fast graph partitioning algorithm is applied to obtain the tree decomposition, with Bayesian methods then used to adaptively prune and average over different sub-trees in a soft probabilistic manner. The algorithm scales efficiently to approximately one million features. State of the art predictive performance is demonstrated for toy examples and two neuroscience applications including up to a million features. 1
5 0.54958791 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
6 0.5476141 63 nips-2013-Cluster Trees on Manifolds
7 0.54237628 224 nips-2013-On the Sample Complexity of Subspace Learning
8 0.50454593 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
9 0.49753365 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
10 0.4925459 225 nips-2013-One-shot learning and big data with n=2
11 0.49173424 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
12 0.49093384 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
13 0.49053308 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
14 0.48294297 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
15 0.48083177 284 nips-2013-Robust Spatial Filtering with Beta Divergence
16 0.47016492 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
17 0.46795306 186 nips-2013-Matrix factorization with binary components
18 0.46120545 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
19 0.45344839 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
20 0.45038992 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
topicId topicWeight
[(2, 0.015), (16, 0.032), (33, 0.093), (34, 0.582), (41, 0.028), (49, 0.014), (56, 0.054), (70, 0.026), (85, 0.025), (89, 0.019), (93, 0.023), (95, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.96427184 256 nips-2013-Probabilistic Principal Geodesic Analysis
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
2 0.95737809 351 nips-2013-What Are the Invariant Occlusive Components of Image Patches? A Probabilistic Generative Approach
Author: Zhenwen Dai, Georgios Exarchakis, Jörg Lücke
Abstract: We study optimal image encoding based on a generative approach with non-linear feature combinations and explicit position encoding. By far most approaches to unsupervised learning of visual features, such as sparse coding or ICA, account for translations by representing the same features at different positions. Some earlier models used a separate encoding of features and their positions to facilitate invariant data encoding and recognition. All probabilistic generative models with explicit position encoding have so far assumed a linear superposition of components to encode image patches. Here, we for the first time apply a model with non-linear feature superposition and explicit position encoding for patches. By avoiding linear superpositions, the studied model represents a closer match to component occlusions which are ubiquitous in natural images. In order to account for occlusions, the non-linear model encodes patches qualitatively very different from linear models by using component representations separated into mask and feature parameters. We first investigated encodings learned by the model using artificial data with mutually occluding components. We find that the model extracts the components, and that it can correctly identify the occlusive components with the hidden variables of the model. On natural image patches, the model learns component masks and features for typical image components. By using reverse correlation, we estimate the receptive fields associated with the model’s hidden units. We find many Gabor-like or globular receptive fields as well as fields sensitive to more complex structures. Our results show that probabilistic models that capture occlusions and invariances can be trained efficiently on image patches, and that the resulting encoding represents an alternative model for the neural encoding of images in the primary visual cortex. 1
3 0.94858712 210 nips-2013-Noise-Enhanced Associative Memories
Author: Amin Karbasi, Amir Hesam Salavati, Amin Shokrollahi, Lav R. Varshney
Abstract: Recent advances in associative memory design through structured pattern sets and graph-based inference algorithms allow reliable learning and recall of exponential numbers of patterns. Though these designs correct external errors in recall, they assume neurons compute noiselessly, in contrast to highly variable neurons in hippocampus and olfactory cortex. Here we consider associative memories with noisy internal computations and analytically characterize performance. As long as internal noise is less than a specified threshold, error probability in the recall phase can be made exceedingly small. More surprisingly, we show internal noise actually improves performance of the recall phase. Computational experiments lend additional support to our theoretical analysis. This work suggests a functional benefit to noisy neurons in biological neuronal networks. 1
4 0.94282067 129 nips-2013-Generalized Random Utility Models with Multiple Types
Author: Hossein Azari Soufiani, Hansheng Diao, Zhenyu Lai, David C. Parkes
Abstract: We propose a model for demand estimation in multi-agent, differentiated product settings and present an estimation algorithm that uses reversible jump MCMC techniques to classify agents’ types. Our model extends the popular setup in Berry, Levinsohn and Pakes (1995) to allow for the data-driven classification of agents’ types using agent-level data. We focus on applications involving data on agents’ ranking over alternatives, and present theoretical conditions that establish the identifiability of the model and uni-modality of the likelihood/posterior. Results on both real and simulated data provide support for the scalability of our approach. 1
5 0.93731683 202 nips-2013-Multiclass Total Variation Clustering
Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht
Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1
6 0.92958087 48 nips-2013-Bayesian Inference and Learning in Gaussian Process State-Space Models with Particle MCMC
7 0.91798282 38 nips-2013-Approximate Dynamic Programming Finally Performs Well in the Game of Tetris
8 0.91794568 122 nips-2013-First-order Decomposition Trees
9 0.89662623 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
10 0.89225143 143 nips-2013-Integrated Non-Factorized Variational Inference
11 0.77676815 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
12 0.7709462 347 nips-2013-Variational Planning for Graph-based MDPs
13 0.7682929 39 nips-2013-Approximate Gaussian process inference for the drift function in stochastic differential equations
14 0.7399025 86 nips-2013-Demixing odors - fast inference in olfaction
15 0.73282933 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
16 0.72823119 148 nips-2013-Latent Maximum Margin Clustering
17 0.72692615 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents
18 0.72124326 322 nips-2013-Symbolic Opportunistic Policy Iteration for Factored-Action MDPs
19 0.71900433 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
20 0.71788764 317 nips-2013-Streaming Variational Bayes