nips nips2013 nips2013-131 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show the advantages of using our fixed-point algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. [sent-2, score-1.509]
2 Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. [sent-3, score-0.726]
3 We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. [sent-4, score-1.089]
4 We present key results that help recognise g-convexity and also the additional structure alluded to above. [sent-5, score-0.172]
5 We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. [sent-6, score-0.304]
6 To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. [sent-7, score-0.971]
7 1 Introduction The geometry of Hermitian positive definite (hpd) matrices is remarkably rich and forms a foundational pillar of modern convex optimisation [21] and of the rapidly evolving area of convex algebraic geometry [4]. [sent-9, score-0.727]
8 The geometry exhibited by hpd matrices, however, goes beyond what is typically exploited in these two areas. [sent-10, score-0.473]
9 In particular, hpd matrices form a convex cone which is also a differentiable Riemannian manifold that is also a CAT(0) space (i.e., a metric space of nonpositive curvature). [sent-11, score-0.625]
10 This rich structure enables “geometric optimisation” with hpd matrices, which allows solving many problems that are nonconvex in the Euclidean sense but convex in the manifold sense (see §2 or [29]), or have enough metric structure (see §3) to permit efficient optimisation. [sent-14, score-0.716]
11 This paper develops (conic) geometric optimisation (GO) for hpd matrices. [sent-15, score-0.58]
12 We present key results that help recognise geodesic convexity (g-convexity); we also present sufficient conditions that put a class of even non g-convex functions within the grasp of GO. [sent-16, score-0.527]
13 To our knowledge, ours are the most general results on geometric optimisation with hpd matrices known so far. [sent-17, score-0.971]
14 We begin by noting that the widely studied class of geometric programs is ultimately nothing but the 1D version of GO on hpd matrices. [sent-19, score-0.59]
15 Given that geometric programming has enjoyed great success in numerous applications—see e.g. [sent-20, score-0.18]
16 Our theorems provide a starting point for recognising and constructing numerous problems amenable to geometric optimisation. [sent-25, score-0.233]
17 Another GO problem arises as a subroutine in nearest neighbour search over hpd matrices [12]. [sent-28, score-0.466]
18 Several other areas involve GO problems: statistics (covariance shrinkage) [9], nonlinear matrix equations [17], Markov decision processes and the wider encompassing area of nonlinear Perron-Frobenius theory [18]. [sent-29, score-0.194]
19 We use ECDs as a platform for illustrating our ideas for two reasons: (i) ECDs are important in a variety of settings (see the recent survey [23]); and (ii) they offer an instructive setup for presenting key ideas from the world of geometric optimisation. [sent-31, score-0.261]
20 , the set of d × d symmetric positive definite matrices) is the scatter matrix, while ϕ : R → R++ is a positive density generating function (dgf). [sent-35, score-0.109]
21 If ECDs have a finite covariance matrix, then the scatter matrix is proportional to the covariance matrix [8]. [sent-36, score-0.12]
22 With ϕ(t) = e^{−t/2}, density (1) reduces to the multivariate normal density. [sent-38, score-0.09]
23 For the choice ϕ(t) = t^{α−d/2} exp(−(t/b)^β), (2) where α, b and β are fixed positive numbers, density (1) yields the rich class called Kotz-type distributions that are known to have powerful modelling abilities [15; §3.2]; [sent-39, score-0.153]
24 they include as special cases multivariate power exponentials, elliptical gamma, and multivariate W-distributions, for instance. [sent-40, score-0.242]
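As a concrete reference for (2), here is a minimal Python sketch of the Kotz-type dgf and its negative logarithm (the quantity that enters the likelihood below); the function names and the explicit d argument are my own choices, not from the paper.

```python
import numpy as np

def kotz_dgf(t, d, alpha, b, beta):
    """Kotz-type dgf (2): phi(t) = t**(alpha - d/2) * exp(-(t/b)**beta), for t > 0."""
    t = np.asarray(t, dtype=float)
    return t ** (alpha - d / 2.0) * np.exp(-(t / b) ** beta)

def kotz_neg_log_dgf(t, d, alpha, b, beta):
    """-log phi(t) = (d/2 - alpha) * log(t) + (t/b)**beta."""
    t = np.asarray(t, dtype=float)
    return (d / 2.0 - alpha) * np.log(t) + (t / b) ** beta
```

With alpha = d/2, b = 2 and beta = 1 this reduces to phi(t) = exp(−t/2), i.e., the multivariate normal case mentioned above.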
25 Up to constants, the log-likelihood is L(S) = −(n/2) log det S + Σ_{i=1}^n log ϕ(x_i^T S^{−1} x_i). (3) [sent-49, score-0.398]
26 Equivalently, we may consider the minimisation problem min_{S ≻ 0} Φ(S) := (n/2) log det(S) − Σ_i log ϕ(x_i^T S^{−1} x_i). (4) [sent-50, score-0.217]
27 Problem (4) is in general difficult as Φ may be nonconvex and may have multiple local minima. [sent-51, score-0.086]
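For concreteness, a small sketch of how one might evaluate the objective in (4) for given data and a dgf; the helper name is mine, and `neg_log_phi` is assumed to return −log ϕ(t).

```python
import numpy as np

def ecd_objective(S, X, neg_log_phi):
    """Phi(S) = (n/2) * log det(S) - sum_i log phi(x_i^T S^{-1} x_i), as in (4).
    X is an (n, d) array of observations; S is a d x d positive definite matrix."""
    n, d = X.shape
    sign, logdet = np.linalg.slogdet(S)
    if sign <= 0:
        return np.inf                                            # outside the hpd cone
    t = np.einsum('ij,ji->i', X, np.linalg.solve(S, X.T))        # t_i = x_i^T S^{-1} x_i
    return 0.5 * n * logdet + np.sum(neg_log_phi(t))

# Gaussian dgf phi(t) = exp(-t/2), so -log phi(t) = t/2:
# ecd_objective(S, X, lambda t: 0.5 * t)
```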
28 Some examples already exist in the literature [16, 23, 30]; this paper develops techniques that are strictly more general and subsume previous examples, while advancing the broader idea of geometric optimisation. [sent-54, score-0.259]
29 We illustrate our ideas by studying the following two main classes of dgfs in (1): (i) Geodesically Convex (GC): This class contains functions for which the negative log-likelihood Φ(S) is g-convex, i.e., [sent-55, score-0.173]
30 convex along geodesics in the manifold of hpd matrices. [sent-57, score-0.549]
31 Some members of this class have been previously studied (though sometimes without recognising or directly exploiting the g-convexity); (ii) Log-Nonexpansive (LN): This is a new class that we introduce in this paper. [sent-58, score-0.177]
32 It exploits the “non-positive curvature” property of the manifold of hpd matrices. [sent-59, score-0.465]
33 There is a third important class: LC, the class of log-convex dgfs ϕ. [sent-60, score-0.131]
34 However, since (4) deals with − log ϕ, the optimisation problem is still nonconvex. [sent-61, score-0.431]
35 Each captures unique analytic or geometric structure crucial to efficient optimisation. [sent-64, score-0.148]
36 Class GC characterises the “hidden” convexity found in several instances of (4), while LN is a novel class of models that might not have this hidden convexity, but nevertheless admit global optimisation. [sent-65, score-0.167]
37 The key contributions of this paper are the following: – New results that characterise and help recognise g-convexity (Thm. [sent-67, score-0.123]
38 All technical proofs, and several additional results that help recognise g-convexity are in the longer version of this paper [28]. [sent-73, score-0.094]
39 Here too, our results go beyond ECDs—in fact, they broaden the class of problems that admit fixed-point algorithms in the metric space (Pd , δT )—Thms. [sent-76, score-0.243]
40 Our results on geodesic convexity subsume the more specialised results reported recently in [29]. [sent-78, score-0.356]
41 2 Geometric optimisation with geodesic convexity: class GC Geodesic convexity (g-convexity) is a classical concept in mathematics and is used extensively in the study of Hadamard manifolds and metric spaces of nonpositive curvature [7, 24] (i. [sent-82, score-0.933]
42 This concept has been previously studied in nonlinear optimisation [25], but its full importance and applicability in statistical applications and optimisation are only recently emerging [29, 30]. [sent-85, score-0.771]
43 A set X ⊂ M is called geodesically convex if any two points of X are joined by a geodesic lying in X. [sent-89, score-0.333]
44 A function φ : X → R is geodesically convex if for any x, y ∈ X and a unit-speed geodesic γ : [0, 1] → X with γ(0) = x and γ(1) = y, we have φ(γ(t)) ≤ (1 − t)φ(γ(0)) + tφ(γ(1)) = (1 − t)φ(x) + tφ(y). (5) [sent-93, score-0.287]
45 The power of gc functions in the context of solving (4) comes into play because the set Pd (the convex cone of positive definite matrices) is also a differentiable Riemannian manifold where geodesics between points can be computed efficiently. [sent-94, score-0.472]
46 At any point A ∈ Pd , this metric is given by the differential form ds = ‖A^{−1/2} dA A^{−1/2}‖_F ; also, between A, B ∈ Pd there is a unique geodesic [1; Thm. [sent-96, score-0.359]
47 The midpoint of this path, namely A#_{1/2}B, is called the matrix geometric mean, which is an object of great interest in numerous areas [1–3, 10, 22]. [sent-100, score-0.261]
48 We are ready to state our first main theorem, which vastly generalises the above example and provides a foundational tool for recognising and constructing gc functions. [sent-109, score-0.351]
49 For positive definite matrices A, B ∈ Pd and matrices 0 ≠ X ∈ C^{d×k} we have tr X^∗(A#_t B)X ≤ [tr X^∗AX]^{1−t} [tr X^∗BX]^t , t ∈ (0, 1). [sent-121, score-0.18]
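A quick numerical spot check of this inequality. The sketch below uses the standard expression A#_t B = A^{1/2}(A^{−1/2} B A^{−1/2})^t A^{1/2} for the geodesic, real test matrices instead of complex ones, and randomly generated inputs; it is illustrative only, and the helper names are mine.

```python
import numpy as np

def spd_power(A, p):
    """Matrix power A**p of a symmetric positive definite matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * w ** p) @ V.T

def geom_mean(A, B, t=0.5):
    """Geodesic A #_t B = A^{1/2} (A^{-1/2} B A^{-1/2})^t A^{1/2}."""
    Ah, Aih = spd_power(A, 0.5), spd_power(A, -0.5)
    return Ah @ spd_power(Aih @ B @ Aih, t) @ Ah

rng = np.random.default_rng(0)
d, k, t = 5, 3, 0.3
M1, M2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
A = M1 @ M1.T + np.eye(d)            # random spd test matrices
B = M2 @ M2.T + np.eye(d)
X = rng.standard_normal((d, k))

lhs = np.trace(X.T @ geom_mean(A, B, t) @ X)
rhs = np.trace(X.T @ A @ X) ** (1 - t) * np.trace(X.T @ B @ X) ** t
assert lhs <= rhs + 1e-9   # tr X^T (A #_t B) X <= [tr X^T A X]^{1-t} [tr X^T B X]^t
```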
50 Then the function φ(S) := Σ_i log det(X_i^∗ S X_i) is gc on Pd . [sent-133, score-0.309]
51 Since log det is monotonic, and the determinant is multiplicative, the previous inequality yields φ(S#R) = log det Π(S#R) ≤ ½ log det(Π(S)) + ½ log det(Π(R)) = ½ φ(S) + ½ φ(R). [sent-137, score-0.658]
52 Let h : Pk → R be a gc function that is nondecreasing in the Löwner order. [sent-140, score-0.278]
53 Since φ is continuous, it suffices to prove midpoint geodesic convexity. [sent-144, score-0.254]
54 Since ± log det(S#R) = ±½ (log det(S) + log det(R)), on combining with (10) we obtain φ(S#R) ≤ ½ φ(S) + ½ φ(R), as desired. [sent-148, score-0.222]
55 Then, for r ∈ {±1}, φ : Pd → R : S → Σ_i h(x_i^T S^r x_i) ± log det(S) is gc. [sent-160, score-0.143]
56 2.1 Application to ECDs in class GC We begin with a straightforward corollary of the above discussion. [sent-162, score-0.101]
57 If the log-likelihood is strictly gc, then (4) cannot have multiple solutions. [sent-165, score-0.277]
58 Moreover, for any local optimisation method that computes a solution to (4), geodesic convexity ensures that this solution is globally optimal. [sent-166, score-0.68]
59 The dgfs of different distributions are collected here for the reader's convenience. [sent-170, score-0.113]
60 If Φ(S) satisfies the following properties: (i) − log ϕ(t) is lower semi-continuous (lsc) for t > 0, and (ii) Φ(S) → ∞ as ‖S‖ → ∞ or ‖S^{−1}‖ → ∞, then Φ(S) attains its minimum. [sent-173, score-0.105]
61 It is a well-known result in variational analysis that a function that has bounded lower-level sets in a metric space and is lsc attains its minimum [26]. [sent-177, score-0.111]
62 Since − log ϕ(t) is lsc and log det(S −1 ) is continuous, Φ(S) is lsc on (Pd , dR ). [sent-178, score-0.318]
63 For these distributions, the function Φ(S) assumes the form K(S) = (n/2) log det(S) + (d/2 − α) Σ_{i=1}^n log(x_i^T S^{−1} x_i) + Σ_{i=1}^n (x_i^T S^{−1} x_i / b)^β . [sent-183, score-0.286]
64 Assume that dL is the number of eigenvalues of S that go to ∞ and |X ∩ L| is the number of data points that lie in the subspace spanned by the corresponding eigenvectors. [sent-195, score-0.089]
65 Then, in the limit when the eigenvalues of S go to ∞, K(S) converges to the following limit: lim_{λ→∞} (n dL/2) log λ + (d/2 − α)|X ∩ L| log λ^{−1} + c. Apparently if (n dL)/2 − (d/2 − α)|X ∩ L| > 0, then K(S) → ∞ and the proof is complete. [sent-196, score-0.237]
66 As previously mentioned, once existence is ensured, one may use any local optimisation method to minimise (4) to obtain the desired mle. [sent-203, score-0.387]
67 3 Geometric optimisation for class LN Without convexity or g-convexity, in general at best we might obtain local minima. [sent-208, score-0.524]
68 However, as alluded to previously, the set Pd of hpd matrices possesses remarkable geometric structure that allows us to extend global optimisation to a rich class beyond just gc functions. [sent-209, score-1.368]
69 To our knowledge, this class of ECDs was beyond the grasp of previous methods [16, 29, 30]. [sent-210, score-0.109]
70 We say f is log-nonexpansive (LN) on a compact interval I ⊂ R+ if there exists a fixed constant 0 ≤ q ≤ 1 such that | log f (t) − log f (s)| ≤ q| log t − log s|, ∀s, t ∈ I. [sent-214, score-0.296]
71 Finally, if for every s ≠ t it holds that | log f (t) − log f (s)| < | log t − log s|, we say f is weakly log-contractive (wlc); an important point to note here is the absence of a fixed q. [sent-216, score-0.354]
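The LN/wlc conditions can be probed numerically. The sketch below crudely estimates the best constant q for a given f on a grid of a compact interval; the helper name and the example f are mine (f(t) = t^{−1/2} has log-Lipschitz constant exactly 1/2, so it is LN).

```python
import numpy as np

def log_lipschitz_estimate(f, lo, hi, num=400):
    """Crude grid estimate of sup_{s != t} |log f(t) - log f(s)| / |log t - log s| on [lo, hi]."""
    ts = np.geomspace(lo, hi, num)
    logt, logf = np.log(ts), np.log(f(ts))
    i, j = np.triu_indices(num, k=1)          # all pairs s != t on the grid
    return np.max(np.abs(logf[i] - logf[j]) / np.abs(logt[i] - logt[j]))

print(log_lipschitz_estimate(lambda t: t ** -0.5, 1e-3, 1e3))   # approximately 0.5
```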
72 To that end, momentarily ignore the constraint S ≻ 0, to see that the first-order necessary optimality condition for (4) is ∂Φ(S)/∂S = 0 ⇐⇒ (1/2) n S^{−1} + Σ_{i=1}^n [ϕ′(x_i^T S^{−1} x_i)/ϕ(x_i^T S^{−1} x_i)] S^{−1} x_i x_i^T S^{−1} = 0. (15) [sent-218, score-0.207]
73 Defining h ≡ −ϕ′/ϕ, condition (15) may be rewritten more compactly as S = (2/n) Σ_{i=1}^n x_i h(x_i^T S^{−1} x_i) x_i^T = (2/n) X h(D_S) X^T, (16) where D_S := Diag(x_i^T S^{−1} x_i) and X = [x_1 , . [sent-219, score-0.207]
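For the Kotz-type dgf (2), h has a closed form: since −log ϕ(t) = (d/2 − α) log t + (t/b)^β, differentiating gives h(t) = −ϕ′(t)/ϕ(t) = (d/2 − α)/t + (β/b)(t/b)^{β−1}. This is my own direct calculation from (2), shown here as a sketch.

```python
def kotz_h(t, d, alpha, b, beta):
    """h = -phi'/phi for the Kotz-type dgf (2):
    h(t) = (d/2 - alpha)/t + (beta/b) * (t/b)**(beta - 1)."""
    return (d / 2.0 - alpha) / t + (beta / b) * (t / b) ** (beta - 1)
```

In the Gaussian case (α = d/2, b = 2, β = 1) this gives h ≡ 1/2, and (16) reduces to S = (1/n) Σ_i x_i x_i^T, the familiar sample scatter.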
74 This question is in general highly nontrivial to answer because (16) is a difficult nonlinear equation in matrix variables. [sent-225, score-0.116]
75 Introduce therefore the nonlinear map G : Pd → Pd that maps S to the right-hand side of (16); then, starting with a feasible S0 ≻ 0, simply perform the iteration Sk+1 ← G(Sk ), k = 0, 1, . . . (17) [sent-228, score-0.09]
76 Input: x_1 , . . . , x_n ; function h. Initialize: k ← 0; S_0 ← I_d. While ¬converged do: S_{k+1} ← (2/n) Σ_{i=1}^n x_i h(x_i^T S_k^{−1} x_i) x_i^T; end while; return S_k. The most interesting twist to analysing iteration (17) is that the map G is usually not contractive with respect to the Euclidean metric. [sent-236, score-0.248]
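A minimal Python sketch of iteration (17), assuming the data are the rows of X and h is a vectorised weight function (e.g., `kotz_h` above); the stopping rule and the function name are my own choices.

```python
import numpy as np

def fixed_point_scatter(X, h, max_iter=1000, tol=1e-10):
    """Iterate S_{k+1} = (2/n) * sum_i h(x_i^T S_k^{-1} x_i) x_i x_i^T, starting from S_0 = I."""
    n, d = X.shape
    S = np.eye(d)
    for _ in range(max_iter):
        t = np.einsum('ij,ji->i', X, np.linalg.solve(S, X.T))   # t_i = x_i^T S_k^{-1} x_i
        S_next = (2.0 / n) * (X.T * h(t)) @ X                   # sum_i h(t_i) x_i x_i^T
        if np.linalg.norm(S_next - S, 'fro') <= tol * np.linalg.norm(S, 'fro'):
            return S_next
        S = S_next
    return S
```

The Frobenius-norm stopping test is only a convenience; following the discussion below, convergence is more naturally monitored in the Thompson metric (see the sketch after (18)).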
77 But the metric geometry of Pd alluded to previously suggests that it might be better to analyse the iteration using a non-Euclidean metric. [sent-237, score-0.244]
78 This impasse is broken by selecting a more suitable “hyperbolic distance” that captures the crucial non-Euclidean geometry of Pd , while still respecting its convex conical structure. [sent-239, score-0.095]
79 Such a suitable choice is provided by the Thompson metric—an object of great interest in nonlinear matrix equations [17]—which is known to possess geometric properties suitable for analysing convex cones, of which Pd is a shining example [18]. [sent-240, score-0.342]
80 On Pd , the Thompson metric is given by δT (X, Y ) := ‖log(Y^{−1/2} X Y^{−1/2})‖ , (18) where ‖·‖ is the usual operator 2-norm, and ‘log’ is the matrix logarithm. [sent-241, score-0.109]
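A small sketch of (18) for hpd arguments; since Y^{−1/2} X Y^{−1/2} is then itself hpd, the operator 2-norm of its logarithm is simply the largest absolute log-eigenvalue. The function name is mine.

```python
import numpy as np

def thompson_metric(X, Y):
    """delta_T(X, Y) = || log(Y^{-1/2} X Y^{-1/2}) ||  (operator 2-norm), as in (18)."""
    wy, Vy = np.linalg.eigh(Y)
    Y_inv_sqrt = (Vy * wy ** -0.5) @ Vy.T
    M = Y_inv_sqrt @ X @ Y_inv_sqrt          # hpd whenever X and Y are hpd
    return np.max(np.abs(np.log(np.linalg.eigvalsh(M))))
```

One can use delta_T(S_{k+1}, S_k) as the convergence monitor in the fixed-point sketch above.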
81 This theorem should be of wider interest as it enlarges the class of maps that one can study using the Thompson metric. [sent-248, score-0.097]
82 If G is weakly contractive and (16) has a solution, then this solution is unique and iteration (17) converges to it. [sent-269, score-0.138]
83 Let G be nonexpansive in the δT metric, that is, δT (G(X), G(Y )) ≤ δT (X, Y ), and let F be weakly contractive, that is, δT (F(X), F(Y )) < δT (X, Y ); then G + F is also weakly contractive. [sent-273, score-0.151]
84 14 is a striking feature of the nonpositive curvature of Pd ; clearly, such a result does not usually hold in Banach spaces. [sent-275, score-0.127]
85 If h is LN, and S1 ≠ S2 are solutions to the nonlinear equation (16), then iteration (17) converges to a solution, and S1 ∝ S2 . [sent-279, score-0.09]
86 But for the Kotz distribution we can show a stronger result, which helps obtain geometric convergence rates for the fixed-point iteration. [sent-283, score-0.148]
87 The key message here is that our fixed-point iterations solve nonconvex likelihood maximisation problems that involve a complicating hpd constraint. [sent-293, score-0.575]
88 But since the fixed-point iterations always generate hpd iterates, no extra eigenvalue computation is needed, which leads to substantial computational advantages. [sent-294, score-0.396]
89 Figure 1 (axis: log running time, seconds): comparison of the running times of the fixed-point iteration and MATLAB's fmincon for maximising a Kotz likelihood (see text for details). [sent-328, score-0.182]
90 For comparison, the results of constrained optimisation (red curves) using MATLAB's optimisation toolbox are shown. [sent-367, score-0.752]
91 Additional comparisons in the longer version [28] show that the fixed-point iteration also significantly outperforms sophisticated manifold optimisation techniques [5], especially for increasing data dimensionality. [sent-371, score-0.459]
92 5 Conclusion We developed geometric optimisation for minimising potentially nonconvex functions over the set of positive definite matrices. [sent-372, score-0.631]
93 We showed key results that help recognise geodesic convexity; we also introduced the class of log-nonexpansive functions that contains functions that need not be g-convex, but can still be optimised efficiently. [sent-373, score-0.371]
94 Key to our ideas here was a careful construction of fixed-point iterations in a suitably chosen metric space. [sent-374, score-0.122]
95 We motivated, developed, and applied our results to the task of maximum likelihood estimation for various elliptically contoured distributions, covering classes and examples substantially beyond what had been known so far in the literature. [sent-375, score-0.226]
96 We believe that the general geometric optimisation techniques that we developed in this paper will prove to be of wider use and interest beyond our motivating application. [sent-376, score-0.584]
97 Developing a more extensive geometric optimisation numerical package is part of our ongoing project. [sent-377, score-0.505]
98 Computing the Karcher mean of symmetric positive definite matrices. [sent-392, score-0.104]
99 Complex elliptically symmetric distributions: Survey, new results and applications. [sent-525, score-0.104]
100 Conic geometric optimisation on the manifold of positive definite matrices. [sent-558, score-0.614]
wordName wordTfidf (topN-words)
[('hpd', 0.396), ('pd', 0.361), ('optimisation', 0.357), ('gc', 0.235), ('geodesic', 0.202), ('ecds', 0.192), ('det', 0.181), ('geometric', 0.148), ('smin', 0.128), ('convexity', 0.121), ('elliptically', 0.104), ('recognise', 0.094), ('contoured', 0.094), ('xt', 0.094), ('multivariate', 0.09), ('go', 0.089), ('dl', 0.087), ('nonconvex', 0.086), ('dgfs', 0.085), ('geodesically', 0.085), ('lsc', 0.085), ('nonpositive', 0.085), ('recognising', 0.085), ('dr', 0.082), ('metric', 0.08), ('kotz', 0.078), ('ds', 0.077), ('fmincon', 0.075), ('riemannian', 0.074), ('log', 0.074), ('matrices', 0.07), ('manifold', 0.069), ('xi', 0.069), ('atlab', 0.064), ('karcher', 0.064), ('maximisation', 0.064), ('wlc', 0.064), ('elliptical', 0.062), ('weakly', 0.058), ('rr', 0.058), ('nonlinear', 0.057), ('corollary', 0.055), ('conic', 0.054), ('midpoint', 0.052), ('wider', 0.051), ('geometry', 0.049), ('ln', 0.049), ('hermitian', 0.049), ('alluded', 0.049), ('contractive', 0.047), ('xh', 0.047), ('convex', 0.046), ('class', 0.046), ('cd', 0.045), ('cone', 0.044), ('sk', 0.043), ('seconds', 0.043), ('nondecreasing', 0.043), ('contractions', 0.043), ('poin', 0.043), ('sxi', 0.043), ('tehran', 0.043), ('strictly', 0.042), ('curvature', 0.042), ('ideas', 0.042), ('sra', 0.041), ('thompson', 0.04), ('positive', 0.04), ('rich', 0.039), ('toolbox', 0.038), ('bhatia', 0.038), ('ecd', 0.038), ('geodesics', 0.038), ('develops', 0.036), ('grasp', 0.035), ('wiesel', 0.035), ('nonexpansive', 0.035), ('iteration', 0.033), ('subsume', 0.033), ('analyse', 0.033), ('fixed', 0.033), ('running', 0.032), ('ax', 0.032), ('great', 0.032), ('foundational', 0.031), ('bx', 0.031), ('covariance', 0.031), ('attains', 0.031), ('nontrivial', 0.03), ('existence', 0.03), ('imaging', 0.03), ('analysing', 0.03), ('datapoints', 0.03), ('matrix', 0.029), ('key', 0.029), ('lc', 0.029), ('ba', 0.029), ('pk', 0.028), ('beyond', 0.028), ('distributions', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show the advantages of using our fixed-point algorithms. 1
2 0.17702644 256 nips-2013-Probabilistic Principal Geodesic Analysis
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
Author: Arash Amini, Xuanlong Nguyen
Abstract: We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances. A convergence theory for iterated random functions is presented. As an application of the general theory we analyze convergence behaviors of exact and approximate message-passing algorithms that arise in a sequential change point detection problem formulated via a latent variable directed graphical model. The sequential inference algorithm and its supporting theory are illustrated by simulated examples.
4 0.11232115 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses
Author: Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari
Abstract: The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, is an important concept to have emerged in the theory of machine learning in recent years. We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss. We use this result to design convex calibrated surrogates for a variety of subset ranking problems, with target losses including the precision@q, expected rank utility, mean average precision, and pairwise disagreement. 1
5 0.10146113 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
Author: Po-Ling Loh, Martin J. Wainwright
Abstract: We establish theoretical results concerning local optima of regularized M estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision stat in log(1/ stat ) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1
6 0.076964229 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
7 0.071530372 63 nips-2013-Cluster Trees on Manifolds
8 0.071295381 269 nips-2013-Regression-tree Tuning in a Streaming Setting
9 0.069883451 224 nips-2013-On the Sample Complexity of Subspace Learning
10 0.06408409 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
11 0.063926198 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
12 0.063417159 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
13 0.061701275 62 nips-2013-Causal Inference on Time Series using Restricted Structural Equation Models
14 0.061364591 91 nips-2013-Dirty Statistical Models
15 0.058694139 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
16 0.057471525 257 nips-2013-Projected Natural Actor-Critic
17 0.056233749 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
18 0.055961877 188 nips-2013-Memory Limited, Streaming PCA
19 0.055874672 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
20 0.053010061 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
topicId topicWeight
[(0, 0.17), (1, 0.028), (2, 0.064), (3, 0.022), (4, -0.008), (5, 0.072), (6, -0.009), (7, 0.012), (8, -0.032), (9, -0.028), (10, 0.008), (11, -0.014), (12, -0.03), (13, 0.012), (14, 0.018), (15, -0.027), (16, -0.01), (17, -0.011), (18, 0.02), (19, 0.114), (20, 0.005), (21, 0.043), (22, 0.078), (23, 0.038), (24, -0.049), (25, -0.041), (26, -0.009), (27, 0.043), (28, 0.03), (29, 0.02), (30, -0.038), (31, -0.027), (32, -0.084), (33, 0.006), (34, 0.007), (35, -0.065), (36, 0.067), (37, -0.002), (38, 0.131), (39, 0.031), (40, 0.108), (41, -0.066), (42, -0.157), (43, -0.136), (44, -0.023), (45, -0.066), (46, -0.092), (47, 0.014), (48, 0.009), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.9035635 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show the advantages of using our fixed-point algorithms. 1
2 0.67753428 256 nips-2013-Probabilistic Principal Geodesic Analysis
Author: Miaomiao Zhang, P.T. Fletcher
Abstract: Principal geodesic analysis (PGA) is a generalization of principal component analysis (PCA) for dimensionality reduction of data on a Riemannian manifold. Currently PGA is defined as a geometric fit to the data, rather than as a probabilistic model. Inspired by probabilistic PCA, we present a latent variable model for PGA that provides a probabilistic framework for factor analysis on manifolds. To compute maximum likelihood estimates of the parameters in our model, we develop a Monte Carlo Expectation Maximization algorithm, where the expectation is approximated by Hamiltonian Monte Carlo sampling of the latent variables. We demonstrate the ability of our method to recover the ground truth parameters in simulated sphere data, as well as its effectiveness in analyzing shape variability of a corpus callosum data set from human brain images. 1
Author: Arash Amini, Xuanlong Nguyen
Abstract: We propose a general formalism of iterated random functions with semigroup property, under which exact and approximate Bayesian posterior updates can be viewed as specific instances. A convergence theory for iterated random functions is presented. As an application of the general theory we analyze convergence behaviors of exact and approximate message-passing algorithms that arise in a sequential change point detection problem formulated via a latent variable directed graphical model. The sequential inference algorithm and its supporting theory are illustrated by simulated examples.
4 0.61631131 72 nips-2013-Convex Calibrated Surrogates for Low-Rank Loss Matrices with Applications to Subset Ranking Losses
Author: Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari
Abstract: The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, is an important concept to have emerged in the theory of machine learning in recent years. We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss. We use this result to design convex calibrated surrogates for a variety of subset ranking problems, with target losses including the precision@q, expected rank utility, mean average precision, and pairwise disagreement. 1
5 0.57333857 31 nips-2013-Adaptivity to Local Smoothness and Dimension in Kernel Regression
Author: Samory Kpotufe, Vikas Garg
Abstract: We present the first result for kernel regression where the procedure adapts locally at a point x to both the unknown local dimension of the metric space X and the unknown H¨ lder-continuity of the regression function at x. The result holds with o high probability simultaneously at all points x in a general metric space X of unknown structure. 1
6 0.5333733 225 nips-2013-One-shot learning and big data with n=2
7 0.52672309 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
8 0.52058536 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
9 0.48938507 63 nips-2013-Cluster Trees on Manifolds
10 0.48007214 269 nips-2013-Regression-tree Tuning in a Streaming Setting
11 0.47654667 247 nips-2013-Phase Retrieval using Alternating Minimization
12 0.47014108 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
13 0.45533234 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
14 0.45412761 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
15 0.44720194 296 nips-2013-Sinkhorn Distances: Lightspeed Computation of Optimal Transport
16 0.44644275 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
17 0.44014305 89 nips-2013-Dimension-Free Exponentiated Gradient
18 0.43623877 215 nips-2013-On Decomposing the Proximal Map
19 0.4324936 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
20 0.42737561 212 nips-2013-Non-Uniform Camera Shake Removal Using a Spatially-Adaptive Sparse Penalty
topicId topicWeight
[(16, 0.019), (33, 0.111), (34, 0.103), (41, 0.026), (49, 0.448), (56, 0.097), (70, 0.016), (85, 0.026), (89, 0.042), (93, 0.016), (95, 0.031)]
simIndex simValue paperId paperTitle
1 0.92618614 323 nips-2013-Synthesizing Robust Plans under Incomplete Domain Models
Author: Tuan A. Nguyen, Subbarao Kambhampati, Minh Do
Abstract: Most current planners assume complete domain models and focus on generating correct plans. Unfortunately, domain modeling is a laborious and error-prone task, thus real world agents have to plan with incomplete domain models. While domain experts cannot guarantee completeness, often they are able to circumscribe the incompleteness of the model by providing annotations as to which parts of the domain model may be incomplete. In such cases, the goal should be to synthesize plans that are robust with respect to any known incompleteness of the domain. In this paper, we first introduce annotations expressing the knowledge of the domain incompleteness and formalize the notion of plan robustness with respect to an incomplete domain model. We then show an approach to compiling the problem of finding robust plans to the conformant probabilistic planning problem, and present experimental results with Probabilistic-FF planner. 1
2 0.9107787 6 nips-2013-A Determinantal Point Process Latent Variable Model for Inhibition in Neural Spiking Data
Author: Jasper Snoek, Richard Zemel, Ryan P. Adams
Abstract: Point processes are popular models of neural spiking behavior as they provide a statistical distribution over temporal sequences of spikes and help to reveal the complexities underlying a series of recorded action potentials. However, the most common neural point process models, the Poisson process and the gamma renewal process, do not capture interactions and correlations that are critical to modeling populations of neurons. We develop a novel model based on a determinantal point process over latent embeddings of neurons that effectively captures and helps visualize complex inhibitory and competitive interaction. We show that this model is a natural extension of the popular generalized linear model to sets of interacting neurons. The model is extended to incorporate gain control or divisive normalization, and the modulation of neural spiking based on periodic phenomena. Applied to neural spike recordings from the rat hippocampus, we see that the model captures inhibitory relationships, a dichotomy of classes of neurons, and a periodic modulation by the theta rhythm known to be present in the data. 1
3 0.86874288 274 nips-2013-Relevance Topic Model for Unstructured Social Group Activity Recognition
Author: Fang Zhao, Yongzhen Huang, Liang Wang, Tieniu Tan
Abstract: Unstructured social group activity recognition in web videos is a challenging task due to 1) the semantic gap between class labels and low-level visual features and 2) the lack of labeled training data. To tackle this problem, we propose a “relevance topic model” for jointly learning meaningful mid-level representations upon bagof-words (BoW) video representations and a classifier with sparse weights. In our approach, sparse Bayesian learning is incorporated into an undirected topic model (i.e., Replicated Softmax) to discover topics which are relevant to video classes and suitable for prediction. Rectified linear units are utilized to increase the expressive power of topics so as to explain better video data containing complex contents and make variational inference tractable for the proposed model. An efficient variational EM algorithm is presented for model parameter estimation and inference. Experimental results on the Unstructured Social Activity Attribute dataset show that our model achieves state of the art performance and outperforms other supervised topic model in terms of classification accuracy, particularly in the case of a very small number of labeled training videos. 1
4 0.84907216 266 nips-2013-Recurrent linear models of simultaneously-recorded neural populations
Author: Marius Pachitariu, Biljana Petreska, Maneesh Sahani
Abstract: Population neural recordings with long-range temporal structure are often best understood in terms of a common underlying low-dimensional dynamical process. Advances in recording technology provide access to an ever-larger fraction of the population, but the standard computational approaches available to identify the collective dynamics scale poorly with the size of the dataset. We describe a new, scalable approach to discovering low-dimensional dynamics that underlie simultaneously recorded spike trains from a neural population. We formulate the Recurrent Linear Model (RLM) by generalising the Kalman-filter-based likelihood calculation for latent linear dynamical systems to incorporate a generalised-linear observation process. We show that RLMs describe motor-cortical population data better than either directly-coupled generalised-linear models or latent linear dynamical system models with generalised-linear observations. We also introduce the cascaded generalised-linear model (CGLM) to capture low-dimensional instantaneous correlations in neural populations. The CGLM describes the cortical recordings better than either Ising or Gaussian models and, like the RLM, can be fit exactly and quickly. The CGLM can also be seen as a generalisation of a lowrank Gaussian model, in this case factor analysis. The computational tractability of the RLM and CGLM allow both to scale to very high-dimensional neural data. 1
same-paper 5 0.81186318 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
Author: Suvrit Sra, Reshad Hosseini
Abstract: Hermitian positive definite (hpd) matrices recur throughout machine learning, statistics, and optimisation. This paper develops (conic) geometric optimisation on the cone of hpd matrices, which allows us to globally optimise a large class of nonconvex functions of hpd matrices. Specifically, we first use the Riemannian manifold structure of the hpd cone for studying functions that are nonconvex in the Euclidean sense but are geodesically convex (g-convex), hence globally optimisable. We then go beyond g-convexity, and exploit the conic geometry of hpd matrices to identify another class of functions that remain amenable to global optimisation without requiring g-convexity. We present key results that help recognise g-convexity and also the additional structure alluded to above. We illustrate our ideas by applying them to likelihood maximisation for a broad family of elliptically contoured distributions: for this maximisation, we derive novel, parameter free fixed-point algorithms. To our knowledge, ours are the most general results on geometric optimisation of hpd matrices known so far. Experiments show the advantages of using our fixed-point algorithms. 1
6 0.78757375 70 nips-2013-Contrastive Learning Using Spectral Methods
7 0.76631987 221 nips-2013-On the Expressive Power of Restricted Boltzmann Machines
9 0.6916523 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
10 0.66336137 121 nips-2013-Firing rate predictions in optimal balanced networks
11 0.62254286 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
12 0.5702787 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
13 0.56391281 246 nips-2013-Perfect Associative Learning with Spike-Timing-Dependent Plasticity
14 0.55963039 64 nips-2013-Compete to Compute
15 0.53483993 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
17 0.53106439 148 nips-2013-Latent Maximum Margin Clustering
18 0.52935463 208 nips-2013-Neural representation of action sequences: how far can a simple snippet-matching model take us?
19 0.52708924 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
20 0.52365452 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions