nips nips2012 nips2012-254 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthew Coudron, Gilad Lerman
Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^{−0.5}). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^{2+δ}) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. [sent-3, score-0.926]
2 This estimator is used in a convex algorithm for robust subspace recovery (i. [sent-4, score-0.421]
3 Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i. [sent-11, score-0.577]
4 is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i. [sent-15, score-0.356]
5 These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. [sent-20, score-0.547]
6 To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. [sent-21, score-0.28]
7 1 Introduction A fundamental problem in probability and statistics is to determine with overwhelming probability the rate of convergence of the empirical covariance (or inverse covariance) of an i. [sent-22, score-0.52]
8 sample of increasing size N to the covariance (or inverse covariance) of the underlying random variable (see e. [sent-25, score-0.448]
9 Clearly, this problem is also closely related to estimating with high probability the sample complexity, that is, the number of samples required to obtain a given error of approximation . [sent-28, score-0.064]
10 In the case of a compactly supported (or even more generally sub-Gaussian) underlying distribution, it is a classical exercise to show that this rate of convergence is O(N^{−0.5}). [sent-29, score-0.257]
11 The precise estimate for this rate of convergence implies that the sample complexity of covariance estimation is O(D) when using the spectral norm and O(D^2) when using the Frobenius norm. [sent-34, score-0.606]
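As a point of reference for the rates just quoted, the following small numpy sketch (our own illustration, not code from the paper; the dimension, sample sizes, and the bounded uniform distribution are arbitrary choices) estimates the Frobenius error of the empirical covariance for increasing N; the printed err·sqrt(N) column stays roughly constant, reflecting the O(N^{−0.5}) rate.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10
true_cov = np.diag(np.arange(1, D + 1) / D) / 3.0   # covariance of independent Unif(-a_i, a_i) coordinates

def sample(N):
    # Var(Unif(-a, a)) = a^2 / 3, so these a_i reproduce true_cov exactly
    a = np.sqrt(3.0 * np.diag(true_cov))
    return rng.uniform(-a, a, size=(N, D))

for N in [100, 1_000, 10_000, 100_000]:
    X = sample(N)
    emp_cov = X.T @ X / N                            # second-moment matrix; the mean is zero by construction
    err = np.linalg.norm(emp_cov - true_cov, "fro")
    print(f"N={N:>6d}  Frobenius error={err:.4f}  err*sqrt(N)={err * np.sqrt(N):.2f}")
```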
12 The rate of convergence and sample complexity of PCA immediately follow from these estimates (see e. [sent-35, score-0.291]
13 That is, direct covariance or inverse covariance estimation and its resulting PCA are very sensitive to outliers. [sent-39, score-0.592]
14 Many robust versions of covariance estimation, PCA and dimension reduction have been developed in the last three decades (see e. [sent-40, score-0.345]
15 In the last few years new convex algorithms with provable guarantees have been suggested for robust subspace recovery and its corresponding dimension reduction [5, 4, 19, 20, 11, 7, 2, 1, 21, 9]. [sent-43, score-0.388]
16 Most of these works minimize a mixture of an ℓ1-type norm (depending on the application) and the nuclear norm. [sent-44, score-0.088]
17 Their algorithmic complexity is not as competitive as PCA and their sample complexity is hard to estimate due to the problem of extending the nuclear norm out-of-sample. [sent-45, score-0.259]
18 On the other hand, Zhang and Lerman [21] have proposed a novel M-estimator for robust PCA, which is based on a convex relaxation of the sum of Euclidean distances to subspaces (which is originally minimized over the non-convex Grassmannian). [sent-46, score-0.256]
19 This procedure suggests an estimator for a generalized version of the inverse covariance matrix and uses it to robustly recover an underlying low-dimensional subspace. [sent-47, score-0.614]
20 This idea was extended in [9] to obtain an even more accurate method for subspace recovery, though it does not estimate the generalized inverse covariance matrix (in particular, it has no analogous notion of singular values or their inverses). [sent-48, score-0.597]
21 The algorithmic complexity of the algorithms solving the convex formulations of [21] and [9] is comparable to that of full PCA. [sent-49, score-0.114]
22 Here we show that for the setting of sub-Gaussian distributions the sample complexity of the robust PCA algorithm in [21] (or its generalized inverse covariance estimation) is close to that of PCA (or to sample covariance estimation). [sent-50, score-0.959]
23 Our analysis immediately extends to the robust PCA algorithm of [9]. [sent-51, score-0.136]
24 1) as a convex relaxation for the orthoprojectors (from R^D to R^D), and defined the following energy function on H (with respect to a data set X in R^D): F_X(Q) := Σ_{x∈X} ‖Qx‖, (1. [sent-54, score-0.17]
25 2) where ‖·‖ denotes the Euclidean norm of a vector in R^D. [sent-55, score-0.058]
26 Their generalized empirical inverse covariance is Q̂_X = arg min_{Q∈H} F_X(Q). [sent-56, score-0.406]
27 3) results in a scaled version of the empirical inverse covariance matrix. [sent-60, score-0.364]
28 It is thus clear why we can refer to Q̂_X as a generalized empirical inverse covariance (or ℓ1-type version of it). [sent-61, score-0.435]
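To make the estimator concrete, here is a minimal IRLS-style sketch for minimizing F_X(Q) = Σ_{x∈X} ‖Qx‖ over symmetric Q with a fixed trace. It is our own illustration, not the algorithm of [21]: with the weights 1/‖Qx‖ frozen, the minimizer of the resulting quadratic is proportional to (Σ_x x x^T/‖Qx‖)^{−1}, which motivates the update below; the trace normalization (to D), the regularization delta, and the stopping rule are our assumptions.

```python
import numpy as np

def generalized_inverse_covariance(X, n_iter=100, delta=1e-10, tol=1e-8):
    """IRLS-style sketch for argmin_{Q = Q^T, tr(Q) = D} sum_x ||Q x||  (illustrative only)."""
    N, D = X.shape
    Q = np.eye(D)                                    # feasible starting point: tr(Q) = D
    for _ in range(n_iter):
        norms = np.maximum(np.linalg.norm(X @ Q, axis=1), delta)
        A = (X / norms[:, None]).T @ X               # sum_x x x^T / ||Q x||
        Q_new = np.linalg.inv(A + delta * np.eye(D)) # minimizer of the reweighted quadratic is prop. to A^{-1}
        Q_new *= D / np.trace(Q_new)                 # renormalize to the trace constraint
        if np.linalg.norm(Q_new - Q, "fro") < tol:
            return Q_new
        Q = Q_new
    return Q
```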
29 We describe the absolute notion of generalized inverse covariance matrix, i. [sent-62, score-0.406]
30 Zhang and Lerman [21] did not emphasize the empirical generalized inverse covariance, but the robust estimate of the underlying low-dimensional subspace by the span of the bottom eigenvectors of this matrix. [sent-66, score-0.606]
31 They rigorously proved that such a procedure robustly recovers the underlying subspace under some conditions. [sent-67, score-0.213]
32 2 Main Result of this Paper We focus on computing the sample complexity of the estimator Q̂_X. [sent-69, score-0.208]
33 This is equivalent with estimating the rate of convergence of Q̂_X from an i.i.d. sample X to the “generalized inverse covariance” of the underlying distribution µ. [sent-73, score-0.386]
34 We further assume that for some 0 < γ < 1, µ satisfies the following condition, which we refer to as the “two-subspaces criterion” (for γ): For any pair of (D − 1)-dimensional subspaces of R^D, L1 and L2: µ((L1 ∪ L2)^c) ≥ γ. [sent-77, score-0.059]
35 4) We note that if µ satisfies the two-subspaces criterion for any particular 0 < γ < 1, then its support cannot be a union of two hyperplanes of RD . [sent-79, score-0.052]
36 We first formulate the generalized inverse covariance of the underlying measure as follows: Q̂ = arg min_{Q∈H} F(Q), (1. [sent-82, score-0.484]
37 If µ is a compactly supported distribution satisfying the two-subspaces criterion for γ > 0, then there exists a constant α0 ≡ α0(µ, D, ε) > 0 such that for any ε > 0 and N > 2(D − 1) the following estimate holds: P( Q̂ and Q̂_N are unique minimizers and ‖Q̂ − Q̂_N‖_F ≤ (2/α0) N^{(−1+ε)/2} ) ≥ 1 − C0 N^D exp(−N^{2ε} α0^2 / (2 D^2 R_µ^2)). [sent-100, score-0.152]
38 9) Intuitively, α0 represents a lower bound on the directional second derivatives of F . [sent-105, score-0.307]
39 Therefore, α0 should affect sample complexity because the number of random samples taken to approximate a minimum of F should be affected by how sharply F increases about its minimum. [sent-106, score-0.144]
40 1 Implication and Extensions of the Main Result Generalization to Sub-Gaussian Measures We can remove the assumption that the support of µ is bounded (with radius Rµ ) and assume instead that µ is sub-Gaussian. [sent-109, score-0.06]
41 2 Sample Complexity The notion of sample complexity arises in the framework of Probably-Approximately-Correct Learning of Valiant [16]. [sent-114, score-0.144]
42 Generally speaking, the sample complexity in our setting is the minimum number of samples N required, as a function of dimension D, to achieve a good estimation of Q̂ with high probability. [sent-115, score-0.192]
43 We recall that in this paper we use the Frobenius norm for the estimation error. [sent-116, score-0.138]
44 5 ) and also explain later why it makes sense for the setting of robust subspace recovery. [sent-121, score-0.328]
45 To bound the sample complexity we set C1 := 4 · ((4α0) ∨ 2) and C2 := 10 · (2α0 + 4((4α0) ∨ 2) R_µ^2) / (1 − 2α0/((4α0) ∨ 2)) so that C0 ≤ C1 · (C2 · D)^D (see (1. [sent-122, score-0.144]
46 1 − C1 (C2 · D · N)^D exp(−N^{2ε} α0^2 / (2 D^2 R_µ^2)) − 2 N^{2(D−1)} (1 − γ)^{N−2(D−1)} ≥ 1 − C1 exp(log(C2 · D^{1+η}) D^2 − D^{2η}) − 2 exp(2η(D − 1) log(D) + log(1 − γ)(D^η − 2(D − 1))). [sent-128, score-0.117]
47 The exact guarantees on error estimation and probability of error can be manipulated by changing the constant hidden in the Ω term. [sent-137, score-0.075]
48 We would like to point out the expected tradeoff between the sample complexity and the rate of convergence. [sent-138, score-0.211]
49 If ε approaches 0, then the rate of convergence becomes optimal but the sample complexity deteriorates. [sent-139, score-0.291]
50 If ε approaches 0.5, then the sample complexity becomes optimal, but the rate of convergence deteriorates. [sent-141, score-0.291]
51 To motivate our assumption on Rµ , γ and α0 , we recall the needle-haystack and syringe-haystack models of [9] as a prototype for robust subspace recovery. [sent-142, score-0.297]
52 These models assume a mixture of outlier and inlier components. [sent-143, score-0.179]
53 The distribution of the outliers component is normal N(0, (σ_out^2/D) I_D) and the distribution of the inliers component is a mixture of N(0, (σ_in^2/d) P_L) (where L is a d-subspace) and N(0, (σ_in^2/(CD)) I_D), where C ≫ 1 (the latter component has coefficient zero in the needle-haystack model). [sent-144, score-0.255]
54 The underlying distribution of the syringe-haystack (or needle-haystack) model is not compactly supported, but clearly sub-Gaussian (as discussed in §2. [sent-145, score-0.11]
55 We also note that γ here is the coefficient of the outlier component in the needle-haystack model, which we denote by ν0 . [sent-148, score-0.145]
56 Indeed, the only non-zero measure that can be contained in a (D − 1)-dimensional subspace is the measure associated with N(0, (σ_in^2/d) P_L), and that has total weight at most (1 − ν0). [sent-149, score-0.129]
57 It is also possible to verify explicitly that α0 is lower bounded by 1/√D in this case (though our argument is currently rather lengthy and will appear in the extended version of this paper). [sent-150, score-0.06]
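For concreteness, a small data generator for the model described above (our own sketch; the default parameter values, including the mixture weight p_syringe of the small full-dimensional inlier component, are assumptions not given in this excerpt):

```python
import numpy as np

def syringe_haystack(N, D=20, d=3, nu0=0.3, p_syringe=0.1,
                     sigma_in=1.0, sigma_out=1.0, C=100.0, seed=0):
    """Outliers ~ N(0, (sigma_out^2/D) I_D) with weight nu0; inliers are a mixture of
    N(0, (sigma_in^2/d) P_L) and N(0, (sigma_in^2/(C*D)) I_D) (the latter has weight
    p_syringe, which is zero in the needle-haystack model)."""
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((D, d)))[0]   # orthonormal basis of the d-subspace L
    X = np.empty((N, D))
    for i in range(N):
        u = rng.random()
        if u < nu0:                                    # outlier
            X[i] = rng.standard_normal(D) * sigma_out / np.sqrt(D)
        elif u < nu0 + (1 - nu0) * p_syringe:          # small full-dimensional inlier component
            X[i] = rng.standard_normal(D) * sigma_in / np.sqrt(C * D)
        else:                                          # inlier concentrated on L
            X[i] = U @ rng.standard_normal(d) * sigma_in / np.sqrt(d)
    return X, U

X, U_true = syringe_haystack(2000)
```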
58 3 From Generalized Covariances to Subspace Recovery We recall that the underlying d-dimensional subspace can be recovered from the bottom d eigenvectors of Q̂_N. [sent-152, score-0.238]
59 Therefore, the rate of convergence of the subspace recovery (or its corresponding sample complexity) follows directly from Theorem 1. [sent-153, score-0.398]
60 and L_d, L_{d,N} are the subspaces spanned by the bottom d eigenvectors (i. [sent-165, score-0.154]
61 e., with lowest d eigenvalues) of Q̂ and Q̂_N respectively, P_{L_d} and P_{L_{d,N}} are the orthoprojectors on these subspaces, and ν_{D−d} is the (D − d)th eigengap of Q̂, then P 2. [sent-167, score-0.12]
62 2), we can verify robustness to nontrivial noise when recovering L∗ from i. [sent-175, score-0.072]
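A minimal sketch of this recovery step (our own illustration): take the eigenvectors of the estimated matrix with the d smallest eigenvalues and measure the error through the difference of orthoprojectors, as in the bound discussed above.

```python
import numpy as np

def recover_subspace(Q_hat, d):
    """Return an orthonormal basis of the span of the bottom d eigenvectors of Q_hat."""
    eigvals, eigvecs = np.linalg.eigh(Q_hat)   # eigh returns eigenvalues in ascending order
    return eigvecs[:, :d]                      # columns with the d smallest eigenvalues

def projector_distance(U, V):
    """Frobenius distance between the orthoprojectors onto span(U) and span(V)."""
    return np.linalg.norm(U @ U.T - V @ V.T, "fro")

# e.g.: projector_distance(recover_subspace(Q_hat, d), U_true)
```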
63 5 Convergence Rate of the REAPER Estimator The REAPER and S-REAPER Algorithms [9] are variants of the robust PCA algorithm of [21]. [sent-180, score-0.136]
64 The d-dimensional subspace can then be recovered by the bottom d eigenvectors of Q (in [9] this minimization is formulated with P = I − Q, whose top d eigenvectors are found). [sent-183, score-0.328]
65 The rate of convergence of the minimizer of F_X(Q) over G to the minimizer of F(Q) over G is similar to that in Theorem 1. [sent-184, score-0.299]
66 If the minimizer Q̂ lies on the interior of G then the proof is the same; if Q̂ is on the boundary of G we must only consider the directional derivatives. [sent-188, score-0.436]
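One way to handle the REAPER constraint set G (in the Q-parameterization used here we read it as G = {Q : 0 ⪯ Q ⪯ I, tr(Q) = D − d}, our reading of [9]) inside a projected-gradient scheme is to project the eigenvalues onto the capped simplex {λ ∈ [0, 1]^D : Σ_i λ_i = D − d}. The sketch below is our own illustration, not the solver used in [9], and finds the common shift by bisection.

```python
import numpy as np

def project_onto_G(Q, d, n_bisect=60):
    """Frobenius projection of a symmetric Q onto {0 <= Q <= I, tr(Q) = D - d} (sketch)."""
    D = Q.shape[0]
    eigvals, V = np.linalg.eigh(Q)
    lo, hi = eigvals.min() - 1.0, eigvals.max() + 1.0   # bracket for the shift theta
    for _ in range(n_bisect):
        theta = 0.5 * (lo + hi)
        lam = np.clip(eigvals - theta, 0.0, 1.0)
        if lam.sum() > D - d:
            lo = theta          # trace too large -> shift eigenvalues down further
        else:
            hi = theta
    lam = np.clip(eigvals - 0.5 * (lo + hi), 0.0, 1.0)
    return (V * lam) @ V.T      # same eigenbasis, projected eigenvalues
```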
67 [12] have analyzed an estimator for sparse inverse covariance. [sent-194, score-0.19]
68 This estimator minimizes over all Q ≻ 0 the energy ⟨Q, Σ_N⟩_F − log det(Q) + λ_N ‖Q‖_1, where Σ_N is the empirical covariance matrix based on a sample of size N and ⟨·, ·⟩_F denotes the inner product on R^{D×D} (i. [sent-195, score-0.416]
69 5) are both equal to Σ_N^{−1} (assuming that Sp({x_i}_{i=1}^N) = R^D so that the inverse empirical covariance exists). [sent-204, score-0.335]
70 5) over all Q ≻ 0 is the same up to a multiplicative constant as the minimizer of the RHS of (2. [sent-210, score-0.076]
71 4) (or more precisely its variant in [22]) with the minimization over all Q ∈ H of the energy F_X(Q) + λ_N ‖Q‖_1. [sent-214, score-0.085]
72 5 ) we can obtain similar rates of convergence for the minimizer of (2. [sent-218, score-0.156]
73 7) as the one when λ_N = 0 (see extended version of this paper), namely, rate of convergence of order O(N^{−0.5+ε}). [sent-219, score-0.207]
74 That is, the minimum sample size when using the Frobenius norm is O(D^η) for any η > 2. [sent-222, score-0.122]
75 , Assumption 1 in [12]), the minimal sample size is O(log(D) r^2), where r is the maximum node degree for a graph, whose edges are the nonzero entries of the inverse covariance. [sent-226, score-0.19]
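For comparison, the ℓ1-penalized Gaussian estimator of [12] discussed here is available, for example, in scikit-learn; a short usage sketch follows (the random data and the penalty value alpha are placeholders, not values from the paper):

```python
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))        # placeholder data; substitute your own sample

model = GraphicalLasso(alpha=0.05)        # alpha plays the role of the l1 penalty weight lambda_N
model.fit(X)
sparse_precision = model.precision_       # estimate of the (sparse) inverse covariance
print(np.count_nonzero(np.abs(sparse_precision) > 1e-8))
```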
76 4 we explain in short the two basic components of the proof of Theorem 1. [sent-235, score-0.087]
77 The first of them is that ‖Q̂ − Q̂_N‖_F can be controlled from above by differences of directional derivatives of F. [sent-237, score-0.307]
78 The second component is that the rate of convergence of the derivatives of {F_N}_{N=1}^∞ to the derivatives of F is easily obtained by Hoeffding’s inequality. [sent-238, score-0.334]
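As a quick illustration of the Hoeffding step (ours, not from the paper): for i.i.d. summands bounded in [a, b], the sample mean deviates from its expectation by more than t with probability at most 2 exp(−2Nt²/(b − a)²), which is the kind of N^{−1/2} concentration invoked for the directional derivatives of F_N. A small numerical check:

```python
import numpy as np

rng = np.random.default_rng(1)
N, trials, t = 2_000, 5_000, 0.02
a, b = 0.0, 1.0                                    # bounded summands, as in Hoeffding's setting

samples = rng.uniform(a, b, size=(trials, N))      # each row: one experiment of N draws
deviations = np.abs(samples.mean(axis=1) - 0.5)    # |empirical mean - true mean|
empirical = np.mean(deviations > t)
hoeffding = 2.0 * np.exp(-2.0 * N * t**2 / (b - a) ** 2)
print(f"P(|mean - 0.5| > {t}): empirical ~ {empirical:.4f}, Hoeffding bound {hoeffding:.4f}")
```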
79 6 we describe the construction of “nets” of increasing precision; using these nets we conclude the proof of Theorem 1. [sent-243, score-0.179]
80 Throughout this section we only provide the global ideas of the proof, whereas in the extended version of this paper we present the details. [sent-246, score-0.06]
81 1) From Energy Minimizers to Directional Derivatives of Energies We control the difference ‖Q − Q̂‖_F from above by differences of derivatives of energies at Q and Q̂. [sent-261, score-0.174]
82 Here Q is an arbitrary matrix in B_r(Q̂) for some r > 0 (where B_r(Q̂) is the ball in H with center Q̂ and radius r w. [sent-262, score-0.126]
83 2) Directional derivatives with respect to an element of D may not exist and therefore we use directional derivatives from the right. [sent-274, score-0.447]
84 That is, for Q ∈ H and D ∈ D, the directional derivative (from the right) of F at Q in the direction D is ∇_D^+ F(Q) (3. [sent-275, score-0.167]
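Numerically, the one-sided directional derivative can be probed with a forward finite difference, as in this small sketch (ours; the step size t is an arbitrary choice):

```python
import numpy as np

def F_X(Q, X):
    """Energy F_X(Q) = sum_x ||Q x|| for a data matrix X with rows x."""
    return np.linalg.norm(X @ Q.T, axis=1).sum()

def right_directional_derivative(Q, D_dir, X, t=1e-7):
    """Forward-difference estimate of the directional derivative of F_X at Q along D_dir."""
    return (F_X(Q + t * D_dir, X) - F_X(Q, X)) / t
```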
85 The proof of this lemma clarifies the existence of α0 , though it does not suggest an explicit approximation for it. [sent-282, score-0.098]
86 5) N^{−1/2} Convergence of Directional Derivatives We formulate the following convergence rate of the directional derivatives of F_N from the right: Theorem 3. [sent-289, score-0.483]
87 1 At this point we can outline the basic intuition behind the proof of Theorem 1. [sent-302, score-0.08]
88 This is because Q̂_N is a function of the samples (random variables) {x_i}_{i=1}^N, but for our proof to be valid, Q needs to be fixed before the sampling begins. [sent-326, score-0.053]
89 Therefore, our new goal is to utilize the intuition described above, but modify the proof to make it mathematically sound. [sent-327, score-0.112]
90 Each matrix in each of the nets is determined before the sampling begins, so it can be used in Theorem 3. [sent-329, score-0.157]
91 However, the construction of the nets guarantees that the N th net contains a matrix Q which is sufficiently close to Q̂_N to be used as a substitute for Q̂_N in the above process. [sent-331, score-0.188]
92 6 The Missing Component: Adaptive Nets We describe here a result on the existence of a sequence of nets as suggested in §3. [sent-333, score-0.157]
93 They are constructed in several stages, which cannot fit in here (see careful explanation in the extended version of this paper). [sent-335, score-0.06]
94 We recall that B_2(Q̂) denotes a ball in H with center Q̂ and radius 2 w. [sent-336, score-0.127]
95 14) The following lemma shows that we can use S_N to guarantee good approximation of Q̂ by Q̂_N as long as the differences of partial derivatives are well-controlled for elements of S_N (it uses the fixed constants κ and τ for S_N; see Lemma 3. [sent-348, score-0.185]
96 Fast global convergence of gradient methods for high-dimensional statistical recovery. [sent-389, score-0.08]
97 Noisy matrix decomposition via convex relaxation: Optimal rates in high dimensions. [sent-396, score-0.065]
98 High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. [sent-484, score-0.257]
99 How close is the sample covariance matrix to the actual covariance matrix? [sent-526, score-0.513]
100 Sparse precision matrix estimation via positive definite constrained minimization of ℓ1-penalized d-trace loss. [sent-557, score-0.116]
wordName wordTfidf (topN-words)
[('qn', 0.612), ('covariance', 0.209), ('fn', 0.191), ('lerman', 0.183), ('qx', 0.173), ('directional', 0.167), ('pca', 0.166), ('sn', 0.145), ('derivatives', 0.14), ('robust', 0.136), ('subspace', 0.129), ('nets', 0.126), ('inverse', 0.126), ('fx', 0.118), ('frobenius', 0.104), ('outlier', 0.098), ('reaper', 0.091), ('rhs', 0.09), ('inliers', 0.081), ('convergence', 0.08), ('complexity', 0.08), ('rd', 0.078), ('minimizer', 0.076), ('zhang', 0.075), ('theorem', 0.073), ('generalized', 0.071), ('eigenvectors', 0.067), ('rate', 0.067), ('estimator', 0.064), ('sample', 0.064), ('compactly', 0.061), ('coudron', 0.061), ('orthoprojectors', 0.061), ('pld', 0.061), ('radius', 0.06), ('minimizers', 0.059), ('subspaces', 0.059), ('recovery', 0.058), ('norm', 0.058), ('wiley', 0.057), ('ld', 0.054), ('caramanis', 0.054), ('mccoy', 0.054), ('proof', 0.053), ('criterion', 0.052), ('decays', 0.05), ('underlying', 0.049), ('br', 0.049), ('energy', 0.048), ('estimation', 0.048), ('component', 0.047), ('rothman', 0.047), ('tr', 0.046), ('lemma', 0.045), ('minnesota', 0.044), ('clari', 0.044), ('nontrivial', 0.043), ('ravikumar', 0.04), ('exp', 0.039), ('overwhelming', 0.038), ('negahban', 0.038), ('minimization', 0.037), ('sons', 0.037), ('robustly', 0.035), ('qt', 0.035), ('ball', 0.035), ('supp', 0.034), ('hoeffding', 0.034), ('energies', 0.034), ('explain', 0.034), ('convex', 0.034), ('sp', 0.033), ('outliers', 0.033), ('uniqueness', 0.033), ('arxiv', 0.033), ('mathematically', 0.032), ('recall', 0.032), ('pl', 0.032), ('tangent', 0.032), ('extended', 0.031), ('id', 0.031), ('substitute', 0.031), ('matrix', 0.031), ('suggested', 0.031), ('nuclear', 0.03), ('later', 0.029), ('version', 0.029), ('formulate', 0.029), ('robustness', 0.029), ('bottom', 0.028), ('agarwal', 0.028), ('relaxation', 0.027), ('intuition', 0.027), ('maronna', 0.027), ('plexity', 0.027), ('rem', 0.027), ('weaken', 0.027), ('manipulated', 0.027), ('fxn', 0.027), ('comparability', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 254 nips-2012-On the Sample Complexity of Robust PCA
Author: Matthew Coudron, Gilad Lerman
Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^{−0.5}). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^{2+δ}) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1
2 0.14333223 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
3 0.13804917 16 nips-2012-A Polynomial-time Form of Robust Regression
Author: Ozlem Aslan, Dale Schuurmans, Yao-liang Yu
Abstract: Despite the variety of robust regression methods that have been developed, current regression formulations are either NP-hard, or allow unbounded response to even a single leverage point. We present a general formulation for robust regression—Variational M-estimation—that unifies a number of robust regression methods while allowing a tractable approximation strategy. We develop an estimator that requires only polynomial-time, while achieving certain robustness and consistency guarantees. An experimental evaluation demonstrates the effectiveness of the new estimation approach compared to standard methods. 1
4 0.13754183 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
Author: Benjamin Rolfs, Bala Rajaratnam, Dominique Guillot, Ian Wong, Arian Maleki
Abstract: The ℓ1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing ℓ1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log ε) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned. 1
5 0.12648317 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
Author: S. D. Babacan, Shinichi Nakajima, Minh Do
Abstract: In this paper, we consider the problem of clustering data points into lowdimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1
6 0.12274063 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
7 0.10236045 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation
8 0.099009939 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
9 0.081139877 232 nips-2012-Multiplicative Forests for Continuous-Time Processes
10 0.079124011 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
11 0.078732572 34 nips-2012-Active Learning of Multi-Index Function Models
12 0.076324187 237 nips-2012-Near-optimal Differentially Private Principal Components
13 0.07459376 234 nips-2012-Multiresolution analysis on the symmetric group
14 0.074373446 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
15 0.072996929 351 nips-2012-Transelliptical Component Analysis
16 0.072063364 179 nips-2012-Learning Manifolds with K-Means and K-Flats
17 0.072020628 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
18 0.071675941 187 nips-2012-Learning curves for multi-task Gaussian process regression
19 0.069582604 86 nips-2012-Convex Multi-view Subspace Learning
20 0.067356765 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
topicId topicWeight
[(0, 0.196), (1, 0.057), (2, 0.123), (3, -0.086), (4, 0.036), (5, 0.093), (6, -0.012), (7, -0.01), (8, -0.036), (9, -0.067), (10, 0.002), (11, -0.068), (12, -0.058), (13, -0.017), (14, 0.005), (15, 0.028), (16, 0.125), (17, -0.045), (18, 0.038), (19, -0.048), (20, -0.024), (21, 0.032), (22, 0.008), (23, 0.061), (24, -0.019), (25, -0.005), (26, 0.076), (27, -0.053), (28, -0.059), (29, 0.039), (30, 0.005), (31, 0.059), (32, 0.018), (33, -0.087), (34, -0.005), (35, 0.015), (36, -0.052), (37, 0.015), (38, 0.019), (39, -0.06), (40, 0.051), (41, -0.03), (42, -0.006), (43, -0.057), (44, -0.015), (45, 0.015), (46, -0.015), (47, -0.082), (48, -0.031), (49, -0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.95770139 254 nips-2012-On the Sample Complexity of Robust PCA
Author: Matthew Coudron, Gilad Lerman
Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^{−0.5}). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^{2+δ}) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1
2 0.77219403 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA
Author: Shinichi Nakajima, Ryota Tomioka, Masashi Sugiyama, S. D. Babacan
Abstract: The variational Bayesian (VB) approach is one of the best tractable approximations to the Bayesian estimation, and it was demonstrated to perform well in many applications. However, its good performance was not fully understood theoretically. For example, VB sometimes produces a sparse solution, which is regarded as a practical advantage of VB, but such sparsity is hardly observed in the rigorous Bayesian estimation. In this paper, we focus on probabilistic PCA and give more theoretical insight into the empirical success of VB. More specifically, for the situation where the noise variance is unknown, we derive a sufficient condition for perfect recovery of the true PCA dimensionality in the large-scale limit when the size of an observed matrix goes to infinity. In our analysis, we obtain bounds for a noise variance estimator and simple closed-form solutions for other parameters, which themselves are actually very useful for better implementation of VB-PCA. 1
3 0.72613233 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
Author: Piyush Rai, Abhishek Kumar, Hal Daume
Abstract: Multiple-output regression models require estimating multiple parameters, one for each output. Structural regularization is usually employed to improve parameter estimation in such models. In this paper, we present a multiple-output regression model that leverages the covariance structure of the latent model parameters as well as the conditional covariance structure of the observed outputs. This is in contrast with existing methods that usually take into account only one of these structures. More importantly, unlike some of the other existing methods, none of these structures need be known a priori in our model, and are learned from the data. Several previously proposed structural regularization based multiple-output regression models turn out to be special cases of our model. Moreover, in addition to being a rich model for multiple-output regression, our model can also be used in estimating the graphical model structure of a set of variables (multivariate outputs) conditioned on another set of variables (inputs). Experimental results on both synthetic and real datasets demonstrate the effectiveness of our method. 1
4 0.72317773 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
Author: Arnak Dalalyan, Yin Chen
Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1
5 0.71948218 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation
Author: Benjamin Rolfs, Bala Rajaratnam, Dominique Guillot, Ian Wong, Arian Maleki
Abstract: The ℓ1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing ℓ1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an O(log ε) iteration complexity to reach a tolerance of ε. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned. 1
6 0.69113576 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation
7 0.67003566 221 nips-2012-Multi-Stage Multi-Task Feature Learning
8 0.66732287 86 nips-2012-Convex Multi-view Subspace Learning
9 0.64877361 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
10 0.63003683 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
11 0.61920428 34 nips-2012-Active Learning of Multi-Index Function Models
12 0.61469924 16 nips-2012-A Polynomial-time Form of Robust Regression
13 0.61188918 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
14 0.60545528 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization
15 0.599832 211 nips-2012-Meta-Gaussian Information Bottleneck
16 0.59419048 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
17 0.59312868 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
18 0.59098673 225 nips-2012-Multi-task Vector Field Learning
19 0.58700418 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning
20 0.58672559 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
topicId topicWeight
[(0, 0.03), (14, 0.205), (21, 0.017), (36, 0.012), (38, 0.158), (39, 0.027), (42, 0.057), (54, 0.019), (55, 0.02), (68, 0.01), (74, 0.043), (76, 0.188), (80, 0.09), (92, 0.044)]
simIndex simValue paperId paperTitle
1 0.90576392 269 nips-2012-Persistent Homology for Learning Densities with Bounded Support
Author: Florian T. Pokorny, Hedvig Kjellström, Danica Kragic, Carl Ek
Abstract: We present a novel method for learning densities with bounded support which enables us to incorporate ‘hard’ topological constraints. In particular, we show how emerging techniques from computational algebraic topology and the notion of persistent homology can be combined with kernel-based methods from machine learning for the purpose of density estimation. The proposed formalism facilitates learning of models with bounded support in a principled way, and – by incorporating persistent homology techniques in our approach – we are able to encode algebraic-topological constraints which are not addressed in current state of the art probabilistic models. We study the behaviour of our method on two synthetic examples for various sample sizes and exemplify the benefits of the proposed approach on a real-world dataset by learning a motion model for a race car. We show how to learn a model which respects the underlying topological structure of the racetrack, constraining the trajectories of the car. 1
same-paper 2 0.86975616 254 nips-2012-On the Sample Complexity of Robust PCA
Author: Matthew Coudron, Gilad Lerman
Abstract: We estimate the rate of convergence and sample complexity of a recent robust estimator for a generalized version of the inverse covariance matrix. This estimator is used in a convex algorithm for robust subspace recovery (i.e., robust PCA). Our model assumes a sub-Gaussian underlying distribution and an i.i.d. sample from it. Our main result shows with high probability that the norm of the difference between the generalized inverse covariance of the underlying distribution and its estimator from an i.i.d. sample of size N is of order O(N^{−0.5+ε}) for arbitrarily small ε > 0 (affecting the probabilistic estimate); this rate of convergence is close to the one of direct covariance estimation, i.e., O(N^{−0.5}). Our precise probabilistic estimate implies for some natural settings that the sample complexity of the generalized inverse covariance estimation when using the Frobenius norm is O(D^{2+δ}) for arbitrarily small δ > 0 (whereas the sample complexity of direct covariance estimation with Frobenius norm is O(D^2)). These results provide similar rates of convergence and sample complexity for the corresponding robust subspace recovery algorithm. To the best of our knowledge, this is the only work analyzing the sample complexity of any robust PCA algorithm. 1
3 0.86168802 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation
Author: Anima Anandkumar, Yi-kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade
Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space. 1
4 0.85295397 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation
Author: Aaron Defazio, Tibério S. Caetano
Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.
5 0.78969878 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Author: Ke Jiang, Brian Kulis, Michael I. Jordan
Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1
6 0.7896567 179 nips-2012-Learning Manifolds with K-Means and K-Flats
7 0.78919125 76 nips-2012-Communication-Efficient Algorithms for Statistical Optimization
8 0.78910869 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions
9 0.78881878 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
10 0.78823882 275 nips-2012-Privacy Aware Learning
11 0.78821141 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation
12 0.78806508 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
13 0.78794843 221 nips-2012-Multi-Stage Multi-Task Feature Learning
14 0.78773123 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
15 0.78735012 294 nips-2012-Repulsive Mixtures
16 0.78724843 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models
17 0.78681552 111 nips-2012-Efficient Sampling for Bipartite Matching Problems
18 0.78679645 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
19 0.78676587 34 nips-2012-Active Learning of Multi-Index Function Models
20 0.78608018 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets