nips nips2010 nips2010-231 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
Reference: text
sentIndex sentText sentNum sentScore
1 Recent work has considered the setting where each point has a few arbitrarily corrupted components. [sent-9, score-0.164]
2 Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. [sent-10, score-0.391]
3 We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. [sent-11, score-0.145] [sent-13, score-0.192]
5 Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. [sent-14, score-0.209]
6 1 Introduction This paper is about the following problem: suppose we are given a large data matrix M , and we know it can be decomposed as M = L0 + C0 , where L0 is a low-rank matrix, and C0 is non-zero in only a fraction of the columns. [sent-16, score-0.195]
7 In particular we do not know the rank (or the row/column space) of L0 , or the number and positions of the non-zero columns of C0 . [sent-18, score-0.19]
8 Can we recover the column-space of the low-rank matrix L0 , and the identities of the non-zero columns of C0 , exactly and efficiently? [sent-19, score-0.439]
9 Using the Singular Value Decomposition (SVD), PCA finds the lower-dimensional approximating subspace by forming a low-rank approximation to the data matrix, formed by considering each point as a column; the output of PCA is the (low-dimensional) column space of this low-rank approximation. [sent-22, score-0.299]
10 It is well documented (see, e.g., [2–4]) that standard PCA is extremely fragile to the presence of outliers: even a single corrupted point can arbitrarily alter the quality of the approximation. [sent-25, score-0.164]
11 Such non-probabilistic or persistent data corruption may stem from sensor failures, malicious tampering, or the simple fact that some of the available data may not conform to the presumed low-dimensional source / model. [sent-26, score-0.119]
12 In terms of the data matrix, this means that most of the column vectors will lie in a low-dimensional space – and hence the corresponding matrix L0 will be low-rank – while the remaining columns will be outliers – corresponding to the column-sparse matrix C. [sent-27, score-0.832]
13 The natural question in this setting is to ask if we can still (exactly or near-exactly) recover the column space of the uncorrupted points, and the identities of the outliers. [sent-28, score-0.447]
14 Recent years have seen a lot of work on both robust PCA [3, 5–12], and on the use of convex optimization for recovering low-dimensional structure [4, 13–15]. [sent-30, score-0.153]
15 2 Problem Setup The precise PCA with outliers problem that we consider is as follows: we are given n points in p-dimensional space. [sent-34, score-0.629]
16 A fraction 1 − γ of the points lie on an r-dimensional true subspace of the ambient Rp , while the remaining γn points are arbitrarily located – we call these outliers/corrupted points. [sent-35, score-0.327]
17 We do not have any prior information about the true subspace or its dimension r. [sent-36, score-0.116]
18 Given the set of points, we would like to learn (a) the true subspace and (b) the identities of the outliers. [sent-37, score-0.191]
19 As is common practice, we collate the points into a p × n data matrix M , each of whose columns is one of the points, and each of whose rows is one of the p coordinates. [sent-38, score-0.285]
20 Writing the SVD of L0 as L0 = U0 Σ0 V0⊤, (1) it is clear that the columns of U0 form an orthonormal basis for the r-dimensional true subspace. [sent-42, score-0.205]
21 Also note that at most (1 − γ)n of the columns of L0 are non-zero (the rest correspond to the outliers). [sent-43, score-0.132]
22 C0 is the matrix corresponding to the outliers; we will denote the set of non-zero columns of C0 by I0 , with |I0 | = γn. [sent-44, score-0.242]
23 With this notation, our intent is to exactly recover the column space of L0 , and the set of outliers I0 . [sent-46, score-0.575]
24 In this case we are interested in approximate identification of both the true subspace and the outliers. [sent-51, score-0.116]
25 Definition: A matrix L ∈ Rp×n with SVD as in (1), and with (1 − γ)n of its columns non-zero, is said to be column-incoherent with parameter µ if maxᵢ ‖V⊤ei‖² ≤ µr/((1 − γ)n), where {ei } are the coordinate unit vectors. [sent-59, score-0.269]
26 Thus if V has a column aligned with a coordinate axis, then µ = (1 − γ)n/r. [sent-60, score-0.189]
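To make the incoherence condition concrete, the following numpy sketch (our own illustration, not code from the paper; the function name, rank tolerance, and use of a thin SVD are our choices) computes the smallest µ for which a given matrix satisfies the bound above.

```python
import numpy as np

def column_incoherence(L, gamma):
    """Smallest mu such that max_i ||V^T e_i||^2 <= mu * r / ((1 - gamma) * n),
    where V holds the right singular vectors of L (a sketch of the definition above)."""
    p, n = L.shape
    _, s, Vt = np.linalg.svd(L, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))        # numerical rank
    V = Vt[:r].T                             # n x r matrix of right singular vectors
    row_norms_sq = np.sum(V ** 2, axis=1)    # ||V^T e_i||^2 for each column index i
    return row_norms_sq.max() * (1 - gamma) * n / r
```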
27 A small incoherence parameter µ essentially enforces that the matrix L0 will have column support that is spread out. [sent-66, score-0.414]
28 Indeed, if the left-hand side is as big as 1, it essentially means that one of the directions of the column space which we wish to recover is defined by only a single observation. [sent-68, score-0.218]
29 Given the regime of a constant fraction of arbitrarily chosen and arbitrarily corrupted points, such a setting is not meaningful. [sent-69, score-0.262]
30 Indeed, having a small incoherence µ is an assumption made in all methods based on nuclear norm minimization to date [4, 15–17]. [sent-70, score-0.237]
31 However, clearly an “outlier” point that lies in the true subspace is a meaningless concept. [sent-72, score-0.116]
32 Thus, in matrix terms, we require that no column of C0 lies in the column space of L0 . [sent-73, score-0.544]
33 Other Notation and Preliminaries: Capital letters such as A are used to represent matrices, and accordingly, Ai denotes the ith column vector. [sent-76, score-0.212]
34 The letters U, V, and I (with appropriate subscripts) are reserved for column space, row space and column support respectively. [sent-78, score-0.439]
35 The projection onto the column space, U , is denoted by PU and given by PU (A) = U U ⊤ A, and similarly for the row-space PV (A) = AV V ⊤ . [sent-80, score-0.227]
36 The matrix PI (A) is obtained from A by setting column Ai to zero for all i ∉ I. [sent-81, score-0.299]
37 Five matrix norms are used: ‖A‖∗ is the nuclear norm, ‖A‖ is the spectral norm, ‖A‖1,2 is the sum of the ℓ2 norms of the columns Ai , ‖A‖∞,2 is the largest ℓ2 norm of the columns, and ‖A‖F is the Frobenius norm. [sent-88, score-0.476]
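As a reference for the notation just introduced, here is a small numpy sketch (our own, not code from the paper) of the projections PU, PV, PI and the five norms.

```python
import numpy as np

def P_U(U, A):
    """Project the columns of A onto the column space spanned by U (orthonormal columns)."""
    return U @ U.T @ A

def P_V(V, A):
    """Project the rows of A onto the row space spanned by V (orthonormal columns)."""
    return A @ V @ V.T

def P_I(I, A):
    """Keep the columns of A indexed by I, set all other columns to zero."""
    out = np.zeros_like(A, dtype=float)
    out[:, I] = A[:, I]
    return out

def matrix_norms(A):
    """The five norms used in the analysis."""
    s = np.linalg.svd(A, compute_uv=False)       # singular values
    col_norms = np.linalg.norm(A, axis=0)        # l2 norm of each column A_i
    return {
        "nuclear ||A||_*":   s.sum(),
        "spectral ||A||":    s.max(),
        "||A||_1,2":         col_norms.sum(),
        "||A||_inf,2":       col_norms.max(),
        "frobenius ||A||_F": np.linalg.norm(A, "fro"),
    }
```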
38 Our goal is thus to recover the column space of L0 (i.e., the low-dimensional space the uncorrupted points lie on) and the column support of C0 (i.e., the identities of the outliers). [sent-96, score-0.391]
39 1 Algorithm Given the data matrix M , our algorithm, called Outlier Pursuit, generates (a) a matrix Û, with orthonormal rows, that spans the low-dimensional true subspace we want to recover, and (b) a set of column indices Î corresponding to the outlier points. [sent-103, score-1.149]
40 Output the set of non-zero columns of Ĉ. (A noisy variant, used when, in addition to gross corruption of some samples, there is additional noise, is discussed below.) [sent-110, score-0.182]
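As we read it, the convex program behind Outlier Pursuit (the program referred to as (2) below) minimizes ‖L‖∗ + λ‖C‖1,2 subject to L + C = M. The cvxpy sketch below is our own illustration of that program; the function name, the solver choice, the numerical tolerance, and the commented choice of λ are our additions, and the noisy variant (which relaxes the equality constraint) is omitted.

```python
import cvxpy as cp
import numpy as np

def outlier_pursuit(M, lam):
    """Solve  min ||L||_* + lam * ||C||_{1,2}  s.t.  L + C = M  (a sketch)."""
    p, n = M.shape
    L = cp.Variable((p, n))
    C = cp.Variable((p, n))
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    cp.Problem(obj, [L + C == M]).solve()
    # Report as outliers the columns of C with (numerically) non-zero l2 norm.
    col_norms = np.linalg.norm(C.value, axis=0)
    outlier_idx = np.where(col_norms > 1e-6 * max(col_norms.max(), 1e-12))[0]
    return L.value, C.value, outlier_idx

# One theory-motivated choice (Theorem 1, as reconstructed above):
# lam = 3 / (7 * np.sqrt(gamma * n))
```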
41 2 Performance We show that under rather weak assumptions, Outlier Pursuit exactly recovers the column space of the low-rank matrix L0 , and the identities of the non-zero columns of outlier matrix C0 . [sent-114, score-1.302]
42 Suppose we observe M = L0 + C0 , where L0 has rank r and incoherence parameter µ. [sent-117, score-0.141]
43 Any output of Outlier Pursuit recovers the column space exactly, and identifies exactly the indices of columns corresponding to outliers not lying in the recovered column space, as long as the fraction of corrupted points, γ, satisfies γ/(1 − γ) ≤ c1/(µr), (5) where c1 = 9/121. [sent-119, score-1.033]
44 This can be achieved by setting the parameter λ in Outlier Pursuit to 3/(7√(γn)) – indeed it holds for any λ in a specific range which we provide below. [sent-120, score-0.915]
45 For the case where, in addition to the corrupted points, we have noisy observations, M̃ = M + W , we have the following result. [sent-121, score-0.194]
46 Then there exist L̃, C̃ such that M̃ = L̃ + C̃, L̃ has the correct column space, C̃ has the correct column support, and ‖L′ − L̃‖F ≤ 10√n ε and ‖C′ − C̃‖F ≤ 9√n ε. [sent-125, score-0.378]
47 If there is no additional structure imposed beyond what we have stated above, then, up to scaling, in the noiseless case Outlier Pursuit can recover from as many outliers (i.e., as large a fraction γ) as any algorithm possibly can. [sent-129, score-0.374]
48 In particular, it is easy to see that if the rank of the matrix L0 is r and the fraction of outliers satisfies γ ≥ 1/(r + 1), then the problem is not identifiable, i.e., no algorithm can guarantee recovery of the column space and the outlier identities. [sent-132, score-0.464]
49 Each of these algorithms either performs standard PCA on a robust estimate of the covariance matrix, or finds directions that maximize a robust estimate of the variance of the projected data. [sent-142, score-0.244]
50 These algorithms seek to approximately recover the column space, and moreover, no existing approach attempts to identify the set of outliers. [sent-143, score-0.311]
51 This outlier identification, while outside the scope of traditional PCA algorithms, is important in a variety of applications such as finance, bio-informatics, and more. [sent-144, score-0.586]
52 Many existing robust PCA algorithms suffer from two pitfalls: performance degradation as the dimension increases, and computational intractability. [sent-145, score-0.122]
53 Algorithms based on nuclear norm minimization for recovering low-rank matrices have become standard since the seminal paper [14]. [sent-149, score-0.338]
54 Recent work [4,15] has taken the nuclear norm minimization approach to the decomposition of a low-rank matrix and an overall sparse matrix. [sent-150, score-0.325]
55 First, these algorithms fail in our setting as they cannot handle outliers – entire columns where every entry is corrupted. [sent-153, score-0.405]
56 Second, from a technical and proof perspective, all the above works investigate exact signal recovery – the intended outcome is known ahead of time, and one just needs to investigate the conditions needed for success. [sent-154, score-0.116]
57 The oracle problem ensures that the pair (L, C) has the desired column space and column support, respectively. [sent-163, score-0.49]
58 We remark that the aim of the matrix recovery papers [4, 15, 16] was exact recovery of the entire matrix, and thus the optimality conditions required are clear. [sent-166, score-0.316]
59 Since our setup precludes exact recovery of L0 and C0 , our optimality conditions must imply the optimality for Outlier Pursuit of an as-of-yet-undetermined pair (L̂, Ĉ), the solution to the oracle problem. [sent-167, score-0.323]
60 Optimality Conditions: We now specify the conditions a candidate optimum needs to satisfy; these arise from the standard subgradient conditions for the norms involved. [sent-169, score-0.12]
61 For any matrix X, define PT ′ (X) := U ′ U ′⊤ X + XV ′ V ′⊤ − U ′ U ′⊤ XV ′ V ′⊤ , the projection of X onto matrices that share the same column space or row space with L′ . [sent-174, score-0.438]
62 Let I ′ be the set of non-zero columns of C ′ , and let H ′ be the column-normalized version of C ′ . [sent-175, score-0.132]
63 That is, column H′i = C′i / ‖C′i‖2 for all i ∈ I ′ , and H′i = 0 for all i ∉ I ′ . [sent-176, score-0.253]
64 Finally, for any matrix X let PI ′ (X) denote the matrix with all columns in I ′c set to 0, and the columns in I ′ left as-is. [sent-177, score-0.038]
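A numpy sketch of the two remaining objects used in these conditions, the projection PT′ and the column-normalized matrix H′, following the definitions above (the code and the tolerance are ours):

```python
import numpy as np

def P_T(U, V, X):
    """P_T'(X) = U U^T X + X V V^T - U U^T X V V^T, for U, V with orthonormal columns."""
    UUtX = U @ (U.T @ X)
    XVVt = (X @ V) @ V.T
    return UUtX + XVVt - U @ (U.T @ XVVt)

def column_normalize(C, tol=1e-10):
    """H with H_i = C_i / ||C_i||_2 on the non-zero columns, and H_i = 0 elsewhere."""
    norms = np.linalg.norm(C, axis=0)
    H = np.zeros_like(C, dtype=float)
    nz = norms > tol
    H[:, nz] = C[:, nz] / norms[nz]
    return H
```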
65 The optimum L̂ of (2) will be non-zero in every column of C0 that is not orthogonal to L0 ’s column space. [sent-178, score-0.062]
66 ℓ2 norm – of the column with the largest magnitude. [sent-186, score-0.269]
67 In particular, recall the SVD of the true L0 = U0 Σ0 V0⊤ and define, for any matrix X, the projection onto the space of all matrices with column space contained in U0 as PU0 (X) := U0 U0⊤ X. [sent-188, score-0.473]
68 Similarly, for the column support I0 of the true C0 , define the projection PI0 (X) to be the matrix that results when all the columns in I0c are set to 0. [sent-189, score-0.536]
69 Let the SVD of L̂ be Û Σ̂ V̂⊤; then there exists a matrix V ∈ Rr×n such that Û V̂⊤ = U0 V , where U0 spans the column space of L0 . [sent-197, score-0.328]
70 Consider the (much) simpler case where the corrupted columns are assumed to be orthogonal to the column space of L0 which we seek to recover. [sent-207, score-0.477]
71 Moreover, the columns of H are either zero, or defined as normalizations of the columns of the matrix C (i.e., they have unit ℓ2 norm). [sent-210, score-0.119]
72 In the full version [26] we show in detail how these ideas and the oracle problem are used to construct the dual certificate Q. [sent-214, score-0.124]
73 The algorithm converges at a rate of O(k −2 ), where k is the number of iterations; each iteration involves one singular value decomposition and a thresholding step, and therefore requires significantly less computation than interior point methods. [sent-221, score-0.118]
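A simplified sketch of one such solver is given below (an illustration we add here, not the exact implementation used for the experiments: the smoothing parameter µ, the iteration count, and the absence of continuation and stopping rules are placeholder choices). It applies singular value thresholding to the low-rank part and column-wise shrinkage to the column-sparse part of the penalized objective min ‖L‖∗ + λ‖C‖1,2 + (1/(2µ))‖M − L − C‖²F.

```python
import numpy as np

def svt(X, tau):
    """Singular value thresholding: the prox operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def column_shrink(X, tau):
    """Column-wise shrinkage: the prox operator of tau * ||.||_{1,2}."""
    norms = np.linalg.norm(X, axis=0)
    scale = np.maximum(1.0 - tau / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def outlier_pursuit_apg(M, lam, mu=1e-3, n_iter=500):
    """Accelerated proximal gradient for
       min ||L||_* + lam * ||C||_{1,2} + (1/(2*mu)) * ||M - L - C||_F^2  (a sketch)."""
    L = np.zeros_like(M, dtype=float)
    C = np.zeros_like(M, dtype=float)
    L_old, C_old = L.copy(), C.copy()
    t, t_old = 1.0, 1.0
    for _ in range(n_iter):
        # extrapolation (momentum) step
        YL = L + ((t_old - 1.0) / t) * (L - L_old)
        YC = C + ((t_old - 1.0) / t) * (C - C_old)
        G = YL + YC - M                       # equals mu times the gradient of the smooth term
        L_old, C_old = L, C
        L = svt(YL - 0.5 * G, mu / 2.0)       # step size mu/2 (Lipschitz constant is 2/mu)
        C = column_shrink(YC - 0.5 * G, lam * mu / 2.0)
        t_old, t = t, (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
    return L, C
```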
74 For different r and number of outliers γn, we generated matrices A ∈ Rp×r and B ∈ R(n−γn)×r where each entry is an independent N (0, 1) random variable, and then set L∗ := A × B ⊤ (the “clean” part of M ). [sent-224, score-0.316]
75 Outliers C ∗ ∈ Rγn×p are generated either neutrally, where each entry of C ∗ is i.i.d. N (0, 1), or adversarially, where every column is an identical copy of a random Gaussian vector. [sent-225, score-0.253]
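A sketch of this generation procedure (our own code; placing the outliers in the last columns and the default sizes are choices we make purely for illustration):

```python
import numpy as np

def make_synthetic(p=400, n=400, r=5, gamma=0.05, identical_outliers=False, seed=0):
    """M = [A B^T, outliers]: clean rank-r part plus gamma*n corrupted columns."""
    rng = np.random.default_rng(seed)
    n_out = int(round(gamma * n))
    A = rng.standard_normal((p, r))
    B = rng.standard_normal((n - n_out, r))
    clean = A @ B.T                                       # the "clean" part L*
    if identical_outliers:                                # adversarial: identical columns
        outliers = np.tile(rng.standard_normal((p, 1)), (1, n_out))
    else:                                                 # neutral: i.i.d. N(0, 1) entries
        outliers = rng.standard_normal((p, n_out))
    M = np.hstack([clean, outliers])
    true_outlier_idx = np.arange(n - n_out, n)
    return M, true_outlier_idx
```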
76 Note that if many outliers span the same direction, it would be difficult to identify whether they are all outliers, or just a new direction of the true space. [sent-227, score-0.309]
77 Indeed, such a setup is order-wise the worst case: as we prove in the full version [26], a matching lower bound is achieved when all outliers are identical. [sent-228, score-0.287]
78 [Figure 1: (a) random outliers, (b) identical outliers, (c) noisy outlier detection; curves are shown for s = 10 and s = 20, with random and identical outliers.]
80 When outliers are random (the easier case), Outlier Pursuit succeeds even when r = 20 with 100 outliers. [sent-249, score-0.28]
81 We then fix r = γn = 5 and examine the outlier identification ability of Outlier Pursuit with noisy observations. [sent-251, score-0.653]
82 We scale each outlier so that the ℓ2 distance of the outlier to the span of true samples equals a pre-determined value s. [sent-252, score-1.207]
83 Each true sample is thus corrupted with a Gaussian random vector with an ℓ2 magnitude σ. [sent-253, score-0.162]
84 We perform (noiseless) Outlier Pursuit on this noisy observation matrix, and claim that the algorithm successfully identifies the outliers if, for the resulting matrix Ĉ, ‖Ĉj‖2 < ‖Ĉi‖2 for all j ∉ I and all i ∈ I. [sent-254, score-0.333]
85 In the random outlier case, Outlier Pursuit correctly identifies the outliers. [sent-259, score-0.586]
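This success criterion is easy to check from the column norms of the recovered Ĉ; a small numpy sketch (ours):

```python
import numpy as np

def outliers_identified(C_hat, true_outlier_idx):
    """True iff every outlier column of C_hat has larger l2 norm than every non-outlier column."""
    norms = np.linalg.norm(C_hat, axis=0)
    is_outlier = np.zeros(C_hat.shape[1], dtype=bool)
    is_outlier[true_outlier_idx] = True
    return norms[is_outlier].min() > norms[~is_outlier].max()
```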
86 We further study the case of decomposing M under incomplete observation, which is motivated by robust collaborative filtering: we generate M as before, but only observe each entry with a given probability (independently). [sent-260, score-0.224]
87 Figure 2 shows a very promising result: the successful decomposition rate under incomplete observation is close to the complete observation case even when only 30% of entries are observed. [sent-263, score-0.191]
88 We use the data from [29], and construct the observation matrix M as containing the first 220 samples of digit “1” and the last 11 samples of “7”. [sent-267, score-0.173]
89 Since the columns of digit “1” are not exactly low rank, an exact decomposition is not possible. [Figure 2: (a) 30% of entries observed; (b) 80% of entries observed; (c) success rate vs. observation ratio.] [sent-272, score-0.365]
90 Hence, we use the ℓ2 norm of each column in the resulting C matrix to identify the outliers: a larger ℓ2 norm means that the sample is more likely to be an outlier — essentially, we apply thresholding after C is obtained. [sent-291, score-1.115]
91 Figure 3(a) shows the ℓ2 norm of each column of the resulting C matrix. [sent-292, score-0.269]
92 [Figure 3: ℓ2 norm of each column Ci for the samples of digits “1” and “7”; panel (c) shows the partial-observation case (one run).]
93 6 Conclusion and Future Direction This paper considers robust PCA from a matrix decomposition approach, and develops the algorithm Outlier Pursuit. [sent-303, score-0.293]
94 Under some mild conditions, we show that Outlier Pursuit can exactly recover the column space and exactly identify the outliers. [sent-304, score-0.389]
95 This result is new, differing both from results in Robust PCA, and also from results using nuclear-norm approaches for matrix completion and matrix reconstruction. [sent-305, score-0.254]
96 Whenever the recovery concept (in this case, the column space) does not uniquely correspond to a single matrix (we believe many, if not most, cases of interest fall under this description), such a tool should prove quite useful. [sent-307, score-0.353]
97 Immediate goals for future work include considering specific applications, in particular robust collaborative filtering (here, the goal is to decompose a partially observed, column-corrupted matrix), and obtaining tight bounds for outlier identification in the noisy case. [sent-308, score-0.805]
98 Principal component analysis based on robust estimators of the covariance or correlation matrix: Influence functions and efficiencies. [sent-358, score-0.155]
99 Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. [sent-401, score-0.322]
100 Projection-pursuit approach to robust dispersion matrices and principal components: Primary theory and monte carlo. [sent-460, score-0.275]
wordName wordTfidf (topN-words)
[('outlier', 0.586), ('pursuit', 0.329), ('pv', 0.251), ('outliers', 0.235), ('column', 0.189), ('pca', 0.177), ('columns', 0.132), ('pu', 0.13), ('corrupted', 0.127), ('robust', 0.122), ('matrix', 0.11), ('svd', 0.108), ('certi', 0.094), ('pt', 0.084), ('incoherence', 0.083), ('recover', 0.083), ('oracle', 0.083), ('subspace', 0.081), ('norm', 0.08), ('principal', 0.077), ('identities', 0.075), ('nuclear', 0.074), ('uncorrupted', 0.071), ('noisy', 0.067), ('sanghavi', 0.066), ('cate', 0.066), ('cand', 0.063), ('caramanis', 0.063), ('optimum', 0.062), ('fraction', 0.061), ('decomposition', 0.061), ('rank', 0.058), ('identi', 0.057), ('singular', 0.057), ('noiseless', 0.056), ('pi', 0.054), ('recovery', 0.054), ('setup', 0.052), ('corruption', 0.05), ('breakdown', 0.047), ('xu', 0.046), ('succeeds', 0.045), ('authentic', 0.044), ('croux', 0.044), ('devlin', 0.044), ('gnanadesikan', 0.044), ('normalizations', 0.044), ('peeling', 0.044), ('matrices', 0.043), ('points', 0.043), ('dual', 0.041), ('identify', 0.039), ('exactly', 0.039), ('conform', 0.039), ('dtra', 0.039), ('ellipsoidal', 0.039), ('contaminated', 0.039), ('xv', 0.039), ('isometric', 0.039), ('torre', 0.039), ('projection', 0.038), ('texas', 0.038), ('orthonormal', 0.038), ('entry', 0.038), ('arbitrarily', 0.037), ('success', 0.036), ('optimality', 0.036), ('austin', 0.036), ('hi', 0.035), ('true', 0.035), ('completion', 0.034), ('incomplete', 0.034), ('entries', 0.034), ('acknowledge', 0.034), ('component', 0.033), ('dispersion', 0.033), ('exact', 0.033), ('recovers', 0.032), ('support', 0.032), ('digit', 0.032), ('thresholding', 0.031), ('ci', 0.031), ('convex', 0.031), ('observation', 0.031), ('collaborative', 0.03), ('malicious', 0.03), ('space', 0.029), ('conditions', 0.029), ('rr', 0.028), ('rp', 0.028), ('lie', 0.027), ('succeed', 0.027), ('ei', 0.027), ('electrical', 0.026), ('identical', 0.026), ('adversarial', 0.025), ('suppose', 0.024), ('ltering', 0.023), ('notation', 0.023), ('letters', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 231 nips-2010-Robust PCA via Outlier Pursuit
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
2 0.27097338 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
3 0.18804176 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan
Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1
4 0.10066294 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
Author: Taehwan Kim, Gregory Shakhnarovich, Raquel Urtasun
Abstract: Sparse coding has recently become a popular approach in computer vision to learn dictionaries of natural images. In this paper we extend the sparse coding framework to learn interpretable spatio-temporal primitives. We formulated the problem as a tensor factorization problem with tensor group norm constraints over the primitives, diagonal constraints on the activations that provide interpretability as well as smoothness constraints that are inherent to human motion. We demonstrate the effectiveness of our approach to learn interpretable representations of human motion from motion capture data, and show that our approach outperforms recently developed matching pursuit and sparse coding algorithms. 1
5 0.094783343 217 nips-2010-Probabilistic Multi-Task Feature Selection
Author: Yu Zhang, Dit-Yan Yeung, Qian Xu
Abstract: Recently, some variants of the đ?‘™1 norm, particularly matrix norms such as the đ?‘™1,2 and đ?‘™1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the đ?‘™1,2 and đ?‘™1,∞ norms by considering a family of đ?‘™1,đ?‘ž norms for 1 < đ?‘ž ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the đ?‘™1,đ?‘ž norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including đ?‘ž, automatically. Experiments have been conducted on two cancer classiďŹ cation applications using microarray gene expression data. 1
6 0.084950387 162 nips-2010-Link Discovery using Graph Feature Tracking
7 0.084425777 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
8 0.082218133 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
9 0.081934512 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
10 0.080231227 265 nips-2010-The LASSO risk: asymptotic results and real world examples
11 0.075150073 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
12 0.075054809 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
13 0.072967827 148 nips-2010-Learning Networks of Stochastic Differential Equations
14 0.072701558 5 nips-2010-A Dirty Model for Multi-task Learning
15 0.071981229 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
16 0.07180886 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA
17 0.068274543 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
18 0.067335412 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
19 0.066757105 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
20 0.062882133 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
topicId topicWeight
[(0, 0.174), (1, 0.039), (2, 0.103), (3, 0.063), (4, 0.083), (5, -0.09), (6, -0.001), (7, 0.06), (8, -0.105), (9, 0.003), (10, 0.077), (11, 0.007), (12, 0.13), (13, 0.064), (14, 0.096), (15, 0.038), (16, 0.03), (17, 0.031), (18, 0.044), (19, 0.107), (20, 0.12), (21, 0.069), (22, 0.021), (23, -0.088), (24, 0.058), (25, -0.083), (26, 0.095), (27, 0.039), (28, 0.078), (29, 0.042), (30, -0.098), (31, 0.092), (32, -0.132), (33, 0.01), (34, 0.139), (35, -0.012), (36, -0.24), (37, 0.031), (38, -0.056), (39, -0.077), (40, 0.012), (41, 0.024), (42, -0.056), (43, -0.009), (44, -0.16), (45, -0.01), (46, 0.006), (47, -0.065), (48, 0.213), (49, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.95520312 231 nips-2010-Robust PCA via Outlier Pursuit
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
2 0.69243443 219 nips-2010-Random Conic Pursuit for Semidefinite Programming
Author: Ariel Kleiner, Ali Rahimi, Michael I. Jordan
Abstract: We present a novel algorithm, Random Conic Pursuit, that solves semidefinite programs (SDPs) via repeated optimization over randomly selected two-dimensional subcones of the PSD cone. This scheme is simple, easily implemented, applicable to very general SDPs, scalable, and theoretically interesting. Its advantages are realized at the expense of an ability to readily compute highly exact solutions, though useful approximate solutions are easily obtained. This property renders Random Conic Pursuit of particular interest for machine learning applications, in which the relevant SDPs are generally based upon random data and so exact minima are often not a priority. Indeed, we present empirical results to this effect for various SDPs encountered in machine learning; these experiments demonstrate the potential practical usefulness of Random Conic Pursuit. We also provide a preliminary analysis that yields insight into the theoretical properties and convergence of the algorithm. 1
3 0.62345266 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
Author: Min Yang, Linli Xu, Martha White, Dale Schuurmans, Yao-liang Yu
Abstract: Robust regression and classification are often thought to require non-convex loss functions that prevent scalable, global training. However, such a view neglects the possibility of reformulated training methods that can yield practically solvable alternatives. A natural way to make a loss function more robust to outliers is to truncate loss values that exceed a maximum threshold. We demonstrate that a relaxation of this form of “loss clipping” can be made globally solvable and applicable to any standard loss while guaranteeing robustness against outliers. We present a generic procedure that can be applied to standard loss functions and demonstrate improved robustness in regression and classification problems. 1
4 0.6108346 136 nips-2010-Large-Scale Matrix Factorization with Missing Data under Additional Constraints
Author: Kaushik Mitra, Sameer Sheorey, Rama Chellappa
Abstract: Matrix factorization in the presence of missing data is at the core of many computer vision problems such as structure from motion (SfM), non-rigid SfM and photometric stereo. We formulate the problem of matrix factorization with missing data as a low-rank semidefinite program (LRSDP) with the advantage that: 1) an efficient quasi-Newton implementation of the LRSDP enables us to solve large-scale factorization problems, and 2) additional constraints such as orthonormality, required in orthographic SfM, can be directly incorporated in the new formulation. Our empirical evaluations suggest that, under the conditions of matrix completion theory, the proposed algorithm finds the optimal solution, and also requires fewer observations compared to the current state-of-the-art algorithms. We further demonstrate the effectiveness of the proposed algorithm in solving the affine SfM problem, non-rigid SfM and photometric stereo problems.
5 0.51198792 110 nips-2010-Guaranteed Rank Minimization via Singular Value Projection
Author: Prateek Jain, Raghu Meka, Inderjit S. Dhillon
Abstract: Minimizing the rank of a matrix subject to affine constraints is a fundamental problem with many important applications in machine learning and statistics. In this paper we propose a simple and fast algorithm SVP (Singular Value Projection) for rank minimization under affine constraints (ARMP) and show that SVP recovers the minimum rank solution for affine constraints that satisfy a restricted isometry property (RIP). Our method guarantees geometric convergence rate even in the presence of noise and requires strictly weaker assumptions on the RIP constants than the existing methods. We also introduce a Newton-step for our SVP framework to speed-up the convergence with substantial empirical gains. Next, we address a practically important application of ARMP - the problem of lowrank matrix completion, for which the defining affine constraints do not directly obey RIP, hence the guarantees of SVP do not hold. However, we provide partial progress towards a proof of exact recovery for our algorithm by showing a more restricted isometry property and observe empirically that our algorithm recovers low-rank incoherent matrices from an almost optimal number of uniformly sampled entries. We also demonstrate empirically that our algorithms outperform existing methods, such as those of [5, 18, 14], for ARMP and the matrix completion problem by an order of magnitude and are also more robust to noise and sampling schemes. In particular, results show that our SVP-Newton method is significantly robust to noise and performs impressively on a more realistic power-law sampling scheme for the matrix completion problem. 1
6 0.50584519 45 nips-2010-CUR from a Sparse Optimization Viewpoint
8 0.43483058 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives
9 0.43288246 48 nips-2010-Collaborative Filtering in a Non-Uniform World: Learning with the Weighted Trace Norm
10 0.42388868 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.41223258 217 nips-2010-Probabilistic Multi-Task Feature Selection
12 0.41172162 287 nips-2010-Worst-Case Linear Discriminant Analysis
13 0.39713037 265 nips-2010-The LASSO risk: asymptotic results and real world examples
14 0.39617354 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
15 0.38882414 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
16 0.38539413 195 nips-2010-Online Learning in The Manifold of Low-Rank Matrices
17 0.38079304 275 nips-2010-Transduction with Matrix Completion: Three Birds with One Stone
18 0.37299421 172 nips-2010-Multi-Stage Dantzig Selector
19 0.36204302 162 nips-2010-Link Discovery using Graph Feature Tracking
20 0.35917896 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
topicId topicWeight
[(13, 0.081), (17, 0.015), (27, 0.042), (30, 0.055), (35, 0.018), (45, 0.136), (50, 0.024), (52, 0.463), (60, 0.027), (77, 0.036), (90, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.88353038 231 nips-2010-Robust PCA via Outlier Pursuit
Author: Huan Xu, Constantine Caramanis, Sujay Sanghavi
Abstract: Singular Value Decomposition (and Principal Component Analysis) is one of the most widely used techniques for dimensionality reduction: successful and efficiently computable, it is nevertheless plagued by a well-known, well-documented sensitivity to outliers. Recent work has considered the setting where each point has a few arbitrarily corrupted components. Yet, in applications of SVD or PCA such as robust collaborative filtering or bioinformatics, malicious agents, defective genes, or simply corrupted or contaminated experiments may effectively yield entire points that are completely corrupted. We present an efficient convex optimization-based algorithm we call Outlier Pursuit, that under some mild assumptions on the uncorrupted points (satisfied, e.g., by the standard generative assumption in PCA problems) recovers the exact optimal low-dimensional subspace, and identifies the corrupted points. Such identification of corrupted points that do not conform to the low-dimensional approximation, is of paramount interest in bioinformatics and financial applications, and beyond. Our techniques involve matrix decomposition using nuclear norm minimization, however, our results, setup, and approach, necessarily differ considerably from the existing line of work in matrix completion and matrix decomposition, since we develop an approach to recover the correct column space of the uncorrupted matrix, rather than the exact matrix itself.
Author: Felipe Gerhard, Wulfram Gerstner
Abstract: Generalized Linear Models (GLMs) are an increasingly popular framework for modeling neural spike trains. They have been linked to the theory of stochastic point processes and researchers have used this relation to assess goodness-of-fit using methods from point-process theory, e.g. the time-rescaling theorem. However, high neural firing rates or coarse discretization lead to a breakdown of the assumptions necessary for this connection. Here, we show how goodness-of-fit tests from point-process theory can still be applied to GLMs by constructing equivalent surrogate point processes out of time-series observations. Furthermore, two additional tests based on thinning and complementing point processes are introduced. They augment the instruments available for checking model adequacy of point processes as well as discretized models. 1
3 0.85117298 279 nips-2010-Universal Kernels on Non-Standard Input Spaces
Author: Andreas Christmann, Ingo Steinwart
Abstract: During the last years support vector machines (SVMs) have been successfully applied in situations where the input space X is not necessarily a subset of Rd . Examples include SVMs for the analysis of histograms or colored images, SVMs for text classiÄ?Ĺš cation and web mining, and SVMs for applications from computational biology using, e.g., kernels for trees and graphs. Moreover, SVMs are known to be consistent to the Bayes risk, if either the input space is a complete separable metric space and the reproducing kernel Hilbert space (RKHS) H ⊂ Lp (PX ) is dense, or if the SVM uses a universal kernel k. So far, however, there are no kernels of practical interest known that satisfy these assumptions, if X ⊂ Rd . We close this gap by providing a general technique based on Taylor-type kernels to explicitly construct universal kernels on compact metric spaces which are not subset of Rd . We apply this technique for the following special cases: universal kernels on the set of probability measures, universal kernels based on Fourier transforms, and universal kernels for signal processing. 1
4 0.80663466 65 nips-2010-Divisive Normalization: Justification and Effectiveness as Efficient Coding Transform
Author: Siwei Lyu
Abstract: Divisive normalization (DN) has been advocated as an effective nonlinear efficient coding transform for natural sensory signals with applications in biology and engineering. In this work, we aim to establish a connection between the DN transform and the statistical properties of natural sensory signals. Our analysis is based on the use of multivariate t model to capture some important statistical properties of natural sensory signals. The multivariate t model justifies DN as an approximation to the transform that completely eliminates its statistical dependency. Furthermore, using the multivariate t model and measuring statistical dependency with multi-information, we can precisely quantify the statistical dependency that is reduced by the DN transform. We compare this with the actual performance of the DN transform in reducing statistical dependencies of natural sensory signals. Our theoretical analysis and quantitative evaluations confirm DN as an effective efficient coding transform for natural sensory signals. On the other hand, we also observe a previously unreported phenomenon that DN may increase statistical dependencies when the size of pooling is small. 1
5 0.77461761 140 nips-2010-Layer-wise analysis of deep networks with Gaussian kernels
Author: Grégoire Montavon, Klaus-Robert Müller, Mikio L. Braun
Abstract: Deep networks can potentially express a learning problem more efficiently than local learning machines. While deep networks outperform local learning machines on some problems, it is still unclear how their nice representation emerges from their complex structure. We present an analysis based on Gaussian kernels that measures how the representation of the learning problem evolves layer after layer as the deep network builds higher-level abstract representations of the input. We use this analysis to show empirically that deep networks build progressively better representations of the learning problem and that the best representations are obtained when the deep network discriminates only in the last layers. 1
6 0.53344911 18 nips-2010-A novel family of non-parametric cumulative based divergences for point processes
7 0.51451558 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
8 0.50285757 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
9 0.50112051 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
10 0.50029534 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
11 0.49929526 64 nips-2010-Distributionally Robust Markov Decision Processes
12 0.49156234 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
13 0.49129477 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
14 0.49051979 115 nips-2010-Identifying Dendritic Processing
15 0.48942032 96 nips-2010-Fractionally Predictive Spiking Neurons
16 0.48851898 117 nips-2010-Identifying graph-structured activation patterns in networks
17 0.48557356 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
18 0.48523086 265 nips-2010-The LASSO risk: asymptotic results and real world examples
19 0.47766584 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
20 0.47201002 253 nips-2010-Spike timing-dependent plasticity as dynamic filter