nips nips2011 nips2011-239 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w, where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise vector. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information on both β and e. Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrices X with i.i.d. rows N(0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover the exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this number of observations is optimal for exact signed support recovery.
Reference: text
sentIndex sentText sentNum sentScore
1 Robust Lasso with missing and grossly corrupted observations Nam H. [sent-1, score-0.328]
2 Abstract This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w, where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise vector. [sent-12, score-0.579]
3 Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. [sent-14, score-0.681]
4 Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. [sent-15, score-0.698]
5 Our second set of results applies to a general class of Gaussian design matrices X with i.i.d. [sent-16, score-0.29]
6 rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover the exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. [sent-18, score-1.567]
7 Our analysis also shows that this number of observations required to achieve exact signed support recovery is optimal. [sent-19, score-0.57]
8 Accordingly, there have been various lines of work on high-dimensional inference based on imposing different types of structural constraints such as sparsity and group sparsity [15] [5] [21]. [sent-23, score-0.284]
9 For instance, some authors [4] seek to obtain a regression estimate β that delivers small prediction error, while other authors [2] [11] [22] seek to produce a regressor with minimal parameter estimation error, measured by the ℓ2-norm of (β̂ − β⋆). [sent-29, score-0.266]
10 It is clear that if all the entries of y are corrupted by large errors, then it is impossible to faithfully recover the regression vector β. [sent-35, score-0.442]
11 However, in many practical applications such as face and acoustic recognition, only a portion of the observation vector is contaminated by gross error. [sent-36, score-0.298]
12 Formally, we have the mathematical model y = Xβ + e + w, (3) where e ∈ Rn is a sparse error vector whose nonzero entries have unknown locations and arbitrarily large magnitudes, and w is another noise vector with bounded entries. [sent-37, score-0.38]
13 If some entries of y are missing, then the nonzero entries of e located at the missing positions of the observation vector y take the same values as the corresponding entries of y but with opposite signs. [sent-41, score-0.443]
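To make the corruption model concrete, here is a minimal numpy sketch of model (3) and of the missing-data interpretation above; all dimensions, sparsity levels, and magnitudes are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, s = 200, 512, 8, 60          # illustrative sizes; s observations carry gross errors

X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))   # rows i.i.d. N(0, (1/n) I): unit-norm columns in expectation
beta_star = np.zeros(p)
beta_star[rng.choice(p, k, replace=False)] = 2.0 * rng.choice([-1.0, 1.0], size=k)

e_star = np.zeros(n)
corrupt = rng.choice(n, s, replace=False)
e_star[corrupt] = rng.normal(0.0, 10.0, size=s)       # gross, unbounded sparse errors

sigma = 0.1
w = rng.normal(0.0, sigma, size=n)                     # dense, bounded stochastic noise
y = X @ beta_star + e_star + w                         # observation model (3)

# Missing entries as sign-flipped corruptions: reporting y_i as zero is the same
# as adding a gross error e_i = -(X beta_star + w)_i at that position.
missing = rng.choice(n, 5, replace=False)
e_miss = np.zeros(n)
e_miss[missing] = -(X @ beta_star + w)[missing]
y_missing = X @ beta_star + w + e_miss
assert np.allclose(y_missing[missing], 0.0)
```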
14 Another recent line of research on recovering data from grossly corrupted measurements has been studied in the context of robust principal component analysis (RPCA) [3] [20] [1]. [sent-43, score-0.331]
15 If the data points are represented as a matrix X, then we wish to find a sparse coefficient matrix B such that X = XB and diag(B) = 0. [sent-54, score-0.286]
16 Given the observation model (1) and the sparsity assumptions on both the regression vector β and the error e, we propose the following convex minimization to estimate the unknown parameter β as well as the error e. [sent-66, score-0.421]
17 The additional regularization associated with e encourages sparsity of the error, where the parameter λe controls the sparsity level. [sent-70, score-0.4]
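For reference, a sketch of the extended Lasso program described here, written in LaTeX; the exact normalization of the quadratic term in (4) is an assumption on our part.

```latex
\[
\min_{\beta \in \mathbb{R}^p,\ e \in \mathbb{R}^n}
  \;\frac{1}{2}\,\| y - X\beta - e \|_2^2
  \;+\; \lambda_\beta \|\beta\|_1
  \;+\; \lambda_e \|e\|_1
\]
```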
18 (ii) Is the extended Lasso able to recover β and e with small prediction error and parameter estimation error? [sent-72, score-0.324]
19 They showed that for a class of Gaussian design matrices with i.i.d. [sent-80, score-0.29]
20 entries, the optimization (5) can recover (β , e ) precisely with high probability even when the fraction of corruption is arbitrarily close to one. [sent-82, score-0.538]
21 In particular, they require the number of observations n to grow proportionally with the ambient dimension p, and the sparsity index k to be a very small fraction of n. [sent-84, score-0.31]
22 [9], the authors establish that for a Gaussian design matrix X, if n ≥ C(k + s) log p, where s is the sparsity level of e , then the recovery is exact. [sent-89, score-0.746]
23 This follows from the fact that the combination matrix [X, I] obeys the restricted isometry property, a well-known property used to guarantee exact recovery of sparse vectors via ℓ1-minimization. [sent-90, score-0.511]
24 These results, however, do not allow the fraction of corruption to be close to one. [sent-91, score-0.382]
25 Using different methods, both sets of authors show that when λ is deterministically selected to be 1/√(log p) and X is a sub-orthogonal matrix, the solution of the following optimization is exact even when a constant fraction of the observations is corrupted. [sent-94, score-0.427]
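A hedged sketch of the constrained program described here (equality-constrained ℓ1 minimization with a single trade-off parameter λ); the precise display used in the cited works is an assumption reconstructed from the description above.

```latex
\[
\min_{\beta,\; e}\ \|\beta\|_1 + \lambda \|e\|_1
\quad \text{subject to} \quad y = X\beta + e,
\qquad \lambda = \frac{1}{\sqrt{\log p}}
\]
```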
26 Moreover, [8] establishes a similar result with a Gaussian design matrix in which the number of observations is only of order k log p, an amount that is known to be optimal in CS and statistics. [sent-95, score-0.539]
27 This paper considers a general setting in which the observations are contaminated by both sparse and dense errors. [sent-100, score-0.297]
28 We establish a general scaling of the quadruplet (n, p, k, s) such that the extended Lasso stably recovers both the regression and corruption vectors. [sent-102, score-0.624]
29 Of particular interest to us are the following questions: (a) First, under what scalings of (n, p, k, s) does the extended Lasso obtain the unique solution with small estimation error? [sent-103, score-0.304]
30 (b) Second, under what scalings of (n, p, k) does the extended Lasso obtain exact signed support recovery even when almost all the observations are corrupted? [sent-104, score-0.885]
31 (c) Third, under what scalings of (n, p, k, s) does no solution of the extended Lasso recover the correct signed support? [sent-105, score-0.601]
32 To answer the first question, we introduce a notion of extended restricted eigenvalue for the matrix [X, I], where I is the identity matrix. [sent-106, score-0.551]
33 In particular, for a random Gaussian design matrix with i.i.d. [sent-109, score-0.29]
34 rows N (0, Σ), we rely on two standard assumptions: invertibility and mutual incoherence. [sent-111, score-0.261]
35 However, previous results [2] [17] that apply to random Gaussian design matrices do not apply to this setting, since Z no longer behaves like a Gaussian matrix. [sent-113, score-0.29]
36 By exploiting the fact that the matrix Z consists of two components, one of which has special structure, our analysis reveals an interesting phenomenon: the extended Lasso can accurately recover both the regressor β and the corruption e even when the fraction of corruption is up to 100%. [sent-115, score-1.074]
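One way to read the remark above: stacking θ = (β, e) and writing Z = [X, I], the extended Lasso is a weighted Lasso on the augmented design Z, whose second block is the deterministic, highly structured identity. This is a reformulation sketch in our own notation, not notation taken verbatim from the paper.

```latex
\[
Z = [\,X \;\; I_n\,], \qquad
\theta = \begin{pmatrix} \beta \\ e \end{pmatrix}, \qquad
\min_{\theta}\ \frac{1}{2}\,\| y - Z\theta \|_2^2
  \;+\; \lambda_\beta \|\theta_{1:p}\|_1
  \;+\; \lambda_e \|\theta_{p+1:p+n}\|_1
\]
```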
37 Moreover, our analysis extends to the situation in which the identity matrix is replaced by a tight frame D, as well as to other models such as the group Lasso or the matrix Lasso with sparse errors. [sent-117, score-0.671]
38 Given a design matrix X ∈ Rn×p and subsets S and T , we use XST to denote the |S| × |T | submatrix obtained by extracting the rows indexed by S and the columns indexed by T . [sent-120, score-0.338]
39 In the first subsection, we establish the parameter estimation bound and provide a deterministic result based on the notion of extended restricted eigenvalue. [sent-126, score-0.414]
40 We further show that the random Gaussian design matrix satisfies this property with high probability. [sent-127, score-0.321]
41 We establish conditions for the design matrix such that the solution of the extended Lasso has the exact signed supports. [sent-129, score-0.988]
42 1 Parameter estimation As in the conventional Lasso, to obtain a small parameter estimation bound, it is necessary to impose conditions on the design matrix X. [sent-131, score-0.378]
43 In this paper, we introduce a notion of extended restricted eigenvalue (extended RE) condition. [sent-132, score-0.408]
44 This assumption (equation (8)) is a natural extension of the restricted eigenvalue condition and the restricted strong convexity considered in [2] [14] and [12]. [sent-134, score-0.286]
45 As explained at greater length in [2] and [16], the restricted eigenvalue condition is among the weakest assumptions on the design matrix under which the solution of the Lasso is consistent. [sent-136, score-0.521]
46 Assuming that the design matrix X obeys the extended RE, the error pair (h, f) = (β̂ − β⋆, ê − e⋆) is bounded by ‖h‖2 + ‖f‖2 ≤ (3/κl²)(λβ √k + λe √s). (10) [sent-139, score-0.573]
47 There are several interesting observations from this theorem: 1) The error bound naturally splits into two components related to the sparsity indices of β and e. [sent-140, score-0.353]
48 In addition, the error bound contains three quantities: the sparsity indices, the regularization parameters, and the extended RE constant. [sent-141, score-0.432]
49 If the terms related to the corruption e are omitted, then we obtain a parameter estimation bound similar to that of the standard Lasso [2] [12]. [sent-142, score-0.312]
50 2) The choice of regularization parameters λβ and λe can be made explicit: assuming w is a Gaussian random vector whose entries are N(0, σ²) and the design matrix has unit-normed columns, it is clear that with high probability, ‖X∗w‖∞ ≤ 2√(σ² log p) and ‖w‖∞ ≤ 2√(σ² log n). [sent-143, score-0.725]
51 Thus, it is sufficient to select λβ ≥ (4/γ)√(σ² log p) and λe ≥ 4√(σ² log n). [sent-144, score-0.306]
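A small helper implementing the penalty choices discussed above; the constants and the placement of γ follow the reconstruction in the preceding sentences and should be treated as assumptions.

```python
import numpy as np

def extended_lasso_penalties(sigma, n, p, gamma=None):
    """Illustrative choices: lam_beta ~ (4/gamma)*sqrt(sigma^2 * log p), lam_e ~ 4*sqrt(sigma^2 * log n)."""
    if gamma is None:
        gamma = 1.0 / np.sqrt(np.log(n))   # a small gamma lets the corruption fraction approach one
    lam_beta = (4.0 / gamma) * np.sqrt(sigma**2 * np.log(p))
    lam_e = 4.0 * np.sqrt(sigma**2 * np.log(n))
    return lam_beta, lam_e
```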
52 However, this parameter actually controls the sparsity level of the regression vector with respect to the fraction of corruption. [sent-146, score-0.364]
53 In the following lemma, we show that the extended RE condition actually holds for a large class of random Gaussian design matrices whose rows are i.i.d. [sent-148, score-0.512]
54 Before stating the lemma, let us define some quantities operating on the covariance matrix Σ: Cmin := λmin (Σ) is the smallest eigenvalue of Σ, Cmax := λmax (Σ) is the largest eigenvalue of Σ, and ξ(Σ) := maxi Σii is the maximal diagonal entry of the matrix Σ. [sent-151, score-0.575]
55 Consider the random Gaussian design matrix whose rows are i.i.d. [sent-153, score-0.338]
56 Select λn := γ √(ξ(Σ) n log n / log p), (11) then with probability greater than 1 − c1 exp(−c2 n), the matrix X satisfies the extended RE with parameter κl = 1/(4√2), provided that n ≥ C (ξ(Σ)/Cmin) k log p and s ≤ min{C1 γ² n/log n, C2 n} for some small constants C1 , C2 . [sent-156, score-0.922]
57 When the design matrix is Gaussian and independent of the Gaussian stochastic noise w, we can easily show that ‖X∗w‖∞ ≤ 2√(ξ(Σ) n δ² log p) with probability at least 1 − 2 exp(−log p). [sent-158, score-0.627]
58 The small mutual incoherence property is especially important since it quantifies how well the regression part separates from the sparse error. [sent-166, score-0.402]
59 Consider the optimal solution (β, e) to the optimization problem (4) with regularization parameters chosen as λβ ≥ (4/γ)√(σ² log p) and λe ≥ 4√(σ² log n), (12) where γ ∈ (0, 1]. [sent-171, score-0.394]
60 Also assuming that n ≥ Ck log p and s ≤ min{C1 γ² n/log n, C2 n} for some small constants C1 , C2 . [sent-172, score-0.336]
61 Without the dense noise w in the observation model (3) (σ = 0), the extended Lasso recovers the exact solution. [sent-174, score-0.378]
62 Furthermore, if we know a priori that the number of corrupted observations is of order O(n/log p), then selecting γ = 1 instead of 1/√(log n) will minimize the estimation error (see equation (13)) of Theorem 1. [sent-176, score-0.636]
63 2 Feature selection with random Gaussian design In many applications, the feature selection criterion is often preferred [17] [23]. [sent-178, score-0.292]
64 Feature selection refers to the property that the recovered parameter has the same signed support as the true regressor. [sent-179, score-0.482]
65 In this part, we investigate conditions on the design matrix and the scaling of (n, p, k, s) such that both the regression and the sparse error vectors satisfy this criterion. [sent-181, score-0.532]
66 Consider the linear model (3) where X is a Gaussian random design matrix whose rows are i.i.d. [sent-182, score-0.338]
67 It is well known for the Lasso that, in order to obtain feature selection accuracy, the covariance matrix Σ must obey two properties: invertibility and small mutual incoherence restricted to the set T . [sent-185, score-0.526]
68 We establish the following result for Gaussian random design whose covariance matrix Σ obeys the two assumptions. [sent-202, score-0.479]
69 In addition, suppose that min over i ∈ T of |βi| > fβ(λβ) and min over i ∈ S of |ei| > fe(λβ, λe), where the sample size satisfies n ≥ max{C1 (n/(n−s)) ‖Σ_TT^(−1/2)‖² k log(p − k), ·}, and the threshold functions defined in (18) are of the form fβ := c1 λβ ‖Σ_TT^(−1/2)‖∞ + 20 √(σ² log k / (Cmin (n − s))) and fe := c2 λβ (n/(n−s)) (Cmax (k√s + s√k))^(1/2) + c3 λe. [sent-206, score-0.277]
70 The solution pair (β, e) of the extended Lasso (4) is unique and has exact signed support. [sent-208, score-0.624]
71 There are several interesting observations from the theorem: 1) The first and most important observation is that the extended Lasso is robust to sparse errors of arbitrarily large magnitude. [sent-211, score-0.607]
72 What we sacrifice for the corruption robustness is an additional log factor in the number of samples. [sent-214, score-0.421]
73 We notice that when the number of corrupted observations is O(n/log n), only O(k log(p − k)) samples are sufficient to recover the exact signed supports of both the regression and sparse error vectors. [sent-215, score-1.16]
74 2) We consider the special case of a Gaussian random design in which the covariance matrix Σ = (1/n) Ip×p . [sent-216, score-0.356]
75 In addition, the invertibility and mutual incoherence properties are automatically satisfied. [sent-220, score-0.321]
76 The theorem implies that when the number of errors s is close to n, the number of samples n needed to recover the exact signed supports satisfies n/log n = Ω(k log(p − k)). [sent-221, score-0.799]
77 Furthermore, Theorem 2 guarantees consistency in the element-wise ℓ∞-norm of the estimated regression at the rate ‖β̂ − β⋆‖∞ = O(√(σ² log p · k log(p − k) / (γ² n))). [sent-222, score-0.261]
78 As γ is chosen to be 1/√(32 log n) (which allows s to be close to n), the ℓ∞ error rate is of order O(σ √(log p)), which is known to be the same as that of the standard Lasso. [sent-223, score-0.44]
79 3) Corollary 1, though interesting, is not able to guarantee stable recovery when the fraction of corruption converges to one. [sent-224, score-0.469]
80 We show in Theorem 2 that this fraction can come arbitrarily close to one by sacrificing a factor of log n in the number of samples. [sent-225, score-0.333]
81 Theorem 2 also implies that there is a significant difference between recovery with small parameter estimation error and recovery of the correct variable selection. [sent-226, score-0.278]
82 When the number of corrupted observations is linearly proportional to n, recovering the exact signed supports requires an increase from Ω(k log p) samples (in Corollary 1) to Ω(k log p log n) samples (in Theorem 2). [sent-227, score-1.255]
83 Our next theorem shows that this number of samples needed to recover the exact signed support is optimal. [sent-229, score-0.542]
84 That is, whenever the rescaled sample size satisfies (20), then no matter which regularization parameters λβ and λe are selected, no solution of the extended Lasso correctly identifies the signed supports with high probability. [sent-230, score-0.756]
85 (Inachievability) Consider the linear model (3) with a random Gaussian design whose covariance matrix Σ satisfies the invertibility and incoherence properties, for any γ ∈ (0, 1). [sent-232, score-0.596]
86 Then with probability tending to one, no solution pair of the extended Lasso (5) has the correct signed support. [sent-234, score-0.547]
87 3 Illustrative simulations In this section, we provide simulations to illustrate the ability of the extended Lasso to recover the exact signed support of the regression vector when a significant fraction of the observations is corrupted by large errors. [sent-235, score-1.239]
88 Simulations are performed for a range of parameters (n, p, k, s), where the design matrix X is a Gaussian random matrix whose rows are i.i.d. [sent-236, score-0.338]
89 In our experiments, we consider varying problem sizes p ∈ {128, 256, 512} and three types of regression sparsity indices: sublinear sparsity (k = 0. [sent-240, score-0.392]
90 By this selection, Theorem 2 suggests that the number of samples should satisfy n ≥ 2Ck log(p − k) log n to guarantee exact signed support recovery. [sent-248, score-0.627]
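As a quick sanity check of this threshold, the smallest n satisfying n ≥ 2Ck log(p − k) log n can be found numerically; the constant C = 1 below is a placeholder, not a value from the paper.

```python
import numpy as np

def min_samples(k, p, C=1.0):
    """Smallest integer n with n >= 2*C*k*log(p-k)*log(n)."""
    n = 2
    while n < 2.0 * C * k * np.log(p - k) * np.log(n):
        n += 1
    return n

# e.g. min_samples(8, 512) is on the order of a few hundred samples (with the placeholder C = 1).
```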
91 In the algorithm, we select λβ = 2√(σ² log p log n) and λe = 2√(σ² log n) as suggested by Theorem 2, where the noise level σ = 0. [sent-251, score-0.49]
92 The algorithm reports a success if the solution pair has the same signed support as (β , e ). [sent-253, score-0.473]
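A minimal proximal-gradient (ISTA) sketch of the extended Lasso, together with the signed-support success criterion used above; the objective normalization, step size, and tolerance are our assumptions, and this is not the authors' implementation. The commented usage lines reuse the names from the data-generation sketch given earlier.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def extended_lasso_ista(X, y, lam_beta, lam_e, n_iter=5000):
    """ISTA for min_{b,e} 0.5*||y - X b - e||_2^2 + lam_beta*||b||_1 + lam_e*||e||_1."""
    n, p = X.shape
    b, e = np.zeros(p), np.zeros(n)
    # Lipschitz constant of the smooth part, taken over the stacked variable [b; e].
    L = np.linalg.norm(np.hstack([X, np.eye(n)]), 2) ** 2
    for _ in range(n_iter):
        r = X @ b + e - y                      # gradient pieces: X^T r (w.r.t. b) and r (w.r.t. e)
        b_new = soft_threshold(b - (X.T @ r) / L, lam_beta / L)
        e_new = soft_threshold(e - r / L, lam_e / L)
        b, e = b_new, e_new
    return b, e

def same_signed_support(a_hat, a_star, tol=1e-3):
    """Success criterion: matching signs on the true support and (numerical) zeros elsewhere."""
    return np.array_equal(np.where(np.abs(a_hat) > tol, np.sign(a_hat), 0.0), np.sign(a_star))

# b_hat, e_hat = extended_lasso_ista(X, y, lam_beta, lam_e)
# success = same_signed_support(b_hat, beta_star) and same_signed_support(e_hat, e_star)
```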
93 As demonstrated by the simulations, our extended Lasso is capable of recovering the exact signed support of both β and e even when 50% of the observations are contaminated. [sent-256, score-0.834]
94 As the rescaled sample size drops so that n/log n ≤ 2k log(p − k), the probability of success starts going to zero, implying the failure of the extended Lasso. [sent-258, score-0.371]
97 Figure 1: Probability of success in recovering the signed supports (panels: sublinear sparsity, fractional power sparsity, linear sparsity). [2] P. [sent-295, score-0.537]
98 Exact signal recovery from sparsely corrupted measurements through the pursuit of justice. [sent-333, score-0.274]
99 Exact recoverability from dense corrupted observations via l1 minimization. [sent-368, score-0.325]
100 Sharp thresholds for high-dimensional and noisy sparsity recovery using ℓ1-constrained quadratic programming (Lasso). [sent-391, score-0.611]
wordName wordTfidf (topN-words)
[('lasso', 0.382), ('signed', 0.341), ('corruption', 0.268), ('cmin', 0.22), ('dmax', 0.215), ('design', 0.184), ('extended', 0.174), ('cmax', 0.165), ('log', 0.153), ('sparsity', 0.142), ('invertibility', 0.132), ('corrupted', 0.13), ('fraction', 0.114), ('eigenvalue', 0.112), ('regression', 0.108), ('incoherence', 0.108), ('matrix', 0.106), ('observations', 0.096), ('recover', 0.09), ('recovery', 0.087), ('restricted', 0.087), ('supports', 0.083), ('contaminated', 0.082), ('mutual', 0.081), ('gaussian', 0.077), ('exact', 0.077), ('xh', 0.075), ('sparse', 0.074), ('establish', 0.074), ('entries', 0.073), ('rescaled', 0.07), ('recovering', 0.069), ('face', 0.069), ('covariance', 0.066), ('arbitrarily', 0.066), ('fe', 0.062), ('error', 0.06), ('gross', 0.059), ('nguyen', 0.059), ('missing', 0.058), ('measurements', 0.057), ('support', 0.056), ('wright', 0.056), ('corollary', 0.056), ('regularization', 0.056), ('theorem', 0.055), ('nam', 0.054), ('recoverability', 0.054), ('trac', 0.054), ('selection', 0.054), ('regressor', 0.054), ('scalings', 0.054), ('observation', 0.051), ('obeys', 0.049), ('sensing', 0.048), ('rows', 0.048), ('elhamifar', 0.048), ('mini', 0.046), ('dense', 0.045), ('grossly', 0.044), ('laska', 0.044), ('success', 0.044), ('estimation', 0.044), ('ip', 0.042), ('nonzero', 0.042), ('transaction', 0.041), ('faithfully', 0.041), ('compensation', 0.041), ('army', 0.039), ('operating', 0.039), ('cand', 0.039), ('preprint', 0.038), ('annals', 0.038), ('portion', 0.037), ('simulations', 0.037), ('identity', 0.037), ('cg', 0.036), ('johns', 0.036), ('sacri', 0.036), ('fs', 0.036), ('xb', 0.036), ('compressed', 0.036), ('phenomenon', 0.036), ('notion', 0.035), ('hopkins', 0.035), ('ambient', 0.035), ('meinshausen', 0.035), ('dantzig', 0.035), ('quantities', 0.034), ('satis', 0.034), ('magnitudes', 0.034), ('fractional', 0.034), ('negahban', 0.033), ('sensors', 0.033), ('solution', 0.032), ('noise', 0.031), ('property', 0.031), ('tt', 0.031), ('robust', 0.031), ('constants', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w, where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise vector. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information on both β and e. Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrices X with i.i.d. rows N(0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover the exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this number of observations is optimal for exact signed support recovery.
2 0.26884288 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
Author: Po-ling Loh, Martin J. Wainwright
Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1
3 0.23881856 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the ℓ1-norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
4 0.18778986 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
Author: Yusuke Watanabe
Abstract: While loopy Belief Propagation (LBP) has been utilized in a wide variety of applications with empirical success, it comes with few theoretical guarantees. Especially, if the interactions of random variables in a graphical model are strong, the behaviors of the algorithm can be difficult to analyze due to underlying phase transitions. In this paper, we develop a novel approach to the uniqueness problem of the LBP fixed point; our new “necessary and sufficient” condition is stated in terms of graphs and signs, where the sign denotes the types (attractive/repulsive) of the interaction (i.e., compatibility function) on the edge. In all previous works, uniqueness is guaranteed only in the situations where the strength of the interactions are “sufficiently” small in certain senses. In contrast, our condition covers arbitrary strong interactions on the specified class of signed graphs. The result of this paper is based on the recent theoretical advance in the LBP algorithm; the connection with the graph zeta function.
5 0.17065649 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
6 0.14133701 259 nips-2011-Sparse Estimation with Structured Dictionaries
7 0.14068533 258 nips-2011-Sparse Bayesian Multi-Task Learning
8 0.12866329 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
9 0.11870696 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
10 0.1141259 264 nips-2011-Sparse Recovery with Brownian Sensing
11 0.11116455 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
12 0.10466299 78 nips-2011-Efficient Methods for Overlapping Group Lasso
13 0.10305926 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
14 0.099895947 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
15 0.097653143 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
16 0.096644774 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
17 0.094075367 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
18 0.093389176 186 nips-2011-Noise Thresholds for Spectral Clustering
19 0.090793766 260 nips-2011-Sparse Features for PCA-Like Linear Regression
20 0.087346688 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
topicId topicWeight
[(0, 0.247), (1, 0.003), (2, -0.058), (3, -0.237), (4, -0.125), (5, 0.089), (6, 0.038), (7, 0.109), (8, 0.14), (9, 0.102), (10, 0.076), (11, -0.022), (12, -0.118), (13, 0.059), (14, 0.048), (15, 0.036), (16, 0.174), (17, -0.147), (18, 0.054), (19, -0.012), (20, 0.062), (21, 0.092), (22, 0.002), (23, 0.054), (24, 0.032), (25, -0.039), (26, -0.029), (27, -0.042), (28, 0.074), (29, -0.064), (30, 0.075), (31, 0.055), (32, 0.097), (33, -0.073), (34, 0.07), (35, -0.001), (36, 0.03), (37, 0.052), (38, 0.091), (39, 0.014), (40, 0.052), (41, 0.05), (42, -0.036), (43, -0.001), (44, -0.023), (45, 0.005), (46, -0.023), (47, -0.139), (48, -0.043), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.9520576 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w, where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise vector. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information on both β and e. Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrices X with i.i.d. rows N(0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover the exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this number of observations is optimal for exact signed support recovery.
2 0.78777641 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
Author: Po-ling Loh, Martin J. Wainwright
Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1
3 0.78575957 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
Author: Edouard Grave, Guillaume R. Obozinski, Francis R. Bach
Abstract: Using the ℓ1-norm to regularize the estimation of the parameter vector of a linear model leads to an unstable estimator when covariates are highly correlated. In this paper, we introduce a new penalty function which takes into account the correlation of the design matrix to stabilize the estimation. This norm, called the trace Lasso, uses the trace norm of the selected covariates, which is a convex surrogate of their rank, as the criterion of model complexity. We analyze the properties of our norm, describe an optimization algorithm based on reweighted least-squares, and illustrate the behavior of this norm on synthetic data, showing that it is more adapted to strong correlations than competing methods such as the elastic net. 1
4 0.73721522 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
5 0.72140908 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on ℓ1-minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
6 0.65532666 264 nips-2011-Sparse Recovery with Brownian Sensing
7 0.64338863 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
8 0.63645703 259 nips-2011-Sparse Estimation with Structured Dictionaries
9 0.60451484 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
10 0.60273623 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
11 0.57802153 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
12 0.56184947 209 nips-2011-Orthogonal Matching Pursuit with Replacement
13 0.55133003 73 nips-2011-Divide-and-Conquer Matrix Factorization
14 0.53783935 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
15 0.52273154 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
16 0.50722885 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation
17 0.49500123 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
18 0.48754612 260 nips-2011-Sparse Features for PCA-Like Linear Regression
19 0.47452602 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
20 0.47023001 78 nips-2011-Efficient Methods for Overlapping Group Lasso
topicId topicWeight
[(0, 0.035), (4, 0.048), (13, 0.176), (20, 0.032), (26, 0.065), (31, 0.103), (33, 0.026), (43, 0.121), (45, 0.119), (57, 0.039), (65, 0.011), (74, 0.074), (83, 0.03), (84, 0.012), (99, 0.046)]
simIndex simValue paperId paperTitle
1 0.84594941 277 nips-2011-Submodular Multi-Label Learning
Author: James Petterson, Tibério S. Caetano
Abstract: In this paper we present an algorithm to learn a multi-label classifier which attempts at directly optimising the F-score. The key novelty of our formulation is that we explicitly allow for assortative (submodular) pairwise label interactions, i.e., we can leverage the co-occurrence of pairs of labels in order to improve the quality of prediction. Prediction in this model consists of minimising a particular submodular set function, which can be accomplished exactly and efficiently via graph-cuts. Learning however is substantially more involved and requires the solution of an intractable combinatorial optimisation problem. We present an approximate algorithm for this problem and prove that it is sound in the sense that it never predicts incorrect labels. We also present a nontrivial test of a sufficient condition for our algorithm to have found an optimal solution. We present experiments on benchmark multi-label datasets, which attest the value of the proposed technique. We also make available source code that enables the reproduction of our experiments. 1
same-paper 2 0.84589785 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w, where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise vector. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information on both β and e. Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrices X with i.i.d. rows N(0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover the exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this number of observations is optimal for exact signed support recovery.
3 0.82533264 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
4 0.78706789 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
5 0.78302336 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
Author: Feng Yan, Yuan Qi
Abstract: For many real-world applications, we often need to select correlated variables— such as genetic variations and imaging features associated with Alzheimer’s disease—in a high dimensional space. The correlation between variables presents a challenge to classical variable selection methods. To address this challenge, the elastic net has been developed and successfully applied to many applications. Despite its great success, the elastic net does not exploit the correlation information embedded in the data to select correlated variables. To overcome this limitation, we present a novel hybrid model, EigenNet, that uses the eigenstructures of data to guide variable selection. Specifically, it integrates a sparse conditional classification model with a generative model capturing variable correlations in a principled Bayesian framework. We develop an efficient active-set algorithm to estimate the model via evidence maximization. Experimental results on synthetic data and imaging genetics data demonstrate the superior predictive performance of the EigenNet over the lasso, the elastic net, and the automatic relevance determination. 1
6 0.77757251 92 nips-2011-Expressive Power and Approximation Errors of Restricted Boltzmann Machines
7 0.77303147 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
8 0.77114993 22 nips-2011-Active Ranking using Pairwise Comparisons
9 0.76987797 186 nips-2011-Noise Thresholds for Spectral Clustering
10 0.76902777 276 nips-2011-Structured sparse coding via lateral inhibition
11 0.76892579 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
12 0.76825595 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
13 0.76788586 285 nips-2011-The Kernel Beta Process
14 0.76720297 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
15 0.76650685 210 nips-2011-PAC-Bayesian Analysis of Contextual Bandits
16 0.76448166 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
17 0.76366806 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
18 0.7636168 265 nips-2011-Sparse recovery by thresholded non-negative least squares
19 0.76209879 25 nips-2011-Adaptive Hedge
20 0.76178628 231 nips-2011-Randomized Algorithms for Comparison-based Search