nips nips2011 nips2011-259 knowledge-graph by maker-knowledge-mining

259 nips-2011-Sparse Estimation with Structured Dictionaries


Source: pdf

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

Reference: text


Summary: the most important sentences generated by the tf-idf model

sentIndex sentText sentNum sentScore

1 Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e. [sent-3, score-0.357]

2 , random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. [sent-5, score-0.248]

3 But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). [sent-6, score-0.599]

4 In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. [sent-7, score-0.738]

5 Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. [sent-8, score-0.534]

6 This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. [sent-10, score-0.322]

7 The objective is to estimate the unknown generative X0 under the assumption that it is row-sparse, meaning that many rows of X0 have zero norm. [sent-12, score-0.185]

8 The problem is compounded considerably by the additional assumption that m > n, meaning the dictionary Φ is overcomplete. [sent-13, score-0.47]

9 When t = 1, this then reduces to the canonical sparse estimation of a coefficient vector with mostly zero-valued entries or minimal ℓ0 norm [7]. [sent-14, score-0.272]

10 In contrast, estimation of X0 with t > 1 represents the more general simultaneous sparse approximation problem [6, 15] relevant to numerous applications such as compressive sensing and multi-task learning [9, 16], manifold learning [13], array processing [10], and functional brain imaging [1]. [sent-15, score-0.455]
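
As a concrete illustration of this simultaneous sparse approximation setup, the following minimal NumPy sketch builds an overcomplete dictionary Φ (m > n), a row-sparse coefficient matrix X0 with t columns, and the noiseless observations Y = ΦX0; all of the specific sizes are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, t, d = 50, 100, 5, 10           # measurements, dictionary columns, tasks, nonzero rows
Phi = rng.standard_normal((n, m))      # overcomplete dictionary (m > n)

X0 = np.zeros((m, t))
support = rng.choice(m, size=d, replace=False)
X0[support] = rng.standard_normal((d, t))   # nonzero rows drawn iid Gaussian

Y = Phi @ X0                           # noiseless row-sparse observations
print("row support size:", np.count_nonzero(np.linalg.norm(X0, axis=1)))
```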

11 for nonzero rows there is no additional penalty for large magnitudes. [sent-19, score-0.363]

12 In particular, we discuss conventional Type I sparsity penalties, such as the ℓ1 norm and the ℓ1,2 mixed norm, and a Type II empirical Bayesian alternative characterized by dictionary dependency. [sent-29, score-0.642]

13 When the dictionary Φ is incoherent, meaning the columns are roughly orthogonal to one another, then certain Type I selections are well-known to produce good approximations of X0 via efficient implementations. [sent-30, score-0.575]

14 However, as discussed in Section 3, more structured dictionary types can pose difficulties. [sent-31, score-0.48]

15 Many of these can be viewed simply as regression with a sparsity penalty convenient for optimization purposes. [sent-35, score-0.199]

16 The general regression problem we consider here involves solving min_X ‖Y − ΦX‖_F² + λ g(X), (4) where g is some penalty function of the row norms. [sent-36, score-0.208]

17 Type I methods use a separable penalty of the form g^(I)(X) = Σ_i h(‖x_i·‖_2), (5) where h is a non-decreasing, typically concave function. [sent-37, score-0.24]
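
To make objectives (4) and (5) concrete, a small sketch of the Type I cost is given below. The particular choices of h (the identity, which yields the ℓ1,2 mixed norm, and a log-based alternative with a fixed α) and the regularization weight λ are illustrative assumptions.

```python
import numpy as np

def type1_objective(X, Phi, Y, lam, h=lambda z: z):
    """Evaluate ||Y - Phi X||_F^2 + lam * sum_i h(||x_i.||_2), cf. (4)-(5)."""
    row_norms = np.linalg.norm(X, axis=1)
    return np.linalg.norm(Y - Phi @ X, 'fro') ** 2 + lam * np.sum(h(row_norms))

# Two illustrative choices of the non-decreasing function h:
h_l12 = lambda z: z                        # h(z) = z  ->  the l1,2 mixed norm penalty
h_log = lambda z: np.log(z ** 2 + 1e-3)    # a concave log penalty with a fixed (hypothetical) alpha
```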

18 While less transparent than Type I, it has been shown that (6) is a concave non-decreasing function of each row norm of X, hence it promotes row sparsity as well. [sent-42, score-0.39]

19 Moreover, the dictionary-dependency of this penalty appears to be the source of some desirable invariance properties as discussed in Section 4. [sent-43, score-0.356]

20 (7) Structured Dictionaries It is now well-established that when the dictionary Φ is constructed with appropriate randomness, e. [sent-47, score-0.404]

21 , iid Gaussian entries, then for certain choices of g, in particular the convex selection g(X) = Σ_i ‖x_i·‖_2 (which represents a generalization of the ℓ1 vector norm to row-sparse matrices), we can expect to recover X0 exactly in the noiseless case or to close approximation otherwise. [sent-49, score-0.294]

22 This assumes that d(X0 ) is sufficiently small relative to some function of the dictionary coherence or a related measure. [sent-50, score-0.475]

23 However, with highly structured dictionaries these types of performance guarantees completely break down. [sent-51, score-0.337]

24 At the most basic level, one attempt to standardize structured dictionaries is by utilizing some form of column normalization as a pre-processing step. [sent-53, score-0.434]
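
A minimal sketch of this column-normalization pre-processing step is shown below (plain NumPy); it rescales each column to unit ℓ2 norm and keeps the scalings so that coefficients estimated in the normalized frame can be mapped back.

```python
import numpy as np

def column_normalize(Phi, eps=1e-12):
    """Rescale each dictionary column to unit l2 norm; return the scalings for later un-doing."""
    scales = np.linalg.norm(Phi, axis=0)
    scales = np.maximum(scales, eps)          # guard against zero columns
    return Phi / scales, scales

# If x_hat solves the problem with the normalized dictionary Phi / scales, then the
# coefficients expressed in the original column scaling are x_hat / scales.
```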

25 In general, if our given dictionary is effectively W ΦD, with W an arbitrary invertible matrix that scales and correlates rows, and D diagonal, the combined effect can be severely disruptive. [sent-59, score-0.506]

26 As an example from neuroimaging, the MEG/EEG source localization problem involves estimating sparse neural current sources within the brain using sensors placed near the surface of the scalp. [sent-60, score-0.509]

27 The effective dictionary or forward model is characterized by highly correlated rows (because the sensors are physically constrained to be near one another) and columns with drastically different scales (since deep brain sources produce much weaker signals at the surface than superficial ones). [sent-61, score-1.019]

28 More problematic is the situation where the effective dictionary is ΦS, since an unrestricted matrix S can introduce arbitrary coherence structure between individual or groups of its columns, meaning the structure of the effective dictionary is now arbitrary regardless of how well-behaved the original Φ. [sent-62, score-0.489]

29 4 Analysis We will now analyze the properties of both Type I and Type II cost functions when coherent or highly structured dictionaries are present. [sent-63, score-0.429]

30 Ideally, we would like to arrive at algorithms that are invariant, to the extent possible, to dictionary transformations that would otherwise disrupt the estimation efficacy. [sent-64, score-0.577]

31 This strategy mirrors the progression in the literature of previous sparse estimation theory related to the ℓ1 norm [3, 7, 8]. [sent-66, score-0.272]

32 Let W denote an arbitrary full-rank n × n matrix and D an arbitrary full-rank m × m diagonal matrix. [sent-74, score-0.222]

33 But one possibility to compensate for dictionary structure is to jointly learn a W^−1 and D^−1 that produce a transformed dictionary Φ̂ satisfying: (i) Φ̂Φ̂ᵀ = CI (meaning the rows have a constant ℓ2 norm and are uncorrelated), and (ii) ‖φ̂_·i‖_2 = 1 for all i. [sent-85, score-0.674]

34 However, as a final point, we should mention that the invariance Type II exhibits towards W and D (or any corrected form of Type I) will no longer strictly hold once noise is added. [sent-88, score-0.172]

35 We will assume that S is arbitrary with the only restriction being that the resulting Φ satisfies spark[Φ] = n + 1, where matrix spark quantifies the smallest number of linearly dependent columns [7]. [sent-91, score-0.353]

36 Let Φ be an arbitrary dictionary with spark [Φ] = n + 1 and X0 a coefficient matrix with d(X0 ) < n. [sent-95, score-0.652]

37 , the nonzero rows of X0 are below some correlation threshold). [sent-98, score-0.276]

38 This solution will be unique whenever ΦX0X0ᵀΦᵀ = ΦΓΦᵀ has a unique solution for some non-negative, diagonal Γ. [sent-102, score-0.204]

39 There will always exist dictionaries Φ and coefficients X0 , consistent with the conditions from Lemma 2, such that the optimization problem (7) with any possible g(X) of the form g^(I)(X) = Σ_i h(‖x_i·‖_2) will have minimizing solutions not equal to X0 (with or without column normalization). [sent-104, score-0.375]

40 In general, Lemma 2 suggests that for estimation purposes uncorrelated rows in X0 can potentially compensate for troublesome dictionary structure, and together with Corollary 1 it also describes a potential advantage of Type II over Type I. [sent-105, score-0.686]

41 , effective sparse recovery is possible even with correlated rows (more on this below). [sent-108, score-0.386]

42 While space prevents a full treatment, in the context of MEG/EEG source estimation, we have successfully localized 500 nonzero sources (rows) using a 100×1000 dictionary. [sent-112, score-0.206]

43 However, what about the situation where strong correlations do exist between the nonzero rows of X0 ? [sent-113, score-0.371]

44 , it exactly reduces to the canonical sparse recovery problem. [sent-120, score-0.227]

45 Let Φ be an arbitrary dictionary with spark [Φ] = n + 1. [sent-129, score-0.616]

46 Then for any X(S, P) with |S| < n, there exists a non-empty subset X̄ ⊂ X(S, P) (with nonzero Lebesgue measure), such that if x0 ∈ X̄, the Type II minimization problem min_x g^(II)(x) s. [sent-130, score-0.216]

47 In other words, no matter how poorly structured a particular dictionary is with regard to a given sparsity profile, there will always be sparse coefficients we are guaranteed to recover (provided we utilize a convergent algorithm). [sent-136, score-0.695]

48 Given an arbitrary Type I penalty g^(I)(x) = Σ_i h(|x_i|), with h a fixed, non-decreasing function, there will always exist a dictionary Φ (with or without normalized columns) and set X(S, P) such that for any x0 ∈ X(S, P), the problem min_x g^(I)(x) s. [sent-138, score-0.68]

49 While it is difficult to analyze all possible algorithms, we can address one influential variety based on iterative reweighted ℓ1 minimization [4, 18]. [sent-144, score-0.285]

50 Here the idea is that if h is concave and differentiable, then a convergent means of minimizing (10) is to utilize a first-order Taylor series approximation of g^(I)(x) at some point x̂. [sent-145, score-0.185]

51 This method produces a sparse estimate at each iteration and is guaranteed to converge to a local minimum (or stationary point) of (10). [sent-147, score-0.27]
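
The sketch below shows one way such an iterative reweighted ℓ1 scheme can be implemented for the noiseless t = 1 case, with each weighted ℓ1 subproblem solved as a linear program; the specific concave penalty h(z) = log(z + α) and the fixed iteration count are illustrative assumptions rather than the exact implementation used in [4, 18].

```python
import numpy as np
from scipy.optimize import linprog

def weighted_l1(Phi, y, w):
    """min_x sum_i w_i |x_i|  s.t.  Phi x = y, solved as an LP with x = u - v, u, v >= 0."""
    n, m = Phi.shape
    c = np.concatenate([w, w])
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    u, v = res.x[:m], res.x[m:]
    return u - v

def reweighted_l1(Phi, y, alpha=1e-2, iters=10):
    """Iterative reweighted l1: weights come from h'(|x_i|) with h(z) = log(z + alpha)."""
    m = Phi.shape[1]
    x = weighted_l1(Phi, y, np.ones(m))        # initialize at the minimum l1-norm solution
    for _ in range(iters):
        w = 1.0 / (np.abs(x) + alpha)          # gradient of the concave log penalty
        x = weighted_l1(Phi, y, w)
    return x
```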

52 Given an arbitrary g^(I)(x) as in Lemma 4, there will always exist a Φ and X(S, P), such that for any x0 ∈ X(S, P), iterative reweighted ℓ1 minimization will not converge to x0 when initialized at the minimum ℓ1 norm solution. [sent-149, score-0.568]

53 In general, with highly structured dictionaries deviating from the ideal, the global minimum of convex penalties often does not correspond with x0 as theoretical equivalence results break down. [sent-152, score-0.476]

54 This in turn suggests the use of concave penalty functions to seek possible improvement. [sent-153, score-0.24]

55 However, as illustrated by the following result, even for the simplest of sparse recovery problems, that of estimating some x0 with only one nonzero element using a dictionary with a 1D null-space, Type I can be characterized by problematic local minima when (strictly) concave penalties are used. [sent-154, score-1.112]

56 For this purpose we define φ∗ as an arbitrary column of Φ and Φ̄∗ as all columns of Φ excluding φ∗ . [sent-155, score-0.259]

57 Also, let Φ be a dictionary with unit ℓ2 norm columns and spark[Φ] = m = n + 1 (i. [sent-158, score-0.851]

58 This result, inequality (11), has a very clear interpretation related to how dictionary coherence can potentially disrupt even the most rudimentary of estimation tasks. [sent-162, score-0.64]

59 Thus, even the slightest amount of curvature (or strict concavity) in h can lead to the inequality being satisfied when highly coherent columns are present. [sent-166, score-0.254]

60 While obviously with h(z) = z this will not be an issue (consistent with the well-known convexity of the ℓ1 problem), for many popular non-convex penalties, this gradient ratio may be large relative to the righthand side, indicating that local minima are always possible. [sent-167, score-0.216]

61 Of course the point here is not that Type I algorithms are incapable of solving simple problems with ‖x0‖_0 = 1 (and any iterative reweighted ℓ1 scheme will succeed on the first step anyway). [sent-170, score-0.224]

62 Rather, Lemma 5 merely demonstrates how highly structured dictionaries can begin to have negative effects on Type I, potentially more so than with Type II, even on trivial tasks. [sent-171, score-0.337]

63 In the first experiment, the dictionary represents an MEG leadfield, which at a high level can be viewed as a mapping from the electromagnetic (EM) activity within m brain voxels to n sensors placed near the scalp surface. [sent-174, score-0.783]

64 These effects are well represented by a dictionary such as Φ = W ΦD as discussed previously. [sent-176, score-0.404]

65 Nonzero rows of X0 are drawn iid from a unit Gaussian distribution. [sent-180, score-0.295]

66 We compared Type II, implemented via a simple iterative reweighted ℓ2 approach, with two different Type I schemes. [sent-183, score-0.224]
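
A rough sketch of one such reweighted ℓ2 scheme for the multi-task case is given below. The EM-style γ update shown is a standard choice from the sparse Bayesian learning literature and is assumed here for illustration; the paper's implementation may differ in its update rule, stopping criterion, and handling of the noise parameter λ.

```python
import numpy as np

def type2_reweighted_l2(Phi, Y, lam=1e-6, iters=50):
    """Type II estimation for Y = Phi X via iteratively reweighted l2 (EM-style gamma updates)."""
    n, m = Phi.shape
    t = Y.shape[1]
    gamma = np.ones(m)
    for _ in range(iters):
        G = Phi * gamma                               # Phi @ diag(gamma)
        B = lam * np.eye(n) + G @ Phi.T               # lam*I + Phi Gamma Phi^T
        X = G.T @ np.linalg.solve(B, Y)               # posterior mean of X given current gamma
        Sigma_diag = gamma - np.einsum('ij,ji->i', G.T, np.linalg.solve(B, G))
        gamma = np.sum(X ** 2, axis=1) / t + Sigma_diag   # EM-style hyperparameter update
        gamma = np.maximum(gamma, 1e-12)
    return X, gamma
```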

67 The first is a homotopy continuation method using the Type I penalty g^(I)(X) = Σ_i log(‖x_i·‖_2² + α), where α is gradually reduced to zero during the estimation process [5]. [sent-184, score-0.31]

68 Secondly, we used the standard mixed-norm penalty g^(I)(X) = ‖X‖_1,2 = Σ_i ‖x_i·‖_2, which leads to a convex minimization problem that generalizes basis pursuit (or the lasso) to the t > 1 domain [6, 10]. [sent-186, score-0.28]
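
For reference, a minimal proximal-gradient (ISTA-style) sketch of this convex ℓ1,2-penalized problem appears below; the step size, iteration count, and λ are illustrative, and a practical solver would typically use an accelerated or homotopy variant.

```python
import numpy as np

def prox_l12(X, thresh):
    """Row-wise soft thresholding: the proximal operator of thresh * ||X||_{1,2}."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - thresh / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def l12_regression(Phi, Y, lam, iters=500):
    """min_X ||Y - Phi X||_F^2 + lam * sum_i ||x_i.||_2  via proximal gradient descent."""
    m, t = Phi.shape[1], Y.shape[1]
    X = np.zeros((m, t))
    step = 1.0 / (2 * np.linalg.norm(Phi, 2) ** 2)    # inverse Lipschitz constant of the gradient
    for _ in range(iters):
        grad = 2 * Phi.T @ (Phi @ X - Y)
        X = prox_l12(X - step * grad, step * lam)
    return X
```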

69 First, we utilized basic ℓ2 column normalization, without which Type I will have difficulty with the vastly different column scalings of Φ̂. [sent-189, score-0.213]

70 Secondly, we developed an algorithm to learn a transformed dictionary U Φ̂Π, with U arbitrary and Π diagonal, such that the combined dictionary has uncorrelated, unit ℓ2 norm rows and unit ℓ2 norm columns (as discussed in Section 4. [sent-190, score-1.229]
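
One simple way to approximate such a transformation is to alternate between whitening the rows and renormalizing the columns, as sketched below; this alternating scheme is an assumption made here for illustration and only approximately achieves both properties simultaneously, so it should not be read as the paper's actual algorithm.

```python
import numpy as np

def learn_invariance_transform(Phi, iters=50):
    """Return a transformed dictionary whose rows are (approximately) uncorrelated with equal
    l2 norm and whose columns have (approximately) unit l2 norm, by alternating two steps."""
    D = Phi.copy()
    for _ in range(iters):
        # Step 1: row whitening. With U = (D D^T)^(-1/2), the rows of U D are uncorrelated
        # and share a common l2 norm.
        C = D @ D.T
        evals, evecs = np.linalg.eigh(C)
        U = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
        D = U @ D
        # Step 2: rescale each column to unit l2 norm (this perturbs step 1 slightly, hence iterate).
        D = D / np.maximum(np.linalg.norm(D, axis=0), 1e-12)
    return D
```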

71 Figure 1(left) contains results from all of these variants, where it is clear that some compensation for the dictionary structure is essential for good recovery performance. [sent-192, score-0.522]

72 As a final point, the Type II success probability does not go to zero even when d(X0 ) = 50, implying that in some cases it is able to find a number of nonzeros equal to the number of rows in Φ. [sent-195, score-0.213]

73 This is possible because even with only t = 5 columns, the nonzero rows of X0 display somewhat limited sample correlation, and so exact support recovery is still possible. [sent-196, score-0.354]

74 For each trial we generated a 50 × 100 Gaussian iid dictionary Φ. [sent-199, score-0.541]

75 We then generated a random x0 vector (t = 1 case) using iid Gaussian nonzero entries with d(x0 ) varied from 10 to 25 (with t = 1, we cannot expect to recover as many nonzeros as when t = 5). [sent-202, score-0.303]

76 Signal vectors are computed as y = Φ̂x0 using the coherent dictionary or, for purposes of direct comparison with a canonical iid dictionary, as y = Φx0. [sent-203, score-0.18]
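
A sketch of this data-generation step for the t = 1 trials is given below; the way the coherent dictionary is built here (columns as small perturbations of a few shared cluster centers) is a hypothetical stand-in for whatever structured construction was actually used.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, d = 50, 100, 15                          # measurements, columns, nonzeros in x0

Phi = rng.standard_normal((n, m))              # canonical iid Gaussian dictionary

# Hypothetical coherent dictionary with clustered (highly correlated) columns:
# each column is a small perturbation of one of a few shared cluster centers.
centers = rng.standard_normal((n, 10))
Phi_coh = centers[:, rng.integers(0, 10, size=m)] + 0.1 * rng.standard_normal((n, m))

x0 = np.zeros(m)
support = rng.choice(m, size=d, replace=False)
x0[support] = rng.standard_normal(d)           # iid Gaussian nonzero entries

y_iid = Phi @ x0                               # signal from the iid dictionary
y_coh = Phi_coh @ x0                           # signal from the coherent dictionary
```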

77 We evaluated Type II and the Type I iterative reweighted ℓ1 minimization method from [4], which is guaranteed to do as well or better than standard ℓ1 norm minimization. [sent-204, score-0.404]

78 [Figure 1 (left) legend: Type II; Type I, homotopy, norm; Type I, homotopy, invariant; Type I, basis pursuit, norm; Type I, basis pursuit, invariant] [sent-208, score-0.4]

79 [Figure 1 (right) legend: Type II, iid Φ; Type II, coherent Φ; Type I, iid Φ; Type I, coherent Φ] [sent-214, score-0.458]

80 Two Type I methods were compared, a homotopy continuation method from [5] and a version of basis pursuit extended to the simultaneous sparse approximation problem by minimizing the ℓ1,2 mixed norm [6, 10]. [sent-220, score-0.497]

81 Type I methods were compared using standard ℓ2 column normalization and a learned invariance transformation. [sent-221, score-0.326]

82 Right: Probability of success recovering sparse vectors using a Gaussian iid dictionary Φ and a coherent dictionary with clustered columns. [sent-222, score-1.191]

83 6 Conclusion When we are free to choose the basis vectors of an overcomplete signal dictionary, the sparse estimation problem is supported by strong analytical and empirical foundations. [sent-224, score-0.268]

84 However, there are many applications where physical restrictions or other factors impose rigid constraints on the dictionary structure such that the assumptions of theoretical recovery guarantees are violated. [sent-225, score-0.522]

85 Examples include model selection problems with correlated features, source localization, and compressive sensing with constrained measurement directions. [sent-226, score-0.322]

86 For example, in the source localization problem, correlated dictionary columns may correspond with drastically different regions (e. [sent-228, score-0.668]

87 With typical Type I sparsity penalties this can be a difficult undertaking; however, with the natural dictionary dependence of the Type II penalty, to some extent it appears this structure can be accounted for, leading to more consistent performance across dictionary types. [sent-232, score-1.012]

88 It is then readily apparent that the penalty (6) satisfies g^(II)(X) ≡ min_Γ Tr[XᵀΓ⁻¹X] + log |ΦDΓDΦᵀ| = min_Γ Tr[XᵀΓ⁻¹X] + log |ΦΓΦᵀ|. [sent-239, score-0.203]
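
The dictionary-dependent penalty above can be evaluated numerically; the sketch below uses a simple fixed-point iteration on the diagonal of Γ obtained by setting the γ-gradient to zero, which is one workable scheme assumed here for illustration rather than the paper's own derivation.

```python
import numpy as np

def g_type2(X, Phi, iters=100, jitter=1e-10):
    """Approximately evaluate g^(II)(X) = min_Gamma Tr[X^T Gamma^{-1} X] + log|Phi Gamma Phi^T|
    via a fixed-point iteration on the diagonal of Gamma."""
    n, m = Phi.shape
    row_norms = np.linalg.norm(X, axis=1)
    gamma = np.maximum(row_norms ** 2, jitter)          # a reasonable initialization
    for _ in range(iters):
        B = Phi @ (gamma[:, None] * Phi.T) + jitter * np.eye(n)   # Phi Gamma Phi^T (regularized)
        Binv_Phi = np.linalg.solve(B, Phi)
        q = np.einsum('ij,ij->j', Phi, Binv_Phi)        # q_i = phi_i^T B^{-1} phi_i
        gamma = np.maximum(row_norms / np.sqrt(np.maximum(q, jitter)), jitter)
    B = Phi @ (gamma[:, None] * Phi.T) + jitter * np.eye(n)
    _, logdet = np.linalg.slogdet(B)
    return np.sum(row_norms ** 2 / gamma) + logdet
```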

89 Moreover, if ΦΓ∗Φᵀ is sufficiently close to t⁻¹ΦX0X0ᵀΦᵀ, meaning the off-diagonal elements of X0X0ᵀ are not too large, then it can be shown by differentiating along the direction between any arbitrary point Γ′ and Γ∗ that no local minima exist, leading to the first part of Lemma 2. [sent-245, score-0.293]

90 Consider the dictionary Φ and the two coefficient matrices X(1) and X(2) defined in (14), whose entries involve a small constant ǫ. It is easily verified that ΦX(1) = ΦX(2) and that X(1) = X0 , the maximally row-sparse solution. [sent-259, score-0.404]

91 Note that ℓ2 column normalization will not change this conclusion since all columns of Φ have equal norm already. [sent-262, score-0.378]

92 Proof of Lemma 4 and Corollary 2: For brevity, we will assume that h is concave and differentiable, as is typical of most sparsity penalties used in practice (the more general case follows with some additional effort). [sent-263, score-0.271]

93 This of course includes h(z) = z, which is both concave and convex, and leads to the ℓ1 norm penalty. [sent-264, score-0.232]

94 Assume we have the dictionary Φ from (14), and that S = {1, 2} and P = {+1, +1}, which implies that any x0 ∈ X (S, P) can be expressed as x0 = [α1 , α2 , 0, 0]T , for some α1 , α2 > 0. [sent-266, score-0.404]

95 To check if this is a local minimum, we can evaluate the gradient of the penalty function g^(I)(x) along the feasible region near x(2) . [sent-270, score-0.21]

96 It is also straightforward to verify analytically that iterative reweighted ℓ1 minimization will fail on this example when initialized at the minimum ℓ1 norm solution. [sent-277, score-0.457]

97 Elad, “Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization,” Proc. [sent-339, score-0.313]

98 Fuchs, “On sparse representations in arbitrary redundant bases,” IEEE Trans. [sent-346, score-0.175]

99 Willsky, “Sparse signal reconstruction perspective for ¸ source localization with sensor arrays,” IEEE Transactions on Signal Processing, vol. [sent-364, score-0.202]

100 Nagarajan, “Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions,” J. [sent-422, score-0.28]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('dictionary', 0.404), ('type', 0.309), ('ii', 0.233), ('dictionaries', 0.204), ('invariance', 0.172), ('reweighted', 0.171), ('spark', 0.146), ('iid', 0.137), ('penalty', 0.127), ('norm', 0.119), ('rows', 0.119), ('minima', 0.119), ('recovery', 0.118), ('nonzero', 0.117), ('lemma', 0.115), ('concave', 0.113), ('sparse', 0.109), ('columns', 0.105), ('compressive', 0.105), ('homotopy', 0.097), ('coherent', 0.092), ('meg', 0.09), ('column', 0.088), ('sensing', 0.086), ('penalties', 0.086), ('disrupt', 0.083), ('electromagnetic', 0.083), ('signal', 0.083), ('structured', 0.076), ('unique', 0.075), ('brain', 0.073), ('wipf', 0.073), ('sparsity', 0.072), ('coherence', 0.071), ('sensors', 0.066), ('meaning', 0.066), ('normalization', 0.066), ('arbitrary', 0.066), ('herein', 0.063), ('localization', 0.062), ('minimization', 0.061), ('coef', 0.061), ('pursuit', 0.06), ('incoherent', 0.057), ('corollary', 0.057), ('source', 0.057), ('highly', 0.057), ('engan', 0.055), ('limz', 0.055), ('righthand', 0.055), ('correlations', 0.054), ('diagonal', 0.054), ('minimum', 0.053), ('iterative', 0.053), ('invariant', 0.049), ('abt', 0.049), ('cotter', 0.049), ('nonzeros', 0.049), ('characterized', 0.047), ('dh', 0.046), ('extent', 0.046), ('exist', 0.045), ('success', 0.045), ('estimation', 0.044), ('uncorrelated', 0.044), ('row', 0.043), ('problematic', 0.043), ('purposes', 0.043), ('local', 0.042), ('continuation', 0.042), ('scalp', 0.042), ('april', 0.041), ('near', 0.041), ('correlated', 0.04), ('concavity', 0.04), ('counterexample', 0.04), ('voxels', 0.04), ('correlation', 0.04), ('secondly', 0.039), ('unit', 0.039), ('min', 0.038), ('wakin', 0.038), ('simultaneous', 0.038), ('noiseless', 0.038), ('minimizing', 0.038), ('vastly', 0.037), ('situation', 0.036), ('tr', 0.036), ('matrix', 0.036), ('penalized', 0.035), ('surface', 0.035), ('utilize', 0.034), ('measurement', 0.034), ('placed', 0.034), ('norms', 0.033), ('sources', 0.032), ('elad', 0.032), ('compensate', 0.032), ('located', 0.032), ('basis', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

2 0.2342646 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

Author: Ioannis A. Gkioulekas, Todd Zickler

Abstract: We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. 1

3 0.17597553 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Author: Zhen J. Xiang, Hao Xu, Peter J. Ramadge

Abstract: Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently. 1

4 0.16575971 276 nips-2011-Structured sparse coding via lateral inhibition

Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun

Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1

5 0.14133701 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen

Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both β and e . Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis is relied on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrix X with i.i.d rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover exact signed supports of both β and e from only Ω(k log p log n) observations, even the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is optimal. 1

6 0.13618723 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

7 0.13550627 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

8 0.12967826 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs

9 0.12189191 258 nips-2011-Sparse Bayesian Multi-Task Learning

10 0.11931021 264 nips-2011-Sparse Recovery with Brownian Sensing

11 0.11255695 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

12 0.10560597 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements

13 0.10448124 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

14 0.10395811 211 nips-2011-Penalty Decomposition Methods for Rank Minimization

15 0.10316253 261 nips-2011-Sparse Filtering

16 0.10102165 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

17 0.098979294 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity

18 0.096545503 260 nips-2011-Sparse Features for PCA-Like Linear Regression

19 0.091638774 285 nips-2011-The Kernel Beta Process

20 0.091440171 265 nips-2011-Sparse recovery by thresholded non-negative least squares


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.255), (1, 0.061), (2, -0.04), (3, -0.175), (4, -0.11), (5, 0.086), (6, 0.098), (7, 0.215), (8, 0.053), (9, 0.038), (10, 0.04), (11, 0.033), (12, -0.023), (13, 0.066), (14, 0.015), (15, -0.115), (16, 0.077), (17, -0.094), (18, 0.03), (19, 0.024), (20, 0.181), (21, 0.043), (22, -0.022), (23, -0.038), (24, 0.021), (25, -0.077), (26, -0.053), (27, 0.036), (28, -0.123), (29, 0.056), (30, -0.049), (31, -0.011), (32, -0.046), (33, -0.057), (34, 0.086), (35, -0.1), (36, -0.036), (37, -0.009), (38, -0.022), (39, 0.087), (40, 0.017), (41, -0.05), (42, 0.008), (43, -0.093), (44, -0.109), (45, -0.013), (46, -0.008), (47, 0.021), (48, -0.019), (49, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97261614 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

2 0.77858639 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model

Author: Ioannis A. Gkioulekas, Todd Zickler

Abstract: We propose an approach for linear unsupervised dimensionality reduction, based on the sparse linear model that has been used to probabilistically interpret sparse coding. We formulate an optimization problem for learning a linear projection from the original signal domain to a lower-dimensional one in a way that approximately preserves, in expectation, pairwise inner products in the sparse domain. We derive solutions to the problem, present nonlinear extensions, and discuss relations to compressed sensing. Our experiments using facial images, texture patches, and images of object categories suggest that the approach can improve our ability to recover meaningful structure in many classes of signals. 1

3 0.74514544 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries

Author: Zhen J. Xiang, Hao Xu, Peter J. Ramadge

Abstract: Learning sparse representations on data adaptive dictionaries is a state-of-the-art method for modeling data. But when the dictionary is large and the data dimension is high, it is a computationally challenging problem. We explore three aspects of the problem. First, we derive new, greatly improved screening tests that quickly identify codewords that are guaranteed to have zero weights. Second, we study the properties of random projections in the context of learning sparse representations. Finally, we develop a hierarchical framework that uses incremental random projections and screening to learn, in small stages, a hierarchically structured dictionary for sparse representations. Empirical results show that our framework can learn informative hierarchical sparse representations more efficiently. 1

4 0.66420817 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements

Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk

Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1

5 0.66190672 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure

Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky

Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on 1 -minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1

6 0.63605571 209 nips-2011-Orthogonal Matching Pursuit with Replacement

7 0.62721509 264 nips-2011-Sparse Recovery with Brownian Sensing

8 0.62176383 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices

9 0.60126233 239 nips-2011-Robust Lasso with missing and grossly corrupted observations

10 0.59916633 276 nips-2011-Structured sparse coding via lateral inhibition

11 0.58983576 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems

12 0.58071363 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition

13 0.56514668 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms

14 0.55168051 285 nips-2011-The Kernel Beta Process

15 0.54850227 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements

16 0.54503661 265 nips-2011-Sparse recovery by thresholded non-negative least squares

17 0.53493947 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data

18 0.525361 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

19 0.5111627 260 nips-2011-Sparse Features for PCA-Like Linear Regression

20 0.5105536 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.037), (4, 0.028), (20, 0.033), (26, 0.023), (31, 0.061), (33, 0.011), (43, 0.074), (45, 0.084), (57, 0.026), (65, 0.01), (74, 0.508), (83, 0.03), (99, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96513945 218 nips-2011-Predicting Dynamic Difficulty

Author: Olana Missura, Thomas Gärtner

Abstract: Motivated by applications in electronic games as well as teaching systems, we investigate the problem of dynamic difficulty adjustment. The task here is to repeatedly find a game difficulty setting that is neither ‘too easy’ and bores the player, nor ‘too difficult’ and overburdens the player. The contributions of this paper are (i) the formulation of difficulty adjustment as an online learning problem on partially ordered sets, (ii) an exponential update algorithm for dynamic difficulty adjustment, (iii) a bound on the number of wrong difficulty settings relative to the best static setting chosen in hindsight, and (iv) an empirical investigation of the algorithm when playing against adversaries. 1

2 0.95534086 104 nips-2011-Generalized Beta Mixtures of Gaussians

Author: Artin Armagan, Merlise Clyde, David B. Dunson

Abstract: In recent years, a rich variety of shrinkage priors have been proposed that have great promise in addressing massive regression problems. In general, these new priors can be expressed as scale mixtures of normals, but have more complex forms and better properties than traditional Cauchy and double exponential priors. We first propose a new class of normal scale mixtures through a novel generalized beta distribution that encompasses many interesting priors as special cases. This encompassing framework should prove useful in comparing competing priors, considering properties and revealing close connections. We then develop a class of variational Bayes approximations through the new hierarchy presented that will scale more efficiently to the types of truly massive data sets that are now encountered routinely. 1

same-paper 3 0.95103878 259 nips-2011-Sparse Estimation with Structured Dictionaries

Author: David P. Wipf

Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1

4 0.8433789 155 nips-2011-Learning to Agglomerate Superpixel Hierarchies

Author: Viren Jain, Srinivas C. Turaga, K Briggman, Moritz N. Helmstaedter, Winfried Denk, H. S. Seung

Abstract: An agglomerative clustering algorithm merges the most similar pair of clusters at every iteration. The function that evaluates similarity is traditionally handdesigned, but there has been recent interest in supervised or semisupervised settings in which ground-truth clustered data is available for training. Here we show how to train a similarity function by regarding it as the action-value function of a reinforcement learning problem. We apply this general method to segment images by clustering superpixels, an application that we call Learning to Agglomerate Superpixel Hierarchies (LASH). When applied to a challenging dataset of brain images from serial electron microscopy, LASH dramatically improved segmentation accuracy when clustering supervoxels generated by state of the boundary detection algorithms. The naive strategy of directly training only supervoxel similarities and applying single linkage clustering produced less improvement. 1

5 0.76930368 68 nips-2011-Demixed Principal Component Analysis

Author: Wieland Brendel, Ranulfo Romo, Christian K. Machens

Abstract: In many experiments, the data points collected live in high-dimensional observation spaces, yet can be assigned a set of labels or parameters. In electrophysiological recordings, for instance, the responses of populations of neurons generally depend on mixtures of experimentally controlled parameters. The heterogeneity and diversity of these parameter dependencies can make visualization and interpretation of such data extremely difficult. Standard dimensionality reduction techniques such as principal component analysis (PCA) can provide a succinct and complete description of the data, but the description is constructed independent of the relevant task variables and is often hard to interpret. Here, we start with the assumption that a particularly informative description is one that reveals the dependency of the high-dimensional data on the individual parameters. We show how to modify the loss function of PCA so that the principal components seek to capture both the maximum amount of variance about the data, while also depending on a minimum number of parameters. We call this method demixed principal component analysis (dPCA) as the principal components here segregate the parameter dependencies. We phrase the problem as a probabilistic graphical model, and present a fast Expectation-Maximization (EM) algorithm. We demonstrate the use of this algorithm for electrophysiological data and show that it serves to demix the parameter-dependence of a neural population response. 1

6 0.58229077 276 nips-2011-Structured sparse coding via lateral inhibition

7 0.57850766 196 nips-2011-On Strategy Stitching in Large Extensive Form Multiplayer Games

8 0.55527437 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

9 0.54840267 285 nips-2011-The Kernel Beta Process

10 0.54115915 191 nips-2011-Nonnegative dictionary learning in the exponential noise model for adaptive music signal representation

11 0.53933555 265 nips-2011-Sparse recovery by thresholded non-negative least squares

12 0.53171748 158 nips-2011-Learning unbelievable probabilities

13 0.53043824 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)

14 0.52999359 258 nips-2011-Sparse Bayesian Multi-Task Learning

15 0.52606577 186 nips-2011-Noise Thresholds for Spectral Clustering

16 0.51788265 200 nips-2011-On the Analysis of Multi-Channel Neural Spike Data

17 0.51205844 79 nips-2011-Efficient Offline Communication Policies for Factored Multiagent POMDPs

18 0.51183891 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data

19 0.50975251 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks

20 0.50945765 125 nips-2011-Identifying Alzheimer's Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis