nips nips2011 nips2011-264 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos
Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α̂ − α‖₂ = O(‖η‖₂/√N), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α̂ − α‖₂ = O(‖η‖₂/√N), where η is the observation noise. [sent-11, score-0.191]
2 The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. [sent-12, score-0.347]
3 1 Introduction We consider the problem of sensing an unknown function f : X → R (where X ⊂ Rd), where f belongs to the span of a large set of (known) features {ϕk}1≤k≤K of L2(X): f(x) = Σ_{k=1}^{K} αk ϕk(x), where α ∈ RK is the unknown parameter and is assumed to be S-sparse, i. [sent-14, score-0.579]
4 In the setting considered here we are allowed to select the points {xn }1≤n≤N ∈ X where the function f is evaluated, which results in the noisy observations yn = f (xn ) + ηn , (1) where ηn is an observation noise term. [sent-18, score-0.208]
5 Note that whenever the noise is non-zero, the recovery cannot be perfect, so we wish to express the estimation error �α − α�2 in terms of N , where α is our estimate. [sent-26, score-0.353]
6 We address the problem of sparse recovery by combining the two ideas: • Sparse recovery theorems (see Section 2) essentially say that in order to recover a vector with a small number of measurements, one needs incoherence. [sent-28, score-0.485]
7 The measurement basis, corresponding to the pointwise evaluations f(xn), should be incoherent with the representation basis, corresponding to the one on which the vector α is sparse. [sent-29, score-0.168]
8 Interpreting these bases in terms of linear operators, pointwise evaluation of f is equivalent to measuring f using Dirac masses δxn(f) := f(xn). [sent-30, score-0.144]
9 Since in general the representation basis {ϕk }1≤k≤K is not incoherent with the measurement basis induced by Dirac operators, we would like to consider another measurement basis, possibly randomized, in order that it becomes incoherent with any representation basis. [sent-31, score-0.466]
10 Thus, instead of considering the N ×K sensing matrix Φ = (δxn (ϕk ))k,n , we consider a new M ×K sensing matrix A = (Tm (ϕk ))k,m , where the operators {Tm }1≤m≤M enforce incoherence between bases. [sent-33, score-1.23]
11 The Brownian sensing approach followed here uses stochastic integral operators {Tm}1≤m≤M, which make the measurement basis incoherent with any representation basis, and generate a sensing matrix A which is Gaussian (with i. [sent-35, score-1.447]
12 Contribution: Our contribution is a sparse recovery result for an arbitrary non-orthonormal functional basis {ϕk}k≤K of a Hölder continuous function f. [sent-40, score-0.51]
13 This result is obtained by combining two contributions: • We show that when the sensing matrix A is Gaussian, i. [sent-42, score-0.544]
14 • The sensing matrix A is made Gaussian by choosing the operators Tm to be stochastic integrals: Tm f := (1/√M) ∫_C f dB^m, where the B^m are Brownian motions, and C is a 1-dimensional curve of X appropriately chosen according to the functions {ϕk}k≤K (see the discussion in Section 4). [sent-50, score-1.004]
15 The recovery property of the Brownian sensing matrix A depends only on the number of Brownian motions M used in the stochastic integrals, and not on the number of sampled points N. [sent-52, score-1.118]
16 The number of samples N appears only in the quality of the estimation of b, and this is where the assumption that f is Hölder continuous comes into the picture. [sent-54, score-0.152]
17 Outline: In Section 2, we survey the large body of existing results about sparse recovery and relate our contribution to this literature. [sent-55, score-0.265]
18 In Section 3, we explain in detail the Brownian sensing recovery method sketched above and state our main result in Theorem 4. [sent-56, score-0.703]
19 Then we comment on the choice and influence of the sampling domain C on the recovery performance. [sent-58, score-0.295]
20 Finally in Section 5, we report numerical experiments illustrating the recovery properties of the Brownian sensing method, and the benefit of the latter compared to a straightforward application of compressed sensing when there is noise and very few sampling points. [sent-59, score-1.441]
21 2 Relation to existing results A standard approach to recover α is to consider the N × K matrix Φ = (ϕk(xn))k,n and solve the system Φα ≈ y, where y is the vector with components yn. [sent-61, score-0.12]
22 A weaker condition for recovery is the compatibility condition, which leads to the following result from [16]: Theorem 2 (Van de Geer & Bühlmann, 2009) Assuming that the compatibility condition is satisfied, i. [sent-71, score-0.334]
23 The second sub-problem (b) of the global reconstruction problem is to provide the user with a simple way to efficiently sample the space in order to build a matrix Φ such that the conditions for recovery are fulfilled, at least with high probability. [sent-74, score-0.33]
24 For instance, to the best of our knowledge, there is no result explaining how to sample the space so that the corresponding sensing matrix Φ satisfies the nice recovery properties needed by the previous theorems, for a general family of features {ϕk }k≤K . [sent-76, score-0.836]
25 However, it is proven in [12] that under some hypotheses on the functional basis, we are able to recover the strong RIP property for the matrix Φ with high probability. [sent-77, score-0.119]
26 This result, combined with a recovery result, is stated as follows: Theorem 3 (Rauhut, 2010) Assume that {ϕk}k≤K is an orthonormal basis of functions under a measure ν, bounded by a constant Cϕ, and that we build DN by sampling f at random according to ν. [sent-78, score-0.516]
27 But the weakness of the result is that the initial basis has to be orthonormal and bounded under the given measure ν in order to get the RIP satisfied: the two conditions ensure incoherence with the Dirac observation basis. [sent-86, score-0.27]
28 , Legendre Polynomial basis, has been considered in [13], but to the best of our knowledge, the problem of designing a general sampling strategy such that the resulting sensing matrix possesses nice recovery properties in the case of non-orthonormal basis remains unaddressed. [sent-89, score-1.019]
29 When the representation and observation bases are not incoherent, the sensing matrix Φ does not possess a nice recovery property. [sent-92, score-0.978]
30 A natural idea is to change the observation basis by introducing a set of M linear operators {Tm }m≤M acting on the functions {ϕk }k≤K . [sent-93, score-0.279]
31 We have Tm(f) = Σ_{k=1}^{K} αk Tm(ϕk) for all 1 ≤ m ≤ M, and our goal is to define the operators {Tm}m≤M in order that the sensing matrix (Tm(ϕk))m,k enjoys a nice recovery property, whatever the representation basis {ϕk}k≤K. [sent-94, score-1.101]
32 We now consider linear operators defined by stochastic integrals on a 1-dimensional curve C of X . [sent-96, score-0.469]
33 We will discuss the existence of such a curve later in Section 4. [sent-98, score-0.199]
34 Then, we define the linear operators {Tm}1≤m≤M as stochastic integrals over the curve C: Tm(g) := (1/√M) ∫_C g dB^m, where {B^m}m≤M are M independent Brownian motions defined on C. [sent-99, score-0.682]
35 Thus, the samples only serve for estimating b and for this purpose, we sample f at points {xn }1≤n≤N regularly chosen along the curve C. [sent-104, score-0.284]
36 In general, for a curve C with speed-preserving parametrization g : [0, l] → X, we have xn = g(nl/N), and the resulting estimate b̂ ∈ RM of b is defined with components b̂m = (1/√M) Σ_{n=0}^{N−1} yn (B^m(xn+1) − B^m(xn)). [sent-105, score-0.358]
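A minimal numerical sketch of this estimator (an illustration, not the authors' code; the fine grid, the indexing scheme, and all variable names are our own assumptions): the Brownian motions B^m are simulated through their increments on a fine grid along [0, l], and b̂ is formed from the noisy samples y_n and the Brownian increments between consecutive sampling points.

```python
import numpy as np

def estimate_b(y, sample_idx, dB, M):
    """Sketch of b_hat. y[n] = f(x_n) + eta_n for n = 0..N-1; sample_idx has length
    N + 1 and gives the fine-grid index of each point x_0, ..., x_N along the curve;
    dB[m, i] is the increment of B^m over the i-th fine-grid cell."""
    b_hat = np.zeros(M)
    for n in range(len(y)):
        # B^m(x_{n+1}) - B^m(x_n) is the sum of the fine-grid increments between the two points
        b_hat += y[n] * dB[:, sample_idx[n]:sample_idx[n + 1]].sum(axis=1)
    return b_hat / np.sqrt(M)
```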
37 The final step of the proposed method is to apply standard recovery techniques (e. [sent-107, score-0.208]
38 , l1 minimization or Lasso) to compute α̂ for the system (2), where b is perturbed by the so-called sensing noise ε := b − b̂ (the estimation error of the stochastic integrals). [sent-109, score-0.753]
39 Properties of the transformed objects We now give two properties of the Brownian sensing matrix A and the sensing noise ε = b − b̂. [sent-111, score-1.14]
40 By definition of the stochastic integral operators {Tm}m≤M, the sensing matrix A = (Tm(ϕk))m,k is a centered Gaussian matrix, with covariance E[Am,k Am,k'] = (1/M) ∫_C ϕk(x) ϕk'(x) dx. [sent-113, score-0.719]
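The following sketch (our own illustration under stated assumptions, not code from the paper; the toy features, grid size, and Monte Carlo check are all assumed) builds draws of A by discretizing the stochastic integrals on a fine grid along C and verifies numerically that the entries are centered and have the covariance above, as given by the Itô isometry.

```python
import numpy as np

rng = np.random.default_rng(0)
l, M, n_grid, trials = 1.0, 8, 2000, 4000
phis = [np.sin, np.cos]                                  # two toy features on the curve [0, l]
t = np.linspace(0.0, l, n_grid + 1)
dt = np.diff(t)
Phi = np.vstack([phi(t[:-1]) for phi in phis])           # K x n_grid (left endpoints)

draws = np.empty((trials, M, len(phis)))
for s in range(trials):
    dB = rng.normal(0.0, np.sqrt(dt), size=(M, n_grid))  # Brownian increments on the grid
    draws[s] = (dB @ Phi.T) / np.sqrt(M)                 # one draw of A = (T_m(phi_k))_{m,k}

print(np.mean(draws[:, 0, 0]))                           # ~ 0 (centered)
print(np.mean(draws[:, 0, 0] * draws[:, 0, 1]))          # empirical Cov(A_{1,1}, A_{1,2})
print(np.sum(Phi[0] * Phi[1] * dt) / M)                  # (1/M) * int_C phi_1 phi_2, should match
```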
41 We consider the simplest deterministic sensing design, where we choose the sensing points to be uniformly distributed along the curve C. [sent-121, score-1.274]
42 ∀(x, y) ∈ X², |f(x) − f(y)| ≤ L|x − y|^β, then for any p ∈ (0, 1], with probability at least 1 − p, we have the following bound on the sensing noise ε = b − b̂: ‖ε‖₂² ≤ σ̃²(N, M, p)/N, where σ̃²(N, M, p) combines the observation noise level σ² and an approximation error term involving L²l^{2β}, up to log(1/p) factors. [sent-124, score-0.658]
43 The observation noise term (when σ² > 0) dominates the approximation error term whenever β ≥ 1/2. [sent-126, score-0.152]
44 In this section, we state our main recovery result for the Brownian sensing method, described in Figure 1, using a uniform sampling method along a one-dimensional curve C ⊂ X ⊂ Rd . [sent-129, score-1.012]
45 Theorem 4 (Main result) Assume that f is (L, β)-Hölder on X and that VC is invertible. [sent-131, score-0.129]
46 Then, with probability at least 1 − 3p, the solution α̂ obtained by the Brownian sensing approach described in Figure 1 satisfies ‖α̂ − α‖₂² ≤ C κC⁴ σ̃²(N, M, p)/(N maxk ‖ϕk‖C²), where C is a numerical constant and σ̃²(N, M, p) is defined in Proposition 2. [sent-135, score-0.539]
47 Input: a curve C of length l such that VC is invertible. [sent-141, score-0.199]
48 • Select N uniform samples {xn }1≤n≤N along the curve C, • Generate M Brownian motions {B m }1≤m≤M along C. [sent-143, score-0.422]
49 • Compute the Brownian sensing matrix A ∈ RM×K (i. [sent-144, score-0.544]
50 min_a ‖a‖₁ such that ‖Aa − b̂‖₂² ≤ σ̃²(N, M, p)/N. Figure 1: The Brownian sensing approach using a uniform sampling along the curve C. [sent-151, score-0.804]
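Putting the steps of Figure 1 together, here is an end-to-end sketch (our own illustration; the cosine features, the curve parametrization, the noise level, the constraint radius eps, and the use of the cvxpy package as the l1 solver are all assumptions, since the text does not prescribe them):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
l, K, S, M, N, n_grid, sigma = 1.0, 30, 3, 100, 64, 5000, 0.05

# Assumed features along a curve C parametrized by arc length t in [0, l].
phis = [lambda t, k=k: np.cos(np.pi * k * t) for k in range(1, K + 1)]
alpha = np.zeros(K); alpha[rng.choice(K, S, replace=False)] = rng.normal(size=S)
f = lambda t: sum(c * phi(t) for c, phi in zip(alpha, phis))

# Brownian sensing matrix A and estimate b_hat of b from N noisy samples.
t = np.linspace(0.0, l, n_grid + 1); dt = np.diff(t)
Phi = np.vstack([phi(t[:-1]) for phi in phis])                # K x n_grid
dB = rng.normal(0.0, np.sqrt(dt), size=(M, n_grid))           # Brownian increments
A = (dB @ Phi.T) / np.sqrt(M)                                 # A[m, k] ~ T_m(phi_k)
idx = np.linspace(0, n_grid, N + 1, dtype=int)                # regular points x_n = g(n l / N)
y = f(t[idx[:-1]]) + sigma * rng.normal(size=N)               # noisy observations y_n
b_hat = np.array([sum(y[n] * dB[m, idx[n]:idx[n + 1]].sum() for n in range(N))
                  for m in range(M)]) / np.sqrt(M)

# Final step: l1 minimization under the sensing-noise constraint.
eps = 1e-2                                                    # stands in for sigma_tilde^2 / N
a = cp.Variable(K)
cp.Problem(cp.Minimize(cp.norm1(a)), [cp.sum_squares(A @ a - b_hat) <= eps]).solve()
print(np.linalg.norm(a.value - alpha))                        # recovery error ||alpha_hat - alpha||_2
```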
51 We then comment on the choice of the curve C and illustrate examples of such curves for different bases. [sent-154, score-0.223]
52 Concerning the scaling of the estimation error in terms of the number of sensing points N, Theorem 3 of [12] (recalled in Section 2) states that when N is large enough (i. [sent-157, score-0.577]
53 Thus, provided that the function f has a Hölder exponent β ≥ 1/2, we obtain the same rate as in Theorem 3. [sent-161, score-0.161]
54 Note that our recovery performance scales with the condition number κC of VC as well as the length l of the curve C. [sent-163, score-0.435]
55 However, concerning the hypothesis on the functions {ϕk}k≤K, we only assume that the covariance matrix VC is invertible on the curve C, which enables us to handle arbitrarily non-orthonormal bases. [sent-164, score-0.344]
56 This means that the orthogonality condition on the basis functions is not a crucial requirement to deduce sparse recovery properties. [sent-165, score-0.442]
57 Also, the computational complexity of Brownian sensing increases when κC is large, since it is necessary to take a large M, i. [sent-168, score-0.495]
58 Theorem 4 requires a constraint on the number of Brownian motions M (i. [sent-172, score-0.129]
59 , that M = Ω(S log K)) and not on the number of sampling points N (as in [12], see Theorem 3). [sent-174, score-0.123]
60 This is due to the fact that the Brownian sensing matrix A only depends on the computation of the M stochastic integrals of the K functions ϕk , and does not depend on the samples. [sent-176, score-0.729]
61 The choice of the curve Why sample along a 1-dimensional curve C instead of sampling over the whole space X? [sent-181, score-0.591]
62 But in dimension d > 1, following the Brownian sensing approach while sampling over the whole space would require generating M Brownian sheets (the extension of Brownian motions to d > 1 dimensions) over X, and then building the M × K matrix A with elements Am,k = ∫_X ϕk(t1, . [sent-183, score-0.756]
63 Assuming that the covariance matrix VX is invertible, this Brownian sensing matrix is also Gaussian and enjoys the same recovery properties as in the one-dimensional case. [sent-190, score-0.847]
64 However, in this case, estimating the stochastic integrals bm = ∫_X f dB^m using sensing points along a (d-dimensional) grid would provide an estimation error ε = b − b̂ that scales poorly with d, since we integrate over a d-dimensional space. [sent-191, score-0.822]
65 This explains our choice of selecting a 1-dimensional curve C instead of the whole space X and sampling N points along the curve. [sent-192, score-0.367]
66 This choice indeed provides a better estimation of b, which is defined by 1-dimensional stochastic integrals over C. [sent-193, score-0.184]
67 Note that the only requirement for the choice of the curve C is that the covariance matrix VC defined along this curve should be invertible. [sent-194, score-0.52]
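As a practical aside (a sketch under our own assumptions, not taken from the paper), this requirement is easy to check numerically for a candidate curve: discretize C, form VC[k, k'] = ∫_C ϕk ϕk', and inspect its condition number before committing to the curve.

```python
import numpy as np

def curve_covariance(phis, g, l, n_grid=2000):
    """V_C for features phis (callables on X) along a curve with an assumed
    speed-preserving parametrization g: [0, l] -> X."""
    t = np.linspace(0.0, l, n_grid, endpoint=False) + l / (2 * n_grid)   # cell midpoints
    X = np.array([g(ti) for ti in t])                                    # points along C
    Phi = np.array([[phi(x) for x in X] for phi in phis])                # K x n_grid
    return (Phi @ Phi.T) * (l / n_grid)                                  # Riemann sum for int_C

# Keep the candidate curve only if kappa_C = cond(V_C) is finite and reasonably small, e.g.
# V_C = curve_covariance(phis, g, l); ok = np.linalg.cond(V_C) < 1e3
```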
68 In addition, in some specific applications the sampling process can be heavily constrained by physical systems, and sampling uniformly over the whole domain is typically costly. [sent-195, score-0.126]
69 What the parameters of the curve tell us about a basis. [sent-202, score-0.199]
70 In the result of Theorem 4, the length l of the curve C as well as the condition number κC = νmax,C /νmin,C are essential characteristics of the efficiency of the method. [sent-203, score-0.227]
71 Indeed, it may not be possible to find a short curve C such that κC is small. [sent-205, score-0.199]
72 For instance in the case where the basis functions have compact support, if the curve C does not pass through the support of all functions, VC will not be invertible. [sent-206, score-0.339]
73 Any function whose support does not intersect with the curve would indeed be an eigenvector of VC with a 0 eigenvalue. [sent-207, score-0.199]
74 This indicates that the method will not work well in the case of a very localized basis {ϕk }k≤K (e. [sent-208, score-0.135]
75 wavelets with compact support), since the curve would have to cover the whole domain and thus l would be very large. [sent-210, score-0.219]
76 On the other hand, the situation may be much nicer when the basis is not localized, as in the case of a Fourier basis. [sent-211, score-0.116]
77 We show in the next subsection that in a d-dimensional Fourier basis, it is possible to find a curve C (actually a segment) such that the basis is orthonormal along the chosen line (i. [sent-212, score-0.403]
78 Examples of curves For illustration, we exhibit three cases for which one can easily derive a curve C such that VC is invertible. [sent-216, score-0.199]
79 X is a segment of R: In this case, we simply take C = X , and the sparse recovery is possible whenever the functions {ϕk }k≤K are linearly independent in L2 . [sent-218, score-0.293]
80 Coordinate functions: Consider the case when the basis functions are the coordinate functions ϕk(t1, . [sent-219, score-0.14]
81 Then we can define the parametrization of the curve C by g(t) = α(t)(t, t2 , . [sent-223, score-0.239]
82 Fourier basis: Let us now consider the Fourier basis in Rd with frequency T, with elements given by products of complex exponentials exp(−2iπ nj tj/T) over the coordinates j: ϕn1,. [sent-229, score-0.139]
83 Note that this basis is orthonormal under the uniform distribution on [0, 1]d. [sent-238, score-0.157]
84 ,nd) with the one-dimensional Fourier basis with frequency T, which means that the condition number equals 1 for this curve. [sent-262, score-0.144]
85 Therefore, for a d-dimensional function f , sparse in the Fourier basis, it is sufficient to sample along the curve induced by g to ensure that VC is invertible. [sent-263, score-0.285]
86 5 Numerical Experiments In this section, we illustrate the method of Brownian sensing in dimension one. [sent-264, score-0.495]
87 In the experiments, we use a function f whose decomposition is 3-sparse and which is (10, 1)-Hölder, and we consider a bounded observation noise η, with different noise levels, where the noise level is defined by σ² = Σ_{n=1}^{N} ηn². [sent-266, score-0.294]
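An illustrative reconstruction of this setup (our own toy choices for the dictionary, the domain, the support, and the noise draw; only the 3-sparsity, the bounded noise, and the definition of the noise level come from the text):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, S = 100, 64, 3
alpha = np.zeros(K)
alpha[rng.choice(K, size=S, replace=False)] = rng.normal(size=S)   # 3-sparse coefficients

def f(x):
    # Assumed cosine dictionary on an assumed domain, for illustration only.
    k = np.arange(1, K + 1)
    return np.cos(np.outer(x, k)) @ alpha

x = np.linspace(0.0, 2 * np.pi, N)
eta = np.clip(0.1 * rng.normal(size=N), -0.3, 0.3)   # bounded observation noise
y = f(x) + eta
noise_level = np.sum(eta ** 2)                        # sigma^2 = sum_n eta_n^2, as defined above
```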
88 [Figure 2 panel titles: Comparison of l1-minimization and Brownian Sensing with noise variance 0, and with noise variance 0.5.] [sent-267, score-0.158]
89 In Figure 2, the plain curve represents the recovery performance, i. [sent-270, score-0.023]
90 0.95 · 2(100/N + 2) using b̂, M = 100 Brownian motions and a regular grid of N points, as a function of N. [sent-275, score-0.152]
91 The dashed curve represents the mean squared error of a regular l1 minimization of ‖a‖₁ under the constraint that ‖Φa − y‖₂² ≤ σ² (as described e. [sent-276, score-0.273]
92 Figure 2 illustrates that, as expected, Brownian sensing outperforms the method described in [12] for noisy measurements. [sent-282, score-0.515]
93 Note also that the method described in [12] recovers the sparse vector when there is no noise, and that Brownian sensing in this case has a smoother dependency w. [sent-283, score-0.534]
94 Note that this improvement comes from the fact that we use the Hölder regularity of the function: compressed sensing may outperform Brownian sensing for arbitrarily non-regular functions. [sent-287, score-1.166]
95 Conclusion In this paper, we have introduced a so-called Brownian sensing approach, as a way to sample an unknown function which has a sparse representation on a given non-orthonormal basis. [sent-288, score-0.552]
96 Our approach differs from previous attempts to apply compressed sensing in that we build a “Brownian sensing” matrix A based on a set of Brownian motions, which is independent of the function f. [sent-289, score-0.596]
97 This enables us to guarantee nice recovery properties of A. [sent-290, score-0.27]
98 We provided competitive reconstruction error rates of order O(‖η‖₂/√N) when the observation noise η is bounded and f is assumed to be Hölder continuous with exponent at least 1/2. [sent-295, score-0.35]
99 We believe that the Hölder assumption is not strictly required (the smoothness of f is assumed only to derive nice estimates of the stochastic integrals), and future work will consider weakening this assumption, possibly by considering randomized sampling designs. [sent-296, score-0.415]
100 Stable signal recovery from incomplete and inaccurate measurements. [sent-335, score-0.208]
wordName wordTfidf (topN-words)
[('brownian', 0.576), ('sensing', 0.495), ('tm', 0.284), ('recovery', 0.208), ('curve', 0.199), ('vc', 0.149), ('motions', 0.129), ('lder', 0.129), ('integrals', 0.117), ('basis', 0.116), ('operators', 0.109), ('rip', 0.09), ('def', 0.084), ('noise', 0.079), ('xn', 0.078), ('incoherent', 0.067), ('fourier', 0.065), ('sampling', 0.063), ('nice', 0.062), ('rk', 0.054), ('matrix', 0.049), ('td', 0.047), ('along', 0.047), ('db', 0.045), ('numerical', 0.044), ('stochastic', 0.044), ('satis', 0.042), ('orthonormal', 0.041), ('yn', 0.041), ('parametrization', 0.04), ('legendre', 0.039), ('isometry', 0.039), ('lille', 0.039), ('sparse', 0.039), ('points', 0.038), ('aa', 0.037), ('dirac', 0.037), ('bm', 0.037), ('theorem', 0.037), ('cand', 0.037), ('nj', 0.037), ('rm', 0.036), ('rauhut', 0.034), ('compressed', 0.034), ('incoherence', 0.033), ('exponent', 0.032), ('reconstruction', 0.032), ('measurement', 0.032), ('compressive', 0.032), ('romberg', 0.032), ('minimization', 0.03), ('observation', 0.03), ('inria', 0.03), ('recover', 0.03), ('sara', 0.03), ('pointwise', 0.028), ('condition', 0.028), ('dn', 0.028), ('tk', 0.027), ('deduce', 0.027), ('bounded', 0.027), ('covariance', 0.026), ('possesses', 0.026), ('selector', 0.026), ('geer', 0.025), ('dantzig', 0.025), ('proposition', 0.024), ('whatever', 0.024), ('comment', 0.024), ('functions', 0.024), ('arbitrarily', 0.024), ('tj', 0.023), ('conditions', 0.023), ('regular', 0.023), ('estimation', 0.023), ('evaluations', 0.023), ('illustrating', 0.023), ('plain', 0.023), ('log', 0.022), ('whenever', 0.022), ('invertible', 0.022), ('transformed', 0.022), ('family', 0.022), ('integral', 0.022), ('cos', 0.022), ('proven', 0.021), ('compatibility', 0.021), ('es', 0.021), ('error', 0.021), ('enjoys', 0.02), ('whole', 0.02), ('noisy', 0.02), ('property', 0.019), ('localized', 0.019), ('communications', 0.019), ('stated', 0.019), ('build', 0.018), ('representation', 0.018), ('ensuring', 0.018), ('contribution', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 264 nips-2011-Sparse Recovery with Brownian Sensing
Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos
Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α̂ − α‖₂ = O(‖η‖₂/√N), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1
2 0.18771844 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on l1-minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
3 0.14485508 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk
Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1
4 0.11931021 259 nips-2011-Sparse Estimation with Structured Dictionaries
Author: David P. Wipf
Abstract: In the vast majority of recent work on sparse estimation algorithms, performance has been evaluated using ideal or quasi-ideal dictionaries (e.g., random Gaussian or Fourier) characterized by unit ℓ2 norm, incoherent columns or features. But in reality, these types of dictionaries represent only a subset of the dictionaries that are actually used in practice (largely restricted to idealized compressive sensing applications). In contrast, herein sparse estimation is considered in the context of structured dictionaries possibly exhibiting high coherence between arbitrary groups of columns and/or rows. Sparse penalized regression models are analyzed with the purpose of finding, to the extent possible, regimes of dictionary invariant performance. In particular, a Type II Bayesian estimator with a dictionarydependent sparsity penalty is shown to have a number of desirable invariance properties leading to provable advantages over more conventional penalties such as the ℓ1 norm, especially in areas where existing theoretical recovery guarantees no longer hold. This can translate into improved performance in applications such as model selection with correlated features, source localization, and compressive sensing with constrained measurement directions. 1
5 0.1141259 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both β and e. Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis relies on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrix X with i.i.d rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover exact signed supports of both β and e from only Ω(k log p log n) observations, even when the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is optimal. 1
6 0.11319225 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
7 0.09353637 209 nips-2011-Orthogonal Matching Pursuit with Replacement
8 0.091142818 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
9 0.066319056 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
10 0.061442312 283 nips-2011-The Fixed Points of Off-Policy TD
11 0.059286043 265 nips-2011-Sparse recovery by thresholded non-negative least squares
12 0.058772299 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
13 0.056203786 245 nips-2011-Selecting the State-Representation in Reinforcement Learning
14 0.055947565 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
15 0.054126546 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
16 0.053841583 97 nips-2011-Finite Time Analysis of Stratified Sampling for Monte Carlo
17 0.049911149 278 nips-2011-TD gamma: Re-evaluating Complex Backups in Temporal Difference Learning
18 0.048546977 258 nips-2011-Sparse Bayesian Multi-Task Learning
19 0.047354061 305 nips-2011-k-NN Regression Adapts to Local Intrinsic Dimension
20 0.046934318 73 nips-2011-Divide-and-Conquer Matrix Factorization
topicId topicWeight
[(0, 0.147), (1, -0.018), (2, -0.004), (3, -0.108), (4, -0.102), (5, 0.062), (6, 0.024), (7, 0.067), (8, 0.065), (9, 0.062), (10, 0.034), (11, 0.024), (12, -0.02), (13, 0.022), (14, 0.105), (15, -0.086), (16, 0.054), (17, -0.145), (18, -0.107), (19, 0.078), (20, 0.053), (21, 0.054), (22, -0.018), (23, -0.134), (24, 0.009), (25, 0.018), (26, -0.129), (27, -0.021), (28, 0.015), (29, 0.059), (30, 0.012), (31, 0.104), (32, 0.043), (33, -0.116), (34, 0.073), (35, -0.183), (36, -0.065), (37, 0.046), (38, 0.078), (39, -0.096), (40, 0.041), (41, 0.074), (42, 0.113), (43, 0.009), (44, -0.088), (45, -0.027), (46, -0.03), (47, 0.002), (48, -0.007), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.94402605 264 nips-2011-Sparse Recovery with Brownian Sensing
Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos
Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α̂ − α‖₂ = O(‖η‖₂/√N), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1
2 0.79925889 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
Author: Andrew E. Waters, Aswin C. Sankaranarayanan, Richard Baraniuk
Abstract: We consider the problem of recovering a matrix M that is the sum of a low-rank matrix L and a sparse matrix S from a small set of linear measurements of the form y = A(M) = A(L + S). This model subsumes three important classes of signal recovery problems: compressive sensing, affine rank minimization, and robust principal component analysis. We propose a natural optimization problem for signal recovery under this model and develop a new greedy algorithm called SpaRCS to solve it. Empirically, SpaRCS inherits a number of desirable properties from the state-of-the-art CoSaMP and ADMiRA algorithms, including exponential convergence and efficient implementation. Simulation results with video compressive sensing, hyperspectral imaging, and robust matrix completion data sets demonstrate both the accuracy and efficacy of the algorithm. 1
3 0.79919338 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on l1-minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
4 0.76929516 209 nips-2011-Orthogonal Matching Pursuit with Replacement
Author: Prateek Jain, Ambuj Tewari, Inderjit S. Dhillon
Abstract: In this paper, we consider the problem of compressed sensing where the goal is to recover all sparse vectors using a small number of fixed linear measurements. For this problem, we propose a novel partial hard-thresholding operator that leads to a general family of iterative algorithms. While one extreme of the family yields well known hard thresholding algorithms like IHT and HTP [17, 10], the other end of the spectrum leads to a novel algorithm that we call Orthogonal Matching Pursuit with Replacement (OMPR). OMPR, like the classic greedy algorithm OMP, adds exactly one coordinate to the support at each iteration, based on the correlation with the current residual. However, unlike OMP, OMPR also removes one coordinate from the support. This simple change allows us to prove that OMPR has the best known guarantees for sparse recovery in terms of the Restricted Isometry Property (a condition on the measurement matrix). In contrast, OMP is known to have very weak performance guarantees under RIP. Given its simple structure, we are able to extend OMPR using locality sensitive hashing to get OMPR-Hash, the first provably sub-linear (in dimensionality) algorithm for sparse recovery. Our proof techniques are novel and flexible enough to also permit the tightest known analysis of popular iterative algorithms such as CoSaMP and Subspace Pursuit. We provide experimental results on large problems providing recovery for vectors of size up to million dimensions. We demonstrate that for large-scale problems our proposed methods are more robust and faster than existing methods.
5 0.69925725 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
Author: Yi-kai Liu
Abstract: We study the problem of reconstructing an unknown matrix M of rank r and dimension d using O(rd poly log d) Pauli measurements. This has applications in quantum state tomography, and is a non-commutative analogue of a well-known problem in compressed sensing: recovering a sparse vector from a few of its Fourier coefficients. We show that almost all sets of O(rd log^6 d) Pauli measurements satisfy the rank-r restricted isometry property (RIP). This implies that M can be recovered from a fixed (“universal”) set of Pauli measurements, using nuclear-norm minimization (e.g., the matrix Lasso), with nearly-optimal bounds on the error. A similar result holds for any class of measurements that use an orthonormal operator basis whose elements have small operator norm. Our proof uses Dudley’s inequality for Gaussian processes, together with bounds on covering numbers obtained via entropy duality. 1
6 0.63918793 73 nips-2011-Divide-and-Conquer Matrix Factorization
7 0.55845761 259 nips-2011-Sparse Estimation with Structured Dictionaries
8 0.54536498 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
9 0.49187055 265 nips-2011-Sparse recovery by thresholded non-negative least squares
10 0.45788383 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
11 0.40727451 5 nips-2011-A Denoising View of Matrix Completion
12 0.39208874 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
13 0.3702082 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
14 0.3566739 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
15 0.34340322 285 nips-2011-The Kernel Beta Process
16 0.3272348 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
17 0.32604516 69 nips-2011-Differentially Private M-Estimators
18 0.31463623 206 nips-2011-Optimal Reinforcement Learning for Gaussian Systems
19 0.30551946 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation
20 0.29859215 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems
topicId topicWeight
[(0, 0.03), (4, 0.017), (20, 0.03), (26, 0.025), (31, 0.066), (33, 0.014), (35, 0.012), (39, 0.027), (43, 0.476), (45, 0.076), (57, 0.026), (74, 0.046), (83, 0.018), (99, 0.047)]
simIndex simValue paperId paperTitle
1 0.99219477 26 nips-2011-Additive Gaussian Processes
Author: David K. Duvenaud, Hannes Nickisch, Carl E. Rasmussen
Abstract: We introduce a Gaussian process model of functions which are additive. An additive function is one which decomposes into a sum of low-dimensional functions, each depending on only a subset of the input variables. Additive GPs generalize both Generalized Additive Models, and the standard GP models which use squared-exponential kernels. Hyperparameter learning in this model can be seen as Bayesian Hierarchical Kernel Learning (HKL). We introduce an expressive but tractable parameterization of the kernel function, which allows efficient evaluation of all input interaction terms, whose number is exponential in the input dimension. The additional structure discoverable by this model results in increased interpretability, as well as state-of-the-art predictive power in regression tasks. 1
2 0.95890367 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
Author: Shilin Ding, Grace Wahba, Xiaojin Zhu
Abstract: In discrete undirected graphical models, the conditional independence of node labels Y is specified by the graph structure. We study the case where there is another input random vector X (e.g. observed features) such that the distribution P (Y | X) is determined by functions of X that characterize the (higher-order) interactions among the Y ’s. The main contribution of this paper is to learn the graph structure and the functions conditioned on X at the same time. We prove that discrete undirected graphical models with feature X are equivalent to multivariate discrete models. The reparameterization of the potential functions in graphical models by conditional log odds ratios of the latter offers advantages in representation of the conditional independence structure. The functional spaces can be flexibly determined by kernels. Additionally, we impose a Structure Lasso (SLasso) penalty on groups of functions to learn the graph structure. These groups with overlaps are designed to enforce hierarchical function selection. In this way, we are able to shrink higher order interactions to obtain a sparse graph structure. 1
3 0.95305878 282 nips-2011-The Fast Convergence of Boosting
Author: Matus J. Telgarsky
Abstract: This manuscript considers the convergence rate of boosting under a large class of losses, including the exponential and logistic losses, where the best previous rate of convergence was O(exp(1/ε²)). First, it is established that the setting of weak learnability aids the entire class, granting a rate O(ln(1/ε)). Next, the (disjoint) conditions under which the infimal empirical risk is attainable are characterized in terms of the sample and weak learning class, and a new proof is given for the known rate O(ln(1/ε)). Finally, it is established that any instance can be decomposed into two smaller instances resembling the two preceding special cases, yielding a rate O(1/ε), with a matching lower bound for the logistic loss. The principal technical hurdle throughout this work is the potential unattainability of the infimal empirical risk; the technique for overcoming this barrier may be of general interest. 1
4 0.94337034 44 nips-2011-Bayesian Spike-Triggered Covariance Analysis
Author: Jonathan W. Pillow, Il M. Park
Abstract: Neurons typically respond to a restricted number of stimulus features within the high-dimensional space of natural stimuli. Here we describe an explicit modelbased interpretation of traditional estimators for a neuron’s multi-dimensional feature space, which allows for several important generalizations and extensions. First, we show that traditional estimators based on the spike-triggered average (STA) and spike-triggered covariance (STC) can be formalized in terms of the “expected log-likelihood” of a Linear-Nonlinear-Poisson (LNP) model with Gaussian stimuli. This model-based formulation allows us to define maximum-likelihood and Bayesian estimators that are statistically consistent and efficient in a wider variety of settings, such as with naturalistic (non-Gaussian) stimuli. It also allows us to employ Bayesian methods for regularization, smoothing, sparsification, and model comparison, and provides Bayesian confidence intervals on model parameters. We describe an empirical Bayes method for selecting the number of features, and extend the model to accommodate an arbitrary elliptical nonlinear response function, which results in a more powerful and more flexible model for feature space inference. We validate these methods using neural data recorded extracellularly from macaque primary visual cortex. 1
same-paper 5 0.94015497 264 nips-2011-Sparse Recovery with Brownian Sensing
Author: Alexandra Carpentier, Odalric-ambrym Maillard, Rémi Munos
Abstract: We consider the problem of recovering the parameter α ∈ RK of a sparse function f (i.e. the number of non-zero entries of α is small compared to the number K of features) given noisy evaluations of f at a set of well-chosen sampling points. We introduce an additional randomization process, called Brownian sensing, based on the computation of stochastic integrals, which produces a Gaussian sensing matrix, for which good recovery properties are proven, independently of the number of sampling points N, even when the features are arbitrarily non-orthogonal. Under the assumption that f is Hölder continuous with exponent at least 1/2, we provide an estimate α̂ of the parameter such that ‖α̂ − α‖₂ = O(‖η‖₂/√N), where η is the observation noise. The method uses a set of sampling points uniformly distributed along a one-dimensional curve selected according to the features. We report numerical experiments illustrating our method. 1
6 0.9301672 288 nips-2011-Thinning Measurement Models and Questionnaire Design
7 0.76361567 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
8 0.75891989 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
9 0.7357406 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
10 0.73261732 123 nips-2011-How biased are maximum entropy models?
11 0.71833497 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
12 0.70955241 132 nips-2011-Inferring Interaction Networks using the IBP applied to microRNA Target Prediction
13 0.69569516 294 nips-2011-Unifying Framework for Fast Learning Rate of Non-Sparse Multiple Kernel Learning
14 0.69565821 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
15 0.6950925 24 nips-2011-Active learning of neural response functions with Gaussian processes
16 0.69134897 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
17 0.69058144 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs
18 0.68573993 306 nips-2011-t-divergence Based Approximate Inference
19 0.68225145 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
20 0.6802817 152 nips-2011-Learning in Hilbert vs. Banach Spaces: A Measure Embedding Viewpoint