nips nips2011 nips2011-265 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − contrary to what first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Sparse recovery by thresholded non-negative least squares Martin Slawski and Matthias Hein Department of Computer Science Saarland University Campus E 1. [sent-1, score-0.46]
2 de Abstract Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. [sent-4, score-0.196]
3 Even in this setting − contrary to what first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. [sent-7, score-0.231]
4 1 Introduction Consider the linear regression model y = Xβ ∗ + ε, (1) where y is a vector of observations, X ∈ Rn×p a design matrix, ε a vector of noise and β ∗ a vector of coefficients to be estimated. [sent-9, score-0.059]
5 p = O(n) or even p ≫ n, in which case one cannot hope to recover the target β* if it does not satisfy one of various kinds of sparsity constraints, the simplest being that β* is supported on S = {j : βj* ≠ 0}, |S| = s < n. [sent-12, score-0.059]
6 Non-negativity constraints emerge in numerous deconvolution and unmixing problems in diverse fields such as acoustics [1], astronomical imaging [2], computer vision [3], genomics [4], proteomics [5] and spectroscopy [6]; see [7] for a survey. [sent-19, score-0.145]
7 Sparse recovery of non-negative signals in a noiseless setting (ε = 0) has been studied in a series of recent papers [8, 9, 10, 11]. [sent-20, score-0.132]
8 One important finding of this body of work is that non-negativity constraints alone may suffice for sparse recovery, without the need to employ sparsity-promoting ℓ1-regularization as is usual. [sent-21, score-0.09]
9 More specifically, we study non-negative least squares (NNLS), min_{β⪰0} (1/n)‖y − Xβ‖₂² (2), with minimizer β̂ and its counterpart after hard thresholding β̂(λ), where β̂j(λ) = β̂j if β̂j > λ and 0 otherwise, j = 1, . [sent-23, score-0.441]
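A minimal sketch of the estimator in (2)-(3), assuming a Python environment with numpy and scipy (an illustration, not the authors' code): solve the NNLS program and then hard-threshold at level λ.

import numpy as np
from scipy.optimize import nnls

def thresholded_nnls(X, y, lam):
    # scipy's nnls solves min_{b >= 0} ||X b - y||_2; the 1/n factor in (2)
    # does not change the minimizer.
    beta_hat, _ = nnls(X, y)
    # hard thresholding as in (3): keep entries strictly above lam, zero the rest
    beta_thr = np.where(beta_hat > lam, beta_hat, 0.0)
    return beta_hat, beta_thr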
10 On the other hand, the rather restrictive ’irrepresentable condition’ on the design is essentially necessary in order to infer the support S from the sparsity pattern of the lasso [15, 16]. [sent-28, score-0.323]
11 [17, 18, 19], suggest applying hard thresholding to the lasso solution to achieve support recovery. [sent-34, score-0.333]
12 In light of this, thresholding a non-negative least squares solution, provided it is close to the target w.r.t. [sent-35, score-0.266]
13 the ℓ∞-norm, is more attractive for at least two reasons: first, there is no need to carefully tune the amount of ℓ1-regularization prior to thresholding; second, one may hope to detect relatively small non-zero coefficients whose recovery is negatively affected by the bias of ℓ1-regularization. [sent-38, score-0.173]
14 We first prove a bound on the mean square prediction error of the NNLS estimator, demonstrating that it may be resistant to overfitting. [sent-40, score-0.05]
15 Section 3 contains our main results on sparse recovery with noise. [sent-41, score-0.181]
16 Experiments providing strong support of our theoretical findings are presented in Section 4. [sent-42, score-0.072]
17 For a matrix A ∈ Rn×m , AJ denotes the matrix one obtains by extracting the columns corresponding to J. [sent-46, score-0.134]
18 The matrix A_JK is the sub-matrix of A obtained by extracting rows in J and columns in K. [sent-51, score-0.067]
19 The matrix X is assumed to be non-random and scaled s. [sent-59, score-0.065]
20 zero-mean sub-Gaussian entries with parameter σ > 0, cf. [sent-65, score-0.05]
21 2 Prediction error and uniqueness of the solution. In the following, the quantity of interest is the mean squared prediction error (MSE) (1/n)‖Xβ* − Xβ̂‖₂². [sent-67, score-0.063]
22 It is well-known that the MSE of ordinary least squares (OLS) as well as that of ridge regression in general does not vanish unless p/n → 0. [sent-69, score-0.235]
23 To make this clear, let a design matrix X be given and set X̃ = [X −X] by concatenating X and −X columnwise. [sent-72, score-0.097]
24 The non-negativity constraint is then vacuous in the sense that X̃β̂ = Xβ̂^ols, where β̂^ols is any OLS solution. [sent-73, score-0.15]
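A small numerical sketch of this remark (illustrative only; assumes numpy/scipy): on the column-wise concatenation [X, −X], the NNLS fitted values coincide with the OLS fitted values.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

X_tilde = np.hstack([X, -X])          # concatenated design [X, -X]
gamma, _ = nnls(X_tilde, y)           # NNLS with vacuous non-negativity constraint
fit_nnls = X_tilde @ gamma
fit_ols = X @ np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(fit_nnls, fit_ols, atol=1e-6))   # expected: True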
25 However, non-negativity constraints on β can be strong when coupled with the following condition imposed on the Gram matrix Σ = (1/n)XᵀX. [sent-74, score-0.103]
26 We call a design self-regularizing with universal constant κ ∈ (0, 1] if βᵀΣβ ≥ κ(1ᵀβ)² ∀β ⪰ 0. [sent-76, score-0.059]
27 (4) The term ’self-regularizing’ refers to the fact that the quadratic form in Σ restricted to the nonnegative orthant acts like a regularizer arising from the design itself. [sent-77, score-0.117]
28 all entries of the Gram matrix are at least κ0 , then (4) holds with κ = κ0 . [sent-80, score-0.129]
29 , B_B such that min_{1≤b≤B} (1/n)X_{B_b}ᵀX_{B_b} ⪰ κ0 entry-wise, then for β ⪰ 0, βᵀΣβ ≥ Σ_{b=1}^{B} (1/n)β_{B_b}ᵀX_{B_b}ᵀX_{B_b}β_{B_b} ≥ κ0 Σ_{b=1}^{B} (1ᵀβ_{B_b})² ≥ (κ0/B)(1ᵀβ)². [sent-87, score-0.088]
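A small sketch (an illustration under the stated assumptions, not the authors' code) of checking the self-regularizing property (4) empirically, using the entry-wise lower bound κ0 = min_jk Σ_jk as the candidate constant:

import numpy as np

def check_self_regularizing(X, n_trials=1000, seed=0):
    # Empirically verify b' Sigma b >= kappa0 * (1' b)^2 on random b >= 0,
    # with kappa0 taken as the smallest entry of the Gram matrix (cf. the remark above).
    rng = np.random.default_rng(seed)
    n, p = X.shape
    Sigma = X.T @ X / n
    kappa0 = Sigma.min()
    ok = True
    for _ in range(n_trials):
        b = rng.random(p)                      # non-negative direction
        ok &= b @ Sigma @ b >= kappa0 * b.sum() ** 2 - 1e-12
    return kappa0, ok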
30 Then, with probability no less than 1 − 2/p, the NNLS estimator obeys (1/n)‖Xβ* − Xβ̂‖₂² ≤ (8σ/κ)√(2 log p / n) ‖β*‖₁ + 8σ²(2 log p)/n. [sent-92, score-0.128]
31 It is important to note that exact sparsity of β ∗ is not needed for Theorem 1 to hold. [sent-94, score-0.059]
32 The rate is the same as for the lasso if no further assumptions on the design are made, a result that is essentially obtained in the pioneering work [20]. [sent-95, score-0.216]
33 Figure 2: A polyhedral cone in R³ and its intersection with the simplex (right). [sent-96, score-0.087]
34 The point y is contained in a face (bold) with normal vector w, whereas y′ is not. [sent-97, score-0.086]
35 Denote by C = XR^p_+ the polyhedral cone generated by the columns {Xj}_{j=1}^p of X, which are henceforth assumed to be in general position in R^n. [sent-102, score-0.155]
36 As visualized in Figure 2, sparse recovery by non-negativity constraints can be analyzed by studying the face lattice of C [9, 10, 11]. [sent-103, score-0.285]
37 , p}, we say that X_F R^{|F|}_+ is a face of C if there exists a separating hyperplane with normal vector w passing through the origin such that ⟨Xj, w⟩ > 0, j ∉ F, and ⟨Xj, w⟩ = 0, j ∈ F. [sent-107, score-0.205]
38 Sparse recovery in a noiseless setting (ε = 0) can then be characterized concisely by the following statement which can essentially be found in prior work [9, 10, 11, 21]. [sent-108, score-0.132]
39 If X_S R^s_+ is a face of C and the columns of X are in general position in R^n, then the constrained linear system Xβ = y s.t. [sent-111, score-0.127]
40 By definition, since X_S R^s_+ is a face of C, there exists a w ∈ R^n s.t. [sent-115, score-0.087]
41 Given Theorem 1 and Proposition 1, we turn to uniqueness in the noisy case. [sent-124, score-0.088]
42 Using general position of the columns of X, Proposition 1 implies that β is unique. [sent-131, score-0.064]
43 3 Sparse recovery in the presence of noise. Proposition 1 states that support recovery requires X_S R^s_+ to be a face of XR^p_+, which is equivalent to the existence of a hyperplane separating X_S R^s_+ from the rest of C. [sent-134, score-0.493]
44 For the noisy case, mere separation is not enough − a quantification is needed, which is provided by the following two incoherence constants that are of central importance for our main result. [sent-135, score-0.108]
45 Both are specific to NNLS and have not been used previously in the literature on sparse recovery. [sent-136, score-0.049]
46 , p}, the separating hyperplane constant is defined as τ(S) = max_{τ,w} τ s.t. [sent-141, score-0.118]
47 (1/√n) X_Sᵀ w = 0, (1/√n) X_{S^c}ᵀ w ⪰ τ1, ‖w‖₂ ≤ 1, (6) and by duality τ(S) = min_{θ∈R^s, λ∈T^{p−s−1}} (1/√n)‖X_S θ − X_{S^c} λ‖₂, (7) where T^{m−1} = {v ∈ R^m : v ⪰ 0, 1ᵀv = 1} denotes the simplex in R^m, i. [sent-143, score-0.111]
48 We denote by Π_S and Π_S^⊥ the orthogonal projections on the subspace spanned by {Xj}_{j∈S} and its orthogonal complement, respectively, and set Z = Π_S^⊥ X_{S^c}. [sent-146, score-0.115]
49 One can equivalently express (7) as τ²(S) = min_{λ∈T^{p−s−1}} (1/n) λᵀZᵀZλ (8). [sent-147, score-0.088]
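A sketch (illustrative, assuming numpy/scipy; not the authors' implementation) of computing τ²(S) numerically via the simplex-constrained quadratic program in (8):

import numpy as np
from scipy.optimize import minimize

def tau_squared(X, S):
    n, p = X.shape
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(p), S)
    XS = X[:, S]
    Pi_S = XS @ np.linalg.pinv(XS)            # projection onto span{X_j : j in S}
    Z = (np.eye(n) - Pi_S) @ X[:, Sc]         # Z = Pi_S^perp X_{S^c}
    Q = Z.T @ Z / n
    m = len(Sc)
    res = minimize(lambda lam: lam @ Q @ lam,
                   np.full(m, 1.0 / m),
                   method='SLSQP',
                   bounds=[(0.0, None)] * m,
                   constraints=({'type': 'eq', 'fun': lambda lam: lam.sum() - 1.0},))
    return float(res.fun)   # tau^2(S) > 0 iff X_S R^s_+ is a face of the cone C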
50 , p} and Z = Π_S^⊥ X_{S^c}, ω(S) is defined as ω(S) = min_{∅≠F⊆{1, . [sent-154, score-0.176]
51 (9) In the supplement, we show that i) ω(S) > 0 ⇔ τ(S) > 0 ⇔ X_S R^s_+ is a face of C, and ii) ω(S) ≤ 1, with equality if {Xj}_{j∈S} and {Xj}_{j∈S^c} are orthogonal and (1/n)X_{S^c}ᵀX_{S^c} is entry-wise non-negative. [sent-158, score-0.109]
52 Denoting the entries of Σ = (1/n)XᵀX by σ_jk, 1 ≤ j, k ≤ p, our main result additionally involves the constants μ(S) = max_{j∈S} max_{k∈S^c} |σ_jk|, μ₊(S) = max_{j∈S} Σ_{k∈S^c} |σ_jk|, β*_min(S) = min_{j∈S} β*_j, K(S) = max_{v:‖v‖_∞=1} ‖Σ_SS^{−1} v‖_∞, φ_min(S) = min_{v:‖v‖_2=1} ‖Σ_SS v‖_2. [sent-159, score-0.091]
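These constants are all directly computable from the Gram matrix; a sketch (illustrative, assuming numpy) could be:

import numpy as np

def theorem_constants(X, S):
    n, p = X.shape
    Sigma = X.T @ X / n
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(p), S)
    cross = np.abs(Sigma[np.ix_(S, Sc)])
    mu = cross.max()                                     # mu(S)
    mu_plus = cross.sum(axis=1).max()                    # mu_+(S)
    Sigma_SS = Sigma[np.ix_(S, S)]
    K = np.linalg.norm(np.linalg.inv(Sigma_SS), np.inf)  # K(S): induced infinity-norm of Sigma_SS^{-1}
    phi_min = np.linalg.svd(Sigma_SS, compute_uv=False).min()  # phi_min(S): smallest singular value
    return mu, mu_plus, K, phi_min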
53 Consider the thresholded NNLS estimator β̂(λ) defined in (3) with support Ŝ(λ). [sent-161, score-0.27]
54 (i) If λ > (2σ/τ²(S))√(2 log p / n) and β*_min(S) > λ̄, where λ̄ = λ(1 + K(S)μ(S)) + (2σ/{φ_min(S)}^{1/2})√(2 log p / n), (ii) or if λ > (2σ/ω(S))√(2 log p / n) and β*_min(S) > λ̄, where λ̄ = λ(1 + K(S)μ₊(S)) + (2σ/{φ_min(S)}^{1/2})√(2 log p / n), then ‖β̂(λ) − β*‖_∞ ≤ λ̄ and Ŝ(λ) = S with probability no less than 1 − 10/p. [sent-162, score-0.192]
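A sketch of how part (i) could be instantiated numerically, given the constants from (10) and a noise level σ (the form of the threshold mirrors the reconstruction above and is illustrative only):

import numpy as np

def theorem2_thresholds(tau2, K, mu, phi_min, sigma, n, p, slack=1.01):
    rate = np.sqrt(2.0 * np.log(p) / n)
    lam = slack * 2.0 * sigma / tau2 * rate              # threshold level from part (i)
    lam_bar = lam * (1.0 + K * mu) + 2.0 * sigma / np.sqrt(phi_min) * rate
    # support recovery is guaranteed (w.h.p.) once beta*_min(S) exceeds lam_bar
    return lam, lam_bar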
55 The concept of a separating functional as in (6) is also used to show support recovery for the lasso [15, 16] as well as for orthogonal matching pursuit [22, 23]. [sent-164, score-0.487]
56 β is a minimizer of (2) if and only if there exists F ⊆ {1, . [sent-169, score-0.07]
57 The next lemma is crucial, since it permits us to decouple β̂_S from β̂_{S^c}. [sent-173, score-0.064]
58 Consider the two non-negative least squares problems (P1): min_{β^{(P1)}⪰0} (1/n)‖Π_S^⊥(ε − X_{S^c} β^{(P1)})‖₂² and (P2): min_{β^{(P2)}⪰0} (1/n)‖Π_S y − X_S β^{(P2)} − Π_S X_{S^c} β̂^{(P1)}‖₂², with minimizers β̂^{(P1)} of (P1) and β̂^{(P2)} of (P2), respectively. [sent-175, score-0.314]
59 If β̂^{(P2)} ⪰ 0, then setting β̂_S = β̂^{(P2)} and β̂_{S^c} = β̂^{(P1)} yields a minimizer β̂ of the non-negative least squares problem (2). [sent-176, score-0.184]
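A sketch of this decoupling (illustrative, assuming numpy/scipy; note that Π_S^⊥ε = Π_S^⊥y since Π_S^⊥X_S = 0, so the observable y can stand in for ε in (P1)):

import numpy as np
from scipy.optimize import nnls

def decoupled_nnls(X, y, S):
    n, p = X.shape
    S = np.asarray(S)
    Sc = np.setdiff1d(np.arange(p), S)
    XS, XSc = X[:, S], X[:, Sc]
    Pi_S = XS @ np.linalg.pinv(XS)
    Pi_S_perp = np.eye(n) - Pi_S
    # (P1): off-support part fitted in the orthogonal complement of span{X_j : j in S}
    beta_P1, _ = nnls(Pi_S_perp @ XSc, Pi_S_perp @ y)
    # (P2): on-support part after removing the (projected) contribution of beta_P1
    beta_P2, _ = nnls(XS, Pi_S @ (y - XSc @ beta_P1))
    # per the lemma, stacking beta_P2 on S and beta_P1 on S^c yields a minimizer of (2)
    return beta_P1, beta_P2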
60 can be lower bounded via (1/n)‖ξ − Zβ̂^{(P1)}‖₂² ≤ (1/n)‖ξ‖₂² ⇒ (1/n)(β̂^{(P1)})ᵀZᵀZβ̂^{(P1)} ≥ min_{λ∈T^{p−s−1}} (1/n)λᵀZᵀZλ ‖β̂^{(P1)}‖₁² = τ²(S)‖β̂^{(P1)}‖₁². [sent-188, score-0.088]
61 Using the closed form expression for the ordinary least squares estimator, one obtains β̄^{(P2)} = Σ_SS^{−1}(1/n)X_Sᵀ(X_S β*_S + Π_S ε − Π_S X_{S^c} β̂^{(P1)}) = β*_S + Σ_SS^{−1}(1/n)X_Sᵀε − Σ_SS^{−1}Σ_{SS^c} β̂^{(P1)}. [sent-201, score-0.235]
62 It follows that the two events {M ≤ 2σ√(2 log p / n)} and {M ≤ (2σ/{φ_min(S)}^{1/2})√(2 log p / n)} both hold with probability no less than 1 − 10/p, cf. [sent-207, score-0.096]
63 Subsequent thresholding with the respective choices made for λ yields the assertion. [sent-211, score-0.128]
64 In the sequel, we apply Theorem 2 to specific classes of designs commonly studied in the literature, for which thresholded NNLS achieves an ℓ∞-error of the optimal order O(√(log(p)/n)). [sent-212, score-0.315]
65 Let the entries of the Gram matrix Σ be given by σ_jk = ρ^|j−k|, 1 ≤ j, k ≤ p, 0 ≤ ρ < 1, so that the {Xj}_{j=1}^p form a Markov random field in which Xj is conditionally independent of {Xk}_{k∉{j−1,j,j+1}} given {X_{j−1}, X_{j+1}}, cf. [sent-215, score-0.088]
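A sketch (illustrative; Gaussian rows are one natural choice, an assumption rather than the paper's exact recipe) of a random design whose population Gram matrix has this power-decay structure:

import numpy as np

def power_decay_design(n, p, rho, seed=0):
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    Sigma_star = rho ** np.abs(idx[:, None] - idx[None, :])   # sigma_jk = rho^|j-k|
    L = np.linalg.cholesky(Sigma_star)
    X = rng.standard_normal((n, p)) @ L.T                      # rows ~ N(0, Sigma_star)
    return X, Sigma_star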
66 The conditional independence structure implies that all entries of ZᵀZ are non-negative, such that, using the definition of ω(S), ω(S) ≥ min_{1≤j≤(p−s)} (1/n)(ZᵀZ)_jj + (1/n)Σ_{k≠j} min{(ZᵀZ)_jk, 0}; the sum on the r. [sent-217, score-0.145]
67 For the remaining constants in (10), one can show that ΣSS is a band matrix of bandwidth no more than 3 for all choices of S such that φmin (S) and K(S) are uniformly lower and upper bounded, respectively, by constants depending on ρ only. [sent-221, score-0.12]
68 For any S, one computes that the matrix (1/n)ZᵀZ is of the same regular structure with diagonal entries all equal to 1 − δ and off-diagonal entries all equal to ρ − δ, where δ = ρ²s/(1 + (s − 1)ρ). [sent-226, score-0.138]
69 Therefore, using (8), the separating hyperplane constant (7) can be computed in closed form: τ²(S) = (1 − ρ)ρ/(1 + (s − 1)ρ) + (1 − ρ)/(p − s) = O(s⁻¹). [sent-227, score-0.118]
70 Choosing the threshold λ = (2σ/ω(S))√(2 log p / n) as in part (ii) of Theorem 2 and combining the strong ℓ1-bound (17) on the off-support coefficients with a slight modification of the bound (14) together with φ_min(S) = 1 − ρ yields again the desired optimal bound of the form (15). [sent-230, score-0.1]
71 So far, the design matrix X has been assumed to be fixed. [sent-232, score-0.124]
72 As shown above, the incoherence constant τ²(S), which gives rise to a strong bound on ‖β̂_{S^c}‖₁, scales favourably and can be computed in closed form. [sent-244, score-0.066]
73 For random designs from Ens+ , one additionally has to take into account the deviation between Σ and Σ∗ . [sent-245, score-0.125]
74 Then there exist constants c, c1, c2, c3, C, C′ > 0 such that for all n ≥ C log(p)s², τ²(S) ≥ cs⁻¹ − C′ log(p)/n with probability no less than 1 − 3/p − exp(−c1 n) − 2 exp(−c2 log p) − exp(−c3 log^{1/2}(p)s). [sent-255, score-0.113]
75 from a Gaussian distribution whose covariance matrix has the power decay structure of Example 1 with parameter ρ = 0. [sent-265, score-0.181]
76 the population Gram matrix Σ∗ has equicorrelation structure with ρ = 3/4. [sent-269, score-0.088]
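A sketch of one plausible instance of the Ens+ design consistent with equicorrelation ρ = 3/4 (an assumption for illustration, not necessarily the paper's exact ensemble): i.i.d. Uniform(0,1) entries, columns scaled so the Gram diagonal equals one; the power-decay design can be generated with the earlier sketch.

import numpy as np

def ens_plus_design(n, p, seed=0):
    # non-negative i.i.d. Uniform(0,1) entries; population correlations between
    # distinct columns equal (1/4)/(1/3) = 3/4, i.e. equicorrelation rho = 3/4
    rng = np.random.default_rng(seed)
    X = rng.random((n, p))
    return X / np.sqrt((X ** 2).mean(axis=0))   # scale so the empirical Gram diagonal is 1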
77 In the first part, the parameter b is kept fixed while the aspect ratio p/n of X and the fraction of sparsity s/n vary. [sent-275, score-0.059]
78 The grid used for b is chosen specifically for each design, calibrated such that the sparse recovery problems are sufficiently challenging. [sent-285, score-0.181]
79 For the design from Ens+ , p/n ∈ {2, 3, 5, 10}, whereas for power decay p/n ∈ {1. [sent-286, score-0.202]
80 Across these runs, we compare the probability of ’success’ of thresholded NNLS (tNNLS), non-negative lasso (NN ℓ1), thresholded non-negative lasso (tNN ℓ1) and orthogonal matching pursuit (OMP, [22, 23]). [sent-292, score-0.78]
81 For OMP, we check whether the support S is recovered in the first s steps. [sent-301, score-0.073]
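A bare-bones sketch of this success check (illustrative only, not the paper's code): run s greedy OMP steps and compare the selected set with S.

import numpy as np

def omp_recovers_support(X, y, S):
    S = set(int(j) for j in S)
    selected, residual = [], y.copy()
    for _ in range(len(S)):
        scores = np.abs(X.T @ residual)
        if selected:
            scores[selected] = -np.inf            # do not reselect columns
        j = int(np.argmax(scores))
        selected.append(j)
        XA = X[:, selected]
        coef, *_ = np.linalg.lstsq(XA, y, rcond=None)
        residual = y - XA @ coef
    return set(selected) == S                     # 'success' criterion for OMP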
82 Note that, when comparing tNNLS and tNN ℓ1, the lasso is given an advantage, since we optimize over a range of solutions. [sent-302, score-0.157]
83 Figure 3: Comparison of thresholded NNLS (red) and thresholded non-negative lasso (blue) for the experiments with constant s/n, while b (abscissa) and p/n (symbols) vary. [sent-342, score-0.537]
85 Figure 4: Top: Comparison of thresholded NNLS (red) and the thresholded non-negative lasso (blue) for the experiments with constant b, while s/n (abscissa) and p/n (symbols) vary. [sent-407, score-0.537]
86 Bottom left: Non-negative lasso without thresholding (blue) and orthogonal matching pursuit (magenta). [sent-408, score-0.371]
87 Bottom right: Thresholded non-negative lasso (blue) and thresholded ordinary lasso (green). [sent-409, score-0.572]
88 15 for power decay as displayed in the bottom left panel of Figure 4. [sent-412, score-0.169]
89 This is in accordance with the literature, where thresholding is proposed as a remedy [17, 18, 19]. [sent-414, score-0.128]
90 Yet, for a wide range of configurations, tNNLS visibly outperforms tNN ℓ1, a notable exception being power decay with larger values of p/n. [sent-415, score-0.143]
91 This is in contrast to the design from Ens+ , where even p/n = 10 can be handled. [sent-416, score-0.059]
92 To deal with higher levels of sparsity, thresholding seems to be inevitable. [sent-419, score-0.128]
93 Thresholding the biased solution obtained by ℓ1-regularization requires a proper choice of the regularization parameter and is likely to be inferior to thresholded NNLS with regard to the detection of small signals. [sent-420, score-0.216]
94 The experimental results provide strong support for the central message of the paper: even in high-dimensional, noisy settings, non-negativity constraints can be unexpectedly powerful when interacting with ’self-regularizing’ properties of the design. [sent-421, score-0.138]
95 A split Bregman method for non-negative sparsity penalized least squares with applications to hyperspectral demixing. [sent-442, score-0.197]
96 In NIPS workshop on practical applications of sparse modelling, 2010. [sent-453, score-0.049]
97 On the uniqueness of nonnegative sparse solutions to underdetermined systems of equations. [sent-470, score-0.219]
98 A unique nonnegative solution to an undetermined system: from vectors to matrices. [sent-486, score-0.058]
99 Sharp thresholds for noisy and high-dimensional recovery of sparsity using 1 constrained quadratic programming (Lasso). [sent-508, score-0.216]
100 Sparse nonnegative solution of underdetermined linear equations by linear programming. [sent-531, score-0.107]
wordName wordTfidf (topN-words)
[('nnls', 0.521), ('xs', 0.43), ('ss', 0.237), ('thresholded', 0.19), ('ens', 0.169), ('lasso', 0.157), ('recovery', 0.132), ('thresholding', 0.128), ('designs', 0.125), ('zf', 0.124), ('xj', 0.12), ('gram', 0.111), ('tnn', 0.099), ('tnnls', 0.099), ('xbb', 0.099), ('squares', 0.097), ('decay', 0.093), ('rs', 0.089), ('min', 0.088), ('ols', 0.075), ('xrp', 0.074), ('jk', 0.074), ('ordinary', 0.068), ('bb', 0.068), ('separating', 0.064), ('uniqueness', 0.063), ('face', 0.063), ('deconvolution', 0.06), ('success', 0.06), ('design', 0.059), ('sparsity', 0.059), ('nonnegative', 0.058), ('nn', 0.055), ('maxj', 0.055), ('omp', 0.054), ('hyperplane', 0.054), ('power', 0.05), ('mse', 0.05), ('entries', 0.05), ('equicorrelation', 0.05), ('resistant', 0.05), ('slawski', 0.05), ('minj', 0.049), ('underdetermined', 0.049), ('sparse', 0.049), ('ii', 0.048), ('log', 0.048), ('support', 0.048), ('jj', 0.047), ('minimizer', 0.046), ('orthogonal', 0.046), ('symbols', 0.045), ('relegated', 0.044), ('astronomical', 0.044), ('irrepresentable', 0.044), ('abscissa', 0.044), ('incoherence', 0.042), ('constraints', 0.041), ('counterpart', 0.041), ('constants', 0.041), ('least', 0.041), ('pursuit', 0.04), ('zj', 0.04), ('xij', 0.039), ('matrix', 0.038), ('polyhedral', 0.038), ('kkt', 0.038), ('donoho', 0.038), ('theorem', 0.036), ('lemma', 0.036), ('position', 0.035), ('johnstone', 0.034), ('controlling', 0.034), ('estimator', 0.032), ('proposition', 0.031), ('columns', 0.029), ('obtains', 0.029), ('vanish', 0.029), ('threshold', 0.028), ('moderate', 0.028), ('permits', 0.028), ('rn', 0.028), ('assumed', 0.027), ('regard', 0.026), ('cone', 0.026), ('bottom', 0.026), ('cients', 0.026), ('noisy', 0.025), ('aj', 0.025), ('check', 0.025), ('coef', 0.025), ('exists', 0.024), ('strong', 0.024), ('blue', 0.024), ('uj', 0.024), ('annals', 0.023), ('spanned', 0.023), ('simplex', 0.023), ('contained', 0.023), ('sign', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − contrary to what first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
2 0.20975918 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
Author: Le Song, Eric P. Xing, Ankur P. Parikh
Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1
3 0.17065649 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both β and e . Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis is relied on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrix X with i.i.d rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover exact signed supports of both β and e from only Ω(k log p log n) observations, even the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is optimal. 1
4 0.16669071 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
5 0.15690976 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
6 0.11497479 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
7 0.099430293 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
8 0.096126862 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
9 0.091440171 259 nips-2011-Sparse Estimation with Structured Dictionaries
10 0.087556496 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
11 0.084884144 209 nips-2011-Orthogonal Matching Pursuit with Replacement
12 0.082278609 189 nips-2011-Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels
13 0.074073493 276 nips-2011-Structured sparse coding via lateral inhibition
14 0.071427196 258 nips-2011-Sparse Bayesian Multi-Task Learning
15 0.070714623 12 nips-2011-A Two-Stage Weighting Framework for Multi-Source Domain Adaptation
16 0.066702552 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
17 0.066239521 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
18 0.066045523 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
19 0.066013142 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
20 0.064652756 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
topicId topicWeight
[(0, 0.192), (1, 0.011), (2, -0.055), (3, -0.157), (4, -0.083), (5, 0.039), (6, 0.003), (7, 0.004), (8, 0.107), (9, -0.06), (10, 0.029), (11, -0.016), (12, -0.105), (13, 0.164), (14, 0.023), (15, -0.051), (16, 0.121), (17, -0.134), (18, 0.021), (19, 0.009), (20, -0.037), (21, 0.052), (22, -0.033), (23, -0.041), (24, 0.042), (25, -0.032), (26, -0.018), (27, 0.078), (28, 0.062), (29, 0.073), (30, 0.082), (31, 0.042), (32, 0.1), (33, -0.086), (34, 0.091), (35, 0.106), (36, 0.056), (37, -0.02), (38, 0.095), (39, -0.088), (40, -0.002), (41, 0.046), (42, -0.115), (43, 0.064), (44, 0.003), (45, -0.011), (46, 0.065), (47, 0.148), (48, 0.035), (49, 0.079)]
simIndex simValue paperId paperTitle
same-paper 1 0.92226869 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming 1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
2 0.69850981 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
3 0.61864388 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables
Author: Jiarong Jiang, Piyush Rai, Hal Daume
Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1
4 0.60118467 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
Author: Nasser M. Nasrabadi, Trac D. Tran, Nam Nguyen
Abstract: This paper studies the problem of accurately recovering a sparse vector β from highly corrupted linear measurements y = Xβ + e + w where e is a sparse error vector whose nonzero entries may be unbounded and w is a bounded noise. We propose a so-called extended Lasso optimization which takes into consideration sparse prior information of both β and e . Our first result shows that the extended Lasso can faithfully recover both the regression and the corruption vectors. Our analysis is relied on a notion of extended restricted eigenvalue for the design matrix X. Our second set of results applies to a general class of Gaussian design matrix X with i.i.d rows N (0, Σ), for which we provide a surprising phenomenon: the extended Lasso can recover exact signed supports of both β and e from only Ω(k log p log n) observations, even the fraction of corruption is arbitrarily close to one. Our analysis also shows that this amount of observations required to achieve exact signed support is optimal. 1
5 0.58548158 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on 1 -minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
6 0.52321988 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
7 0.51400757 209 nips-2011-Orthogonal Matching Pursuit with Replacement
8 0.48209155 73 nips-2011-Divide-and-Conquer Matrix Factorization
9 0.47049734 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
10 0.46944314 264 nips-2011-Sparse Recovery with Brownian Sensing
11 0.4667092 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
12 0.44920874 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
13 0.4470332 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
14 0.43723673 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
15 0.41744247 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
16 0.41607317 259 nips-2011-Sparse Estimation with Structured Dictionaries
17 0.40422839 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
18 0.40144706 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
19 0.39130631 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
20 0.38443908 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
topicId topicWeight
[(0, 0.057), (4, 0.048), (20, 0.038), (26, 0.043), (31, 0.05), (33, 0.011), (43, 0.091), (45, 0.123), (48, 0.012), (57, 0.07), (65, 0.027), (74, 0.075), (83, 0.033), (98, 0.208), (99, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.80482626 265 nips-2011-Sparse recovery by thresholded non-negative least squares
Author: Martin Slawski, Matthias Hein
Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − contrary to what first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming ℓ1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1
Author: Po-ling Loh, Martin J. Wainwright
Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1
3 0.67491269 22 nips-2011-Active Ranking using Pairwise Comparisons
Author: Kevin G. Jamieson, Robert Nowak
Abstract: This paper examines the problem of ranking a collection of objects using pairwise comparisons (rankings of two objects). In general, the ranking of n objects can be identified by standard sorting methods using n log2 n pairwise comparisons. We are interested in natural situations in which relationships among the objects may allow for ranking using far fewer pairwise comparisons. Specifically, we assume that the objects can be embedded into a d-dimensional Euclidean space and that the rankings reflect their relative distances from a common reference point in Rd . We show that under this assumption the number of possible rankings grows like n2d and demonstrate an algorithm that can identify a randomly selected ranking using just slightly more than d log n adaptively selected pairwise comparisons, on average. If instead the comparisons are chosen at random, then almost all pairwise comparisons must be made in order to identify any ranking. In addition, we propose a robust, error-tolerant algorithm that only requires that the pairwise comparisons are probably correct. Experimental studies with synthetic and real datasets support the conclusions of our theoretical analysis. 1
4 0.67324185 276 nips-2011-Structured sparse coding via lateral inhibition
Author: Arthur D. Szlam, Karol Gregor, Yann L. Cun
Abstract: This work describes a conceptually simple method for structured sparse coding and dictionary design. Supposing a dictionary with K atoms, we introduce a structure as a set of penalties or interactions between every pair of atoms. We describe modifications of standard sparse coding algorithms for inference in this setting, and describe experiments showing that these algorithms are efficient. We show that interesting dictionaries can be learned for interactions that encode tree structures or locally connected structures. Finally, we show that our framework allows us to learn the values of the interactions from the data, rather than having them pre-specified. 1
5 0.6722374 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
6 0.67058355 199 nips-2011-On fast approximate submodular minimization
7 0.66991895 258 nips-2011-Sparse Bayesian Multi-Task Learning
8 0.66979647 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
9 0.66949826 186 nips-2011-Noise Thresholds for Spectral Clustering
10 0.66888845 109 nips-2011-Greedy Model Averaging
11 0.66769058 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
12 0.66664129 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
13 0.66640121 231 nips-2011-Randomized Algorithms for Comparison-based Search
14 0.66566491 226 nips-2011-Projection onto A Nonnegative Max-Heap
15 0.66496557 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
16 0.66450256 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
17 0.66429478 251 nips-2011-Shaping Level Sets with Submodular Functions
18 0.66384566 29 nips-2011-Algorithms and hardness results for parallel large margin learning
19 0.6634059 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
20 0.66335452 198 nips-2011-On U-processes and clustering performance