nips nips2013 nips2013-4 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1
Reference: text
sentIndex sentText sentNum sentScore
1 A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. [sent-11, score-0.112]
2 We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. [sent-13, score-0.063]
3 Assuming that β∗ is sparse (i.e., the support set S(β∗) ≜ {i | β∗i ≠ 0} has cardinality k < n), a mainstay algorithm for such settings is the Lasso [10]: β̂ = argminβ∈Rp (1/(2n))||y − Xβ||₂² + λ||β||₁. (2) [sent-21, score-0.057]
4 For a particular choice of λ, the variable selection properties of the Lasso can be analyzed by quantifying how well the estimated support S(β̂) approximates the true support S(β∗). [sent-22, score-0.167]
5 More careful analyses focus instead on recovering the signed support S±(β∗), defined elementwise by S±(β∗i) = +1 if β∗i > 0, −1 if β∗i < 0, and 0 if β∗i = 0. (3) [sent-23, score-0.334]
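As a concrete reference point, here is a minimal numpy sketch of the signed-support map of Eq. (3) and the recovery check it induces; the function names and the zero-threshold are illustrative choices, not taken from the paper.

```python
import numpy as np

def signed_support(beta, tol=0.0):
    """S_+/-(beta): +1 for positive entries, -1 for negative entries, 0 otherwise."""
    s = np.sign(beta).astype(float)
    s[np.abs(beta) <= tol] = 0.0
    return s

def recovers_signed_support(beta_hat, beta_star, tol=1e-8):
    """True iff the estimate reproduces the signed support of the true coefficients."""
    return np.array_equal(signed_support(beta_hat, tol), signed_support(beta_star))
```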
6 Theoretical developments during the last decade have shed light on the support recovery properties of the Lasso and highlighted practical difficulties when the columns of X are correlated. [sent-26, score-0.235]
7 These developments have led to various conditions on X for support recovery, such as the mutual incoherence or the irrepresentable condition [1, 3, 8, 12, 13]. [sent-27, score-0.138]
8 In recent years, several modifications of the standard Lasso have been proposed to improve its support recovery properties [2, 7, 14, 15]. [sent-28, score-0.18]
9 The central question is whether the Preconditioned Lasso of Eq. (4) improves or degrades signed support recovery relative to the standard Lasso of Eq. (2). [sent-41, score-0.456]
10 A major roadblock to a one-to-one comparison is the pair of auxiliary penalty parameters λ, λ̄, which trade off the ℓ1 penalty against the quadratic objective in Eq. (2) and Eq. (4) respectively. [sent-43, score-0.196]
11 A correct choice of penalty parameter is essential for signed support recovery: If it is too small, the algorithm behaves like ordinary least squares; if it is too large, the estimated support may be empty. [sent-46, score-0.509]
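To illustrate this trade-off, the sketch below sweeps the penalty on a toy problem and reports the estimated support size; it relies on scikit-learn's Lasso, whose objective (1/(2n))||y − Xβ||₂² + α||β||₁ matches Eq. (2) with α = λ, while the problem dimensions and signal values are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p, k = 100, 200, 5
X = rng.standard_normal((n, p))
beta_star = np.zeros(p)
beta_star[:k] = 1.0
y = X @ beta_star + 0.1 * rng.standard_normal(n)

for lam in [1e-4, 1e-2, 1e-1, 1.0]:
    beta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
    print(f"lambda={lam:g}  |S(beta_hat)| = {int(np.sum(np.abs(beta_hat) > 1e-8))}")
# Very small lambda behaves like under-regularized least squares and keeps many
# coordinates active; very large lambda drives the estimated support towards empty.
```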
12 Unfortunately, in all but the simplest cases, pre-multiplying the data X, y by matrices PX, Py changes the geometry of the ℓ1 penalty contours relative to the elliptical objective contours in a nontrivial way. [sent-47, score-0.181]
13 For a fair comparison, the resulting mapping would have to capture the change of relative geometry induced by the preconditioning of X, y, i.e., a mapping between λ and λ̄. [sent-51, score-0.167]
14 In the Preconditioned Lasso literature this problem is commonly sidestepped either by resorting to asymptotic comparisons [6, 9], empirically comparing regularization paths [5], or using model-selection techniques which aim to choose reasonably “good” matching penalty parameters [6]. [sent-56, score-0.164]
15 We deem these approaches to be unsatisfactory—asymptotic and empirical analyses provide limited insight, and model selection strategies add a layer of complexity that may lead to unfair comparisons. [sent-57, score-0.076]
16 It is our view that all of these approaches place unnecessary emphasis on particular choices of penalty parameter. [sent-58, score-0.098]
17 In this paper we propose an alternative strategy that instead compares the Lasso to the Preconditioned Lasso by comparing data-dependent upper and lower penalty parameter bounds. [sent-59, score-0.138]
18 The Lasso of Eq. (2) is guaranteed to recover the signed support iff λl < λ < λu. [sent-61, score-0.336]
19 Consequently, if λl > λu, signed support recovery is not possible. [sent-62, score-0.44]
20 The comparison of Lasso and Preconditioned Lasso on an instance X, β∗ then proceeds by suitably comparing the bounds (λl, λu) on λ and the bounds (λ̄l, λ̄u) on λ̄. [sent-65, score-0.119]
21 The advantage of this approach is that the upper and lower bounds are easy to compute, even though a general mapping between specific penalty parameters cannot be readily derived. [sent-66, score-0.137]
22 We outline our comparative framework in Section 3 and highlight some immediate consequences for [5] and [9] on general matrices X in Section 4. [sent-72, score-0.134]
23 More detailed comparisons can be made by considering a generative model for X. [sent-73, score-0.079]
24 In Section 5 we introduce such a model based on a block-wise SVD of X and then analyze [6] for specific instances of this generative model. [sent-74, score-0.068]
25 Finally, we show that in terms of signed support recovery, this generative model can be thought of as a limit point of a Gaussian construction. [sent-75, score-0.375]
26 Huang and Jojic proposed Correlation Sifting [5], which, although not presented as a preconditioning algorithm, can be rewritten as one. [sent-85, score-0.136]
27 An earlier instance of the preconditioning idea was put forward by Paul et al. [9]. [sent-91, score-0.136]
28 Jia and Rohe [6] propose a preconditioning method that amounts to whitening the matrix X. [sent-99, score-0.136]
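The following sketch shows one way the three preconditioners could be instantiated for experimentation. HJ and PBHT both use a projection UA UA⊤ onto an estimated column subspace; here UA is taken, for illustration, as leading left singular vectors of X (the footnote quoted later notes that smallest singular vectors are considered for sharply decaying spectra). The whitening form shown for JR is a common choice and should be read as an assumption rather than the exact operator of [6].

```python
import numpy as np

def estimate_column_basis(X, k):
    # One plausible estimate of U_A: the k leading left singular vectors of X.
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k]

def precondition(X, y, method, k=None):
    """Return (P_X X, P_y y) for the HJ, PBHT, or (assumed) JR preconditioners."""
    n = X.shape[0]
    if method == "HJ":        # P_X = P_y = U_A U_A^T: project both X and y.
        UA = estimate_column_basis(X, k)
        return UA @ (UA.T @ X), UA @ (UA.T @ y)
    if method == "PBHT":      # P_X = I, P_y = U_A U_A^T: project y only.
        UA = estimate_column_basis(X, k)
        return X, UA @ (UA.T @ y)
    if method == "JR":        # Whitening of X; assumed form: (X X^T / n)^{-1/2}.
        evals, evecs = np.linalg.eigh(X @ X.T / n)
        W = evecs @ np.diag(1.0 / np.sqrt(np.maximum(evals, 1e-12))) @ evecs.T
        return W @ X, W @ y
    raise ValueError(f"unknown method: {method}")
```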
29 3 Comparative Framework. In this section we propose a new comparative approach for Preconditioned Lasso algorithms which avoids choosing particular penalty parameters λ, λ̄. [sent-106, score-0.133]
30 We first derive upper and lower bounds for λ and λ̄ respectively, so that signed support recovery can be guaranteed iff λ and λ̄ satisfy the bounds. [sent-107, score-0.498]
31 3.1 Conditions for signed support recovery. Before proceeding, we make some definitions motivated by Wainwright [12]. [sent-110, score-0.44]
32 Suppose that the support set of β∗ is S ≜ S(β∗), with |S| = k. [sent-111, score-0.057]
33 Denote by Xj column j of X and by XA the submatrix of X consisting of columns indexed by set A. [sent-119, score-0.094]
34 The choice of smallest singular vectors is considered for matrices X with sharply decaying spectrum. [sent-121, score-0.064]
35 Figure 1: Empirical evaluation of the penalty parameter bounds of Lemma 1. [sent-138, score-0.151]
36 Then we ran the Lasso using penalty parameters f λl in Figure (a) and f λu in Figure (b), for a range of factors f. [sent-140, score-0.098]
37 The figures show the empirical probability of signed support recovery as a function of the factor f for both λl and λu . [sent-146, score-0.44]
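A sketch of this kind of empirical check. Since Lemma 1's explicit bound formulas are not reproduced in this extract, the grid of penalties below simply stands in for f·λl and f·λu, and the design, signal, and noise level are arbitrary assumptions; the estimated quantity is the probability of signed support recovery as a function of the penalty.

```python
import numpy as np
from sklearn.linear_model import Lasso

def signed_support(b, tol=1e-8):
    s = np.sign(b).astype(float)
    s[np.abs(b) <= tol] = 0.0
    return s

def recovery_probability(lam, n=100, p=200, k=5, sigma=0.5, trials=200, seed=0):
    """Fraction of random problems on which the Lasso at penalty lam recovers S_+/-."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, p))
        beta_star = np.zeros(p); beta_star[:k] = 1.0
        y = X @ beta_star + sigma * rng.standard_normal(n)
        beta_hat = Lasso(alpha=lam, fit_intercept=False, max_iter=10000).fit(X, y).coef_
        hits += np.array_equal(signed_support(beta_hat), signed_support(beta_star))
    return hits / trials

for lam in [0.02, 0.05, 0.1, 0.2, 0.4]:
    print(f"lambda={lam:g}  P(signed support recovered) ~ {recovery_probability(lam):.2f}")
```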
38 For the Lasso of Eq. (2), results in (for example) Wainwright [12] connect settings of λ with instances of X, β∗, w to certify whether or not the Lasso will recover the signed support. [sent-149, score-0.289]
39 We invert these results and, for particular instances of X, β ∗ , w, derive bounds on λ so that signed support recovery is guaranteed if and only if the bounds are satisfied. [sent-150, score-0.547]
40 Then the Lasso has a unique solution β̂ which recovers the signed support (i.e., S±(β̂) = S±(β∗)). [sent-154, score-0.317]
41 On the other hand, if XS⊤XS is not invertible, then the signed support cannot in general be recovered. [sent-157, score-0.317]
42 Lemma 1 recapitulates well-worn intuitions about when the Lasso has difficulty recovering the signed support. [sent-158, score-0.291]
43 In extreme cases we might have λl > λu so that signed support recovery is impossible. [sent-162, score-0.454]
44 Figure 1 empirically validates the bounds of Lemma 1 by estimating probabilities of signed support recovery for a range of penalty parameters on synthetic Lasso problems. [sent-163, score-0.577]
45 3.2 Comparisons. In this paper we propose to compare a preconditioning algorithm to the traditional Lasso by comparing the penalty parameter bounds produced by Lemma 1. [sent-165, score-0.33]
46 Provided the conditions of Lemma 1 hold for X, β∗, we can define updated variables µ̄j, γ̄i, η̄j, ε̄i from which the bounds λ̄u, λ̄l on the penalty parameter λ̄ can be derived. [sent-170, score-0.222]
47 In order for our comparison to be scale-invariant, we will compare algorithms by ratios of resulting penalty parameter bounds. [sent-171, score-0.146]
48 We will later encounter the special case λ̄l = 0 (λ̄u disproportionately larger than λ̄l), in which case we define λ̄u/λ̄l ≜ ∞ to indicate that the preconditioned problem is very easy. [sent-174, score-0.293]
49 If λ̄u/λ̄l < 1, then signed support recovery is in general impossible. [sent-175, score-0.44]
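A small helper that sketches the ratio-based comparison just described; the bounds are taken as given inputs (Lemma 1's formulas are not reproduced here), and the conventions follow the text above: larger ratios are better, a ratio below 1 means recovery is impossible, and a zero lower bound is mapped to an infinite ratio.

```python
import numpy as np

def bounds_ratio(lam_l, lam_u):
    """Scale-invariant score lam_u / lam_l; < 1 means no admissible penalty exists."""
    if lam_l == 0.0:
        return np.inf          # convention: the problem is 'very easy'
    return lam_u / lam_l

def preconditioning_helps(lasso_bounds, precond_bounds):
    """True if the preconditioned bounds ratio is at least the plain Lasso's."""
    return bounds_ratio(*precond_bounds) >= bounds_ratio(*lasso_bounds)

# Example with made-up bounds: the preconditioned problem admits a wider penalty range.
print(preconditioning_helps(lasso_bounds=(0.2, 0.5), precond_bounds=(0.1, 0.6)))
```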
50 4 General Comparisons. We begin our comparisons with some immediate consequences of Lemma 1 for HJ and PBHT. [sent-179, score-0.078]
51 As we will see, both HJ and PBHT have the potential to improve signed support recovery relative to the traditional Lasso, provided the matrices PX , Py are suitably estimated. [sent-182, score-0.515]
52 We also let US be a minimal basis for the column space of the submatrix XS, and define span(US) = {x : ∃c ∈ Rk s.t. x = US c}. [sent-184, score-0.084]
53 Recall from Section 2 that HJ uses PX = Py = UA UA⊤, where UA is a column basis estimated from X. [sent-189, score-0.092]
54 If span(US) ⊆ span(UA), then after preconditioning using HJ the conditions continue to hold, and λ̄u/λ̄l stochastically dominates λu/λl (Eq. (12)), where the stochasticity on both sides is due to independent noise vectors w. [sent-192, score-0.261]
55 On the other hand, if XS⊤ PX⊤ PX XS is not invertible, then HJ cannot in general recover the signed support. [sent-193, score-0.26]
56 Notice that because µ̄j and γ̄i are unchanged, if the conditions of Lemma 1 hold for the original Lasso problem (i.e., [sent-197, score-0.071]
57 XS⊤XS is invertible, |µj| < 1 ∀j ∈ Sc and sgn(β∗i)γi > 0 ∀i ∈ S), they will continue to hold for the preconditioned problem. [sent-199, score-0.37]
58 In extreme cases, XS⊤ PX⊤ PX XS is singular and so signed support recovery is not in general possible. [sent-205, score-0.473]
59 Recall from Section 2 that PBHT uses PX = In×n, Py = UA UA⊤, where UA is a column basis estimated from X. [sent-207, score-0.092]
60 If span(US) ⊆ span(UA), after preconditioning using PBHT the conditions continue to hold, and λ̄u/λ̄l stochastically dominates λu/λl (Eq. (16)), where the stochasticity on both sides is due to independent noise vectors w. [sent-211, score-0.261]
61 On the other hand, if span(US c ) = span(UA ), then PBHT cannot recover the signed support. [sent-212, score-0.26]
62 C.d.f.'s of penalty parameter bounds ratios estimated from 1000 variable selection problems. [sent-238, score-0.238]
63 Because µ̄j and γ̄i are again unchanged, the conditions of Lemma 1 will continue to hold for the preconditioned problem if they hold for the original Lasso problem. [sent-255, score-0.441]
64 Because P(λl ≥ 0) = 1, signed support recovery is not possible. [sent-261, score-0.44]
65 Our theoretical analyses show that both HJ and PBHT can indeed lead to improved signed support recovery relative to the Lasso on finite datasets. [sent-263, score-0.473]
66 C.d.f.'s for penalty parameter bounds ratios of Lasso and Preconditioned Lasso for various subspaces UA. [sent-267, score-0.204]
67 Further comparison of HJ and PBHT must thus analyze how the subspaces span(UA ) are estimated in the context of the assumptions made in [5] and [9]. [sent-270, score-0.056]
68 Both HJ and PBHT were proposed with the implicit goal of finding a basis UA that has the same span as US . [sent-272, score-0.221]
69 Theorems 1 and 2 suggest that underestimating k can be more detrimental to signed support recovery than overestimating it. [sent-274, score-0.462]
70 In this section we briefly present this model and will show the resulting penalty parameter bounds for JR. [sent-280, score-0.151]
71 5.1 Generative model for X. As discussed in Section 2, many preconditioning algorithms can be phrased as truncating or reweighting column subspaces associated with X [5, 6, 9]. [sent-282, score-0.194]
72 Furthermore, we let U, VS , VS c be orthonormal bases of dimension n × n, k × k and (p − k) × (p − k) respectively. [sent-286, score-0.057]
73 Then we let the Lasso problem be y = Xβ∗ + w with X = U [ΣS VS⊤, ΣS c VS c⊤] and w ∼ N(0, σ² In×n). (19) To ensure that the column norms of X are controlled, we compute the spectra ΣS, ΣS c by normalizing spectra Σ̂S and Σ̂S c with arbitrary positive elements on the diagonal. [sent-292, score-0.099]
74 We verify in the supplementary material that with these assumptions the squared column norms of X are in expectation n (provided the orthonormal bases are chosen uniformly at random). [sent-294, score-0.128]
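A sketch of one way to instantiate this generative model for the case k < n ≤ p. The uniformly random orthonormal bases follow the remark above; the particular normalization (rescaling each block's spectrum so that squared column norms are n in expectation) is an assumption standing in for the paper's exact scheme, which is given in its supplementary material.

```python
import numpy as np

def random_orthonormal(d, rng):
    """A d x d orthonormal basis drawn via QR of a Gaussian matrix."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return Q

def generate_X(n, p, k, spec_S, spec_Sc, rng):
    """X = U [Sigma_S V_S^T, Sigma_Sc V_Sc^T] as in Eq. (19), with normalized spectra.

    spec_S (length k) and spec_Sc (length n - k) are arbitrary positive spectra;
    they are rescaled so that squared column norms of X are n in expectation.
    """
    U, V_S, V_Sc = (random_orthonormal(d, rng) for d in (n, k, p - k))
    s_S = np.asarray(spec_S, float) * np.sqrt(n * k / np.sum(np.square(spec_S)))
    s_Sc = np.asarray(spec_Sc, float) * np.sqrt(n * (p - k) / np.sum(np.square(spec_Sc)))
    Sigma_S = np.zeros((n, k)); Sigma_S[:k, :k] = np.diag(s_S)
    Sigma_Sc = np.zeros((n, p - k)); Sigma_Sc[k:n, :n - k] = np.diag(s_Sc)
    return U @ np.hstack([Sigma_S @ V_S.T, Sigma_Sc @ V_Sc.T])

# Example: sharply decaying spectrum on the support block, flat off-support block.
rng = np.random.default_rng(0)
X = generate_X(n=50, p=120, k=5, spec_S=np.geomspace(1.0, 0.1, 5),
               spec_Sc=np.ones(45), rng=rng)
```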
75 Note that any matrix X can be decomposed using a block-wise SVD as X = [XS, XS c] = [U ΣS VS⊤, T ΣS c VS c⊤], (21) with orthonormal bases U, T, VS, VS c. [sent-296, score-0.057]
76 In previous sections we used US to denote a basis for the column space of XS . [sent-305, score-0.069]
77 We will continue to use this notation, and let US contain the first k columns of U . [sent-306, score-0.061]
78 We let the diagonal elements of ΣS, Σ̂S, ΣS c, Σ̂S c be identified by their column indices. [sent-308, score-0.066]
79 That is, the diagonal entries σS,c of ΣS and σ̂S,c of Σ̂S are indexed by c ∈ {1, . . . , k}, and the entries of ΣS c and Σ̂S c are indexed analogously. [sent-309, score-0.076]
80 Each of the diagonal entries in ΣS , ΣS c is associated with a column of U . [sent-316, score-0.085]
81 Then the conditions of Lemma 1 hold before and after preconditioning using JR. [sent-332, score-0.207]
82 In other words, Eq. (22) shows that JR deterministically scales the ratio of penalty parameter bounds. [sent-334, score-0.146]
83 (19) we find that the SVD becomes X = U DV , where U is the same column basis as in Eq. [sent-340, score-0.069]
84 Substituting this into the definitions of µj , γi , ηj , ¯i , we have that after preconditioning using JR ¯ ¯ ¯ n(p − k)κ2 µj = µj ¯ γi = n + 2 ¯ γi (23) kκ + n − k (kκ2 + n − k) ηj = ¯ ηj ¯i = i . [sent-342, score-0.136]
85 (24) n(p − k) Thus, if the conditions of Lemma 1 hold for X, β ∗ , they will continue to hold after preconditioning using JR. [sent-343, score-0.284]
86 (19) uses an orthonormal matrix U as the column basis of X. [sent-350, score-0.096]
87 However, as we show in the supplementary material, one can construct Lasso problems using a Gaussian basis W m which lead to penalty parameter bounds ratios that converge in distribution to those of the Lasso problem in Eq. [sent-352, score-0.233]
88 We compare one Lasso problem constructed according to Eq. (19), and one according to ym = Xmβ∗ + wm with Xm = (1/√n) Wm [ΣS VS⊤, ΣS c VS c⊤], wm ∼ N(0, (m/n) σ² Im×m), (25) where Wm is an m × n standard Gaussian ensemble. [sent-355, score-0.06]
89 The increased variance is necessary because the matrix W m has expected column length m, while columns in U are of length 1. [sent-363, score-0.064]
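A sketch of the Gaussian-basis analogue described here. Because Eq. (25) is garbled in this extract, the 1/√n column scaling and the (m/n)σ² noise variance below are assumptions chosen to be consistent with the remark above about the increased variance; B denotes the n × p block factor [ΣS VS⊤, ΣS c VS c⊤] from the construction of Eq. (19).

```python
import numpy as np

def gaussian_basis_problem(B, beta_star, m, sigma, rng):
    """Gaussian analogue of the Eq. (19) model: replace U by an m x n Gaussian ensemble.

    B is the n x p block factor [Sigma_S V_S^T, Sigma_Sc V_Sc^T]; the 1/sqrt(n)
    scaling and the inflated noise variance (m/n) * sigma^2 are assumptions (see text).
    """
    n = B.shape[0]
    W_m = rng.standard_normal((m, n))          # standard Gaussian ensemble
    X_m = (W_m @ B) / np.sqrt(n)
    w_m = sigma * np.sqrt(m / n) * rng.standard_normal(m)
    return X_m, X_m @ beta_star + w_m
```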
90 Let the penalty parameter bounds ratio induced by the problem in Eq. [sent-365, score-0.185]
91 If the conditions of Lemma 1 hold for X, β ∗ , then for m large enough they will hold for X m , β ∗ . [sent-371, score-0.112]
92 Furthermore, as m → ∞, λu^m/λl^m →d λu/λl, (26) where the stochasticity on the left is due to Wm, wm and on the right is due to w. [sent-372, score-0.06]
93 Thus, with respect to the bounds ratio λu /λl , the construction of Eq. [sent-373, score-0.077]
94 Indeed, as the experiment in Figure 2(b) shows, Theorem 3 can be used to predict the scaling factor for penalty parameter bounds ¯ ¯ ratios (i. [sent-378, score-0.185]
95 6 Conclusions This paper proposes a new framework for comparing Preconditioned Lasso algorithms to the standard Lasso which skirts the difficulty of choosing penalty parameters. [sent-381, score-0.141]
96 By eliminating this parameter from consideration, finite data comparisons can be greatly simplified, avoiding the use of model selection strategies. [sent-382, score-0.084]
97 To demonstrate the framework’s usefulness, we applied it to a number of Preconditioned Lasso algorithms and in the process confirmed intuitions and revealed fragilities and mitigation strategies. [sent-383, score-0.06]
98 Additionally, we presented an SVD-based generative model for Lasso problems that can be thought of as the limit point of a less restrictive Gaussian model. [sent-384, score-0.058]
99 Stable recovery of sparse overcomplete representations in the presence of noise. [sent-392, score-0.123]
100 Sharp thresholds for high-dimensional and noisy sparsity recovery using 1 -constrained quadratic programming (Lasso). [sent-458, score-0.123]
wordName wordTfidf (topN-words)
[('ua', 0.562), ('lasso', 0.357), ('preconditioned', 0.293), ('px', 0.269), ('signed', 0.26), ('xs', 0.231), ('py', 0.203), ('span', 0.191), ('pbht', 0.187), ('preconditioning', 0.136), ('hj', 0.129), ('recovery', 0.123), ('penalty', 0.098), ('jr', 0.094), ('vs', 0.08), ('jojic', 0.059), ('support', 0.057), ('lemma', 0.055), ('rohe', 0.055), ('jia', 0.051), ('sgn', 0.045), ('hold', 0.041), ('comparisons', 0.04), ('column', 0.039), ('generative', 0.039), ('bounds', 0.039), ('irrepresentable', 0.038), ('consequences', 0.038), ('continue', 0.036), ('paul', 0.035), ('svd', 0.035), ('comparative', 0.035), ('ratios', 0.034), ('dv', 0.032), ('invertible', 0.032), ('intuitions', 0.031), ('basis', 0.03), ('theorems', 0.03), ('spectra', 0.03), ('stochasticity', 0.03), ('wm', 0.03), ('bases', 0.03), ('selection', 0.03), ('conditions', 0.03), ('deem', 0.029), ('fragilities', 0.029), ('instances', 0.029), ('orthonormal', 0.027), ('matrices', 0.027), ('diagonal', 0.027), ('comparing', 0.026), ('misaligned', 0.025), ('huang', 0.025), ('columns', 0.025), ('xj', 0.025), ('us', 0.024), ('estimated', 0.023), ('overestimating', 0.022), ('plugging', 0.021), ('contours', 0.02), ('ratio', 0.019), ('thought', 0.019), ('construction', 0.019), ('iff', 0.019), ('subspaces', 0.019), ('entries', 0.019), ('dd', 0.019), ('singular', 0.019), ('sharply', 0.018), ('supplementary', 0.018), ('gaussian', 0.017), ('highlight', 0.017), ('framework', 0.017), ('meinshausen', 0.017), ('met', 0.017), ('analyses', 0.017), ('highlighted', 0.017), ('traditional', 0.017), ('relative', 0.016), ('dim', 0.016), ('xa', 0.016), ('indexed', 0.015), ('oracle', 0.015), ('deterministically', 0.015), ('submatrix', 0.015), ('suitably', 0.015), ('noise', 0.015), ('derivations', 0.015), ('induced', 0.015), ('suppose', 0.014), ('differ', 0.014), ('extreme', 0.014), ('sides', 0.014), ('theorem', 0.014), ('parameter', 0.014), ('assumptions', 0.014), ('developments', 0.013), ('piecewise', 0.013), ('curve', 0.013), ('nitions', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1
2 0.19582835 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1
3 0.18645141 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso ( 1 -regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1
4 0.17476028 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
5 0.14963304 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
6 0.12545508 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
7 0.1135815 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
8 0.099571869 269 nips-2013-Regression-tree Tuning in a Streaming Setting
9 0.098360583 217 nips-2013-On Poisson Graphical Models
10 0.094886228 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
11 0.092932239 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
12 0.085500427 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
13 0.080871306 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
14 0.073830858 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
15 0.072862633 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
16 0.069379114 88 nips-2013-Designed Measurements for Vector Count Data
17 0.065550379 5 nips-2013-A Deep Architecture for Matching Short Texts
18 0.061340827 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
19 0.059912361 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
20 0.053871386 188 nips-2013-Memory Limited, Streaming PCA
topicId topicWeight
[(0, 0.124), (1, 0.061), (2, 0.069), (3, 0.073), (4, -0.022), (5, 0.066), (6, -0.021), (7, 0.027), (8, -0.157), (9, -0.026), (10, 0.114), (11, -0.052), (12, -0.12), (13, -0.208), (14, -0.122), (15, -0.07), (16, 0.151), (17, -0.112), (18, -0.052), (19, -0.03), (20, 0.004), (21, 0.11), (22, -0.035), (23, 0.088), (24, 0.047), (25, 0.053), (26, -0.027), (27, 0.03), (28, -0.038), (29, 0.033), (30, 0.088), (31, -0.057), (32, 0.111), (33, -0.005), (34, -0.033), (35, 0.09), (36, 0.099), (37, 0.047), (38, -0.165), (39, -0.079), (40, -0.075), (41, -0.06), (42, 0.054), (43, -0.044), (44, 0.077), (45, -0.098), (46, -0.005), (47, -0.061), (48, -0.047), (49, -0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.96475869 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1
2 0.7405228 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
3 0.72793525 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso ( 1 -regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1
4 0.64215666 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
Author: Divyanshu Vats, Richard Baraniuk
Abstract: We consider the problem of accurately estimating a high-dimensional sparse vector using a small number of linear measurements that are contaminated by noise. It is well known that standard computationally tractable sparse recovery algorithms, such as the Lasso, OMP, and their various extensions, perform poorly when the measurement matrix contains highly correlated columns. We develop a simple greedy algorithm, called SWAP, that iteratively swaps variables until a desired loss function cannot be decreased any further. SWAP is surprisingly effective in handling measurement matrices with high correlations. We prove that SWAP can easily be used as a wrapper around standard sparse recovery algorithms for improved performance. We theoretically quantify the statistical guarantees of SWAP and complement our analysis with numerical results on synthetic and real data.
5 0.6378817 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the 1 -penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the 2 estimation risk θ − θ0 2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1
6 0.62934798 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
7 0.59302074 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
8 0.54554385 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
9 0.51879138 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
10 0.46815541 217 nips-2013-On Poisson Graphical Models
11 0.43005598 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
12 0.39378867 147 nips-2013-Lasso Screening Rules via Dual Polytope Projection
13 0.36826435 88 nips-2013-Designed Measurements for Vector Count Data
14 0.3634873 9 nips-2013-A Kernel Test for Three-Variable Interactions
15 0.33706564 249 nips-2013-Polar Operators for Structured Sparse Estimation
16 0.32308927 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
17 0.32282451 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
18 0.32256499 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
19 0.30904782 67 nips-2013-Conditional Random Fields via Univariate Exponential Families
20 0.29682916 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
topicId topicWeight
[(16, 0.042), (33, 0.171), (34, 0.134), (41, 0.021), (49, 0.027), (56, 0.088), (70, 0.02), (85, 0.044), (86, 0.261), (89, 0.04), (93, 0.024), (95, 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.82533175 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan
Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1
2 0.80683064 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
Author: Remi Gribonval, Pierre Machart
Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1
3 0.79147792 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
Author: Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski
Abstract: With the advent of modern stimulation techniques in neuroscience, the opportunity arises to map neuron to neuron connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Due to the technical challenges and sparsity of these systems, it is important to focus experimental time stimulating the neurons whose synaptic strength is most ambiguous, therefore we also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial. 1
4 0.75830895 348 nips-2013-Variational Policy Search via Trajectory Optimization
Author: Sergey Levine, Vladlen Koltun
Abstract: In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and highdimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks. 1
5 0.69527429 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion
Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu
Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1
6 0.68313944 173 nips-2013-Least Informative Dimensions
7 0.68146664 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
8 0.67738354 294 nips-2013-Similarity Component Analysis
9 0.67654246 350 nips-2013-Wavelets on Graphs via Deep Learning
10 0.67606938 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
11 0.67524648 201 nips-2013-Multi-Task Bayesian Optimization
12 0.6745643 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
13 0.6744222 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
14 0.67357838 161 nips-2013-Learning Stochastic Inverses
15 0.67347187 43 nips-2013-Auxiliary-variable Exact Hamiltonian Monte Carlo Samplers for Binary Distributions
16 0.6734001 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
17 0.67279041 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
18 0.67224157 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
19 0.67192221 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
20 0.67169279 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements