nips nips2013 nips2013-265 knowledge-graph by maker-knowledge-mining

265 nips-2013-Reconciling "priors" & "priors" without prejudice?


Source: pdf

Author: Remi Gribonval, Pierre Machart

Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract There are two major routes to address linear inverse problems. [sent-6, score-0.11]

2 Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. [sent-7, score-0.285]

3 While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. [sent-8, score-0.392]

4 First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. [sent-10, score-0.316]

5 Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. [sent-11, score-0.221]

6 This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. [sent-12, score-0.067]

7 However, leveraging some prior knowledge or information, a profusion of schemes has been developed in order to provide an appropriate estimation of z. [sent-15, score-0.079]

8 1 Two families of approaches for linear inverse problems On the one hand, Bayesian approaches are based on the assumption that z and b are drawn from probability distributions PZ and PB respectively. [sent-18, score-0.11]

9 From that point, a straightforward way to estimate z is to build, for instance, the Minimum Mean Squared Error (MMSE) estimator, sometimes referred to as the Bayesian Least Squares, conditional expectation or conditional mean estimator, defined as: ψMMSE(y) := E(Z|Y = y). (1) [sent-19, score-0.091]
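As a minimal illustration of this definition (not from the paper), the conditional mean can be approximated by self-normalized importance sampling whenever one can sample from the prior and evaluate the Gaussian likelihood; the Laplacian prior and the dimensions below are arbitrary choices for the sketch.

```python
import numpy as np

def mmse_estimate(y, sample_prior, noise_std=1.0, n_samples=100_000):
    """Monte Carlo sketch of psi_MMSE(y) = E[Z | Y = y] for Y = Z + B, B ~ N(0, noise_std^2 I).

    Draws z ~ P_Z and reweights each sample by the Gaussian likelihood p(y | z):
    a self-normalized importance-sampling estimate of the posterior mean.
    """
    z = sample_prior(n_samples)                          # shape (n_samples, dim)
    log_w = -0.5 * np.sum((y[None, :] - z) ** 2, axis=1) / noise_std ** 2
    w = np.exp(log_w - log_w.max())                      # stabilize before normalizing
    w /= w.sum()
    return w @ z                                         # weighted sample mean = E[Z | Y = y]

rng = np.random.default_rng(0)
dim = 3
laplace_prior = lambda n: rng.laplace(scale=1.0, size=(n, dim))  # illustrative prior choice
y_obs = np.array([2.0, -0.5, 0.1])
print("psi_MMSE(y) ≈", mmse_estimate(y_obs, laplace_prior))
```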

10 This estimator has the nice property of being optimal (in a least squares sense) but suffers from its explicit reliance on the prior distribution, which is usually unknown in practice. [sent-20, score-0.211]

11 On the other hand, regularization-based approaches have been at the centre of a tremendous amount of work from a wide community of researchers in machine learning, signal processing, and more. (∗ The authors are with the PANAMA project-team at IRISA, Rennes, France.) [sent-22, score-0.093]

12 These approaches focus on building estimators (also called decoders) with no explicit reference to the prior distribution. [sent-24, score-0.185]

13 Instead, these estimators are built as an optimal trade-off between a data fidelity term and a term promoting some regularity on the solution. [sent-25, score-0.149]

14 Among these, we will focus on a widely studied family of estimators ψ that can be written in the form: ψ(y) := argmin_{z∈RD} (1/2)‖y − Az‖₂² + φ(z). (2) [sent-26, score-0.232]

15 For instance, the specific choice φ(z) = λ‖z‖₂² gives rise to a method often referred to as ridge regression [1], while φ(z) = λ‖z‖₁ gives rise to the famous Lasso [2]. [sent-27, score-0.166]
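To make these two penalties concrete, here is a minimal sketch (not from the paper): ridge regression in its closed form, and the Lasso solved by a plain ISTA loop built on soft-thresholding; the problem sizes, λ and the iteration count are arbitrary illustrative choices.

```python
import numpy as np

def ridge(A, y, lam):
    """phi(z) = lam * ||z||_2^2  ->  closed form z = (A^T A + 2*lam*I)^{-1} A^T y."""
    return np.linalg.solve(A.T @ A + 2.0 * lam * np.eye(A.shape[1]), A.T @ y)

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=500):
    """phi(z) = lam * ||z||_1, minimized by iterative soft-thresholding (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2      # 1 / Lipschitz constant of the smooth part
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = soft_threshold(z - step * A.T @ (A @ z - y), step * lam)
    return z

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
y_obs = rng.standard_normal(20)
print("ridge (first 5 coords):", np.round(ridge(A, y_obs, lam=0.5)[:5], 3))
print("lasso (first 5 coords):", np.round(lasso_ista(A, y_obs, lam=0.5)[:5], 3))
```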

16 The ℓ1 decoder associated to φ(z) = λ‖z‖₁ has attracted particular attention, as the associated optimization problem is convex, and generalizations to other “mixed” norms are being intensively studied [3, 4]. [sent-28, score-0.074]

17 Regularization and Bayesian estimation seemingly yield radically different viewpoints on inverse problems. [sent-32, score-0.203]

18 The “regularization prior” is embodied by the penalty function φ(z) which promotes certain solutions, somehow carving an implicit signal model. [sent-34, score-0.145]

19 In the Bayesian framework, the “Bayesian prior” is embodied by where the mass of the signal distribution PZ lies. [sent-35, score-0.081]

20 The MAP quid pro quo A quid pro quo between these distinct notions of priors has crystallized around the notion of maximum a posteriori (MAP) estimation, leading to a long-lasting incomprehension between two worlds. [sent-36, score-0.333]

21 In fact, a simple application of Bayes rule shows that under a Gaussian noise model b ∼ N(0, I) and Bayesian prior PZ(z ∈ E) = ∫E pZ(z)dz, E ⊂ RN, MAP estimation yields the optimization problem (2) with regularization prior φZ(z) := − log pZ(z). [sent-37, score-0.283]
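Spelled out, the Bayes-rule computation behind this identification is the standard one below (written here for problem (2) with b ∼ N(0, I)):

```latex
\hat{z}_{\mathrm{MAP}}(y)
 = \arg\max_{z}\, p_{Z\mid Y}(z\mid y)
 = \arg\max_{z}\, p_{Y\mid Z}(y\mid z)\,p_Z(z)
 = \arg\max_{z}\, e^{-\frac{1}{2}\|y-Az\|_2^2}\,p_Z(z)
 = \arg\min_{z}\, \tfrac{1}{2}\|y-Az\|_2^2 - \log p_Z(z),
\quad \text{i.e.\ problem (2) with } \phi_Z(z) := -\log p_Z(z).
```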

22 By a trivial identification, the optimization problem (2) with regularization prior φ(z) is now routinely called “MAP with prior exp(−φ(z))”. [sent-38, score-0.194]

23 As an unfortunate consequence of an erroneous “reverse reading” of this fact, this identification has given rise to the erroneous but common myth that the optimization approach is particularly well adapted when the unknown is distributed as exp(−φ(z)). [sent-40, score-0.188]

24 Laplacian and A ∈ Rn×D is drawn from the Gaussian ensemble, the ℓ1 decoder – and indeed any sparse decoder – will be outperformed by the least squares decoder ψLS(y) := pinv(A)y, unless n 0. [sent-44, score-0.254]

25 Furthermore, to expose that erroneous conception, a deeper connection has been drawn, showing that in the more restricted context of (white) Gaussian denoising, for any prior, there exists a regularizer φ such that the MMSE estimator can be expressed as the solution of problem (2). [sent-47, score-0.286]

26 This result essentially exhibits a regularization-oriented formulation for which two radically different interpretations can be made. [sent-48, score-0.113]

27 It highlights the following important fact: the specific choice of a regularizer φ does not, alone, induce an implicit prior on the supposed distribution of the unknown; besides a prior PZ, a Bayesian estimator also involves the choice of a loss function. [sent-49, score-0.322]

28 For certain regularizers φ, there can in fact exist (at least two) different priors PZ for which the optimization problem (2) yields the optimal Bayesian estimator, associated to (at least) two different losses (e.g. [sent-50, score-0.187]

29 3 Main contributions A first major contribution of this paper is to extend the aforementioned result [8] to a more general linear inverse problem setting. [sent-55, score-0.11]

30 Our first main results can be introduced as follows. (Footnote 1 of the paper notes that MAP is the Bayesian optimal estimator in a 0/1 loss sense, for discrete signals.) [sent-56, score-0.1]

31 Roughly, it states that for the considered inverse problem, for any prior on z, the MMSE estimate with Gaussian noise is also the solution of a regularization-based problem (the converse is not true). [sent-59, score-0.238]

32 . . . , zn) if, and only if, the evidence is multiplicatively separable: pY(y) = Π_{i=1}^{n} pYi(yi). [sent-63, score-0.115]

33 To do so, we review an existing result from the literature and make explicit the different steps that make it possible to generalize it to the linear inverse problem setting. [sent-66, score-0.11]

34 In Section 3, we provide further results that shed some light on the connections between MMSE and regularization-oriented estimators. [sent-67, score-0.104]

35 Namely, we introduce some commonly desired properties of the regularizing function, such as separability and convexity, and show how they relate to the priors in the Bayesian framework. [sent-68, score-0.32]

36 1 An existing result for white Gaussian denoising As a starting point, let us recall the existing results in [8] (Lemma II.2). [sent-72, score-0.154]

37 This lemma deals with the additive white Gaussian denoising problem, A = I, Σ = I. [sent-74, score-0.206]

38 ∀y ∈ Rn, ψI,I,PZ(y) is the unique global minimum and unique stationary point of z ↦ (1/2)‖y − Iz‖² + φ(z), with φ(z) = φI,I,PZ(z) := −(1/2)‖ψI,I,PZ⁻¹(z) − z‖² − log pY[ψI,I,PZ⁻¹(z)] for z ∈ Im ψI,I,PZ, and +∞ for z ∉ Im ψI,I,PZ. (4) [sent-79, score-0.304]
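A hedged, one-dimensional numerical sanity check of this statement (not part of the paper): the sketch below builds the evidence pY by convolving an arbitrary Gaussian-mixture prior with the unit-variance Gaussian noise pdf, obtains ψI,I,PZ through the classical identity ψ(y) = y + (log pY)'(y) for unit-variance Gaussian denoising, and verifies that ψ(y0) minimizes the penalized objective with the penalty written above; the prior, the grids and y0 are illustrative choices.

```python
import numpy as np

# Scalar white Gaussian denoising y = z + b, b ~ N(0, 1), with an arbitrary
# two-component Gaussian-mixture prior on z (illustrative choice, not from the paper).
y_grid = np.linspace(-8.0, 8.0, 2001)
z_grid = np.linspace(-10.0, 10.0, 2001)

def p_Z(z):
    g = lambda x, m, s: np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
    return 0.5 * g(z, -2.0, 0.3) + 0.5 * g(z, 2.0, 0.3)

# Evidence p_Y = p_B * p_Z (convolution), by numerical integration.
gauss = np.exp(-0.5 * (y_grid[:, None] - z_grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
p_Y = np.trapz(gauss * p_Z(z_grid)[None, :], z_grid, axis=1)
log_pY = np.log(p_Y)

# MMSE estimator via psi(y) = y + (log p_Y)'(y) (classical identity for unit-variance noise).
psi = y_grid + np.gradient(log_pY, y_grid)

# Penalty from the statement above, evaluated on the image of psi (z = psi(y)):
# phi(psi(y)) = -0.5 * (psi(y) - y)^2 - log p_Y(y).
phi_on_image = -0.5 * (psi - y_grid) ** 2 - log_pY

# Check that for an observation y0, z = psi(y0) minimizes 0.5*(y0 - z)^2 + phi(z) over Im(psi).
y0 = 1.3
objective = 0.5 * (y0 - psi) ** 2 + phi_on_image
print("minimizer over the image of psi:", psi[np.argmin(objective)])
print("psi_MMSE(y0):                   ", psi[np.argmin(np.abs(y_grid - y0))])
```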

39 Any penalty function φ(z) such that ψI,I,PZ (y) is a stationary point of (4) satisfies φ(z) = φI,I,PZ (z) + C for some constant C and all z. [sent-81, score-0.136]

40 This makes our denoising problem equivalent to yΣ = zΣ + bΣ, with yΣ := Σ^(−1/2)y, zΣ := Σ^(−1/2)z and bΣ := Σ^(−1/2)b, which is drawn from a Gaussian distribution with an identity covariance matrix. [sent-88, score-0.077]
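The whitening step can be written in a few lines; the covariance below is an arbitrary example and the matrix square root is taken via an eigendecomposition (a sketch, not code from the paper):

```python
import numpy as np

def inv_sqrt(Sigma):
    """Sigma^{-1/2} via eigendecomposition (Sigma symmetric positive definite)."""
    eigval, eigvec = np.linalg.eigh(Sigma)
    return eigvec @ np.diag(eigval ** -0.5) @ eigvec.T

rng = np.random.default_rng(0)
n = 4
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)                      # an arbitrary non-degenerate covariance
W = inv_sqrt(Sigma)                                  # the whitening matrix Sigma^{-1/2}

b = rng.multivariate_normal(np.zeros(n), Sigma, size=5000)   # colored noise samples (rows)
b_white = b @ W.T                                    # b_Sigma := Sigma^{-1/2} b, sample-wise
print(np.round(np.cov(b_white, rowvar=False), 2))    # empirical covariance ~ identity
```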

41 For any non-degenerate prior PZ , any non-degenerate Σ, Y = Z + B, we have: 1. [sent-92, score-0.079]

42 ∀y ∈ Rn, ψI,Σ,PZ(y) is the unique global minimum and stationary point of z ↦ (1/2)‖y − Iz‖²_Σ + φI,Σ,PZ(z). [sent-97, score-0.202]

43 As with white noise, up to an additive constant, φI,Σ,PZ is the only penalty with these properties. [sent-100, score-0.193]

44 For any function f : Rn → R and any M ∈ Rn×p, we have: M · argmin_{v∈Rp} f(Mv) = argmin_{u∈MRp} f(u). [sent-104, score-0.252]
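A quick numerical illustration of this reparametrization identity, using an invertible M and a strictly convex quadratic f whose argmins are available in closed form (arbitrary choices for the sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n)) + n * np.eye(n)      # an (almost surely) invertible matrix
c = rng.standard_normal(n)

# f(u) = 0.5 * ||u - c||^2, so argmin_u f(u) = c and argmin_v f(Mv) = M^{-1} c.
argmin_u = c
argmin_v = np.linalg.solve(M, c)

# Lemma 1 (square invertible case): M * argmin_v f(Mv) = argmin_u f(u).
print(np.allclose(M @ argmin_v, argmin_u))           # True
```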

45 Now, using Lemma 1 with M = Σ^(1/2), we get: ψI,Σ,PZ(y) = Σ^(1/2) argmin_{z′∈Rn} [ (1/2)‖Σ^(−1/2)y − z′‖² + φI,I,P_{Σ^(−1/2)Z}(z′) ] = argmin_{z∈Rn} [ (1/2)‖Σ^(−1/2)y − Σ^(−1/2)z‖² + φI,I,P_{Σ^(−1/2)Z}(Σ^(−1/2)z) ] = argmin_{z∈Rn} [ (1/2)‖y − z‖²_Σ + φI,Σ,PZ(z) ], with φI,Σ,PZ(z) := φI,I,P_{Σ^(−1/2)Z}(Σ^(−1/2)z). [sent-107, score-0.378]

46 This definition also makes it clear that φI,Σ,PZ is C ∞ , and that this minimizer is unique (and is the only stationary point). [sent-108, score-0.233]

47 3 A simple under-determined problem As a step towards handling the more generic linear inverse problem y = Az + b, we will investigate the particular case where A = [I 0]. [sent-110, score-0.11]

48 First and foremost, let us decompose the MMSE estimator into two parts, composed of the first n and last (D − n) components: ψ[I 0],Σ,PZ(y) := [ψ1(y); ψ2(y)]. Corollary 2 (simple under-determined problem). [sent-112, score-0.1]

49 For any non-degenerate prior PZ , any nondegenerate Σ, we have: 1. [sent-113, score-0.079]

50 ∀y ∈ Rn, ψ[I 0],Σ,PZ(y) is the unique global minimum and stationary point of z ↦ (1/2)‖y − [I 0]z‖²_Σ + φ^h[I 0],Σ,PZ(z), with φ^h[I 0],Σ,PZ(z) := φI,Σ,PZ(z1) + h(z2, ψ2 ∘ ψ1⁻¹(z1)) and z = [z1; z2]. [sent-119, score-0.202]

51 Given the properties of h, we have: ψ2(y) = argmin_{z2∈R^(D−n)} h(z2, ψ2 ∘ ψ1⁻¹(ψ1(y))). [sent-128, score-0.126]

52 It follows that: ψ[I 0],Σ,PZ(y) = argmin_{z=[z1;z2]∈RD} (1/2)‖y − z1‖²_Σ + φI,Σ,PZ(z1) + h(z2, ψ2 ∘ ψ1⁻¹(z1)). [sent-129, score-0.126]

53 From the definitions of ψ[I 0],Σ,PZ and h, it is clear, using Corollary 1 that ψ[I 0],Σ,PZ is injective, is the unique minimizer and stationary point, and that φ[I 0],Σ,PZ is C ∞ if and only if h is. [sent-130, score-0.233]

54 For any non-degenerate prior PZ , any non-degenerate Σ and A, we have: 1. [sent-139, score-0.079]

55 ∀y ∈ Rn, ψA,Σ,PZ(y) is the unique global minimum and stationary point of z ↦ (1/2)‖y − Az‖²_Σ + φ^h[I 0],Σ̃,P_{VZ}(Vz). [sent-142, score-0.202]

56 First note that: V ψA,Σ,PZ(y) = V E(Z|Y = y) = E(Z′|Y′ = y′) = ψ[I 0],Σ̃,P_{VZ}(y′) = argmin_{z′} (1/2)‖U y − [I 0]z′‖²_Σ̃ + φ^h[I 0],Σ̃,P_{VZ}(z′), using Corollary 2. [sent-146, score-0.126]

57 Now, using Lemma 1, we have: ψA,Σ,PZ(y) = argmin_z (1/2)‖y − U[I 0]Vz‖²_Σ + φ^h[I 0],Σ̃,P_{VZ}(Vz) = argmin_z (1/2)‖y − Az‖²_Σ + φ^h[I 0],Σ̃,P_{VZ}(Vz). The other properties come naturally from those of Corollary 2. [sent-147, score-0.252]

58 If A is invertible (hence square), ψA,Σ,PZ is one-to-one. [sent-149, score-0.097]

59 It is also C ∞ , as well as its inverse and φA,Σ,PZ . [sent-150, score-0.11]

60 3 Connections between the MMSE and regularization-based estimators Equipped with the results from the previous sections, we can now have a clearer idea of how MMSE estimators and those produced by a regularization-based approach relate to each other. [sent-151, score-0.143]

61 1 Obvious connections Some simple observations of the main theorem can already shed some light on those connections. [sent-154, score-0.104]

62 First, for any prior, and as long as A is invertible, we have shown that there exists a corresponding regularizing term (which is unique up to an additive constant). [sent-155, score-0.149]

63 This simply means that the set of MMSE estimators in linear inverse problems with Gaussian noise is a subset of the set of estimators that can be produced by a regularization approach with a quadratic data-fitting term. [sent-156, score-0.438]

64 Second, since the corresponding penalty is necessarily smooth, it is in fact only a strict subset of such regularization estimators. [sent-157, score-0.1]

65 For instance, as pinpointed by [8], all the non-C ∞ regularizers belong to that category. [sent-159, score-0.076]

66 Among them, all the sparsity-inducing regularizers (the ℓ1 norm, among others) fall into this scope. [sent-160, score-0.076]

67 This means that when it comes to solving a linear inverse problem (with an invertible A) under Gaussian noise, sparsity-inducing penalties are necessarily suboptimal (in a mean squared error sense). [sent-161, score-0.235]

68 2 Relating desired computational properties to the evidence Let us now focus on the MMSE estimators (which can also be written as regularization-based estimators). [sent-163, score-0.136]

69 An interesting question then is: can we relate these properties of the regularizer to the Bayesian priors, when interpreting the solution as an MMSE estimate? [sent-165, score-0.101]

70 For instance, when the regularizer is separable, one may easily rely on coordinate descent algorithms [9]. [sent-166, score-0.121]

71 A vector-valued function f : Rn → Rn is separable if there exists a set of n functions f1, . . . , fn such that f(y) = [f1(y1); . . . ; fn(yn)] for all y. [sent-168, score-0.149]

72 A scalar-valued function g : Rn → R is additively separable (resp. [sent-172, score-0.09]

73 multiplicatively separable) if there exists a set of n functions g1, . . . , gn such that g(y) = Σi gi(yi) (resp. g(y) = Πi gi(yi)). [sent-173, score-0.085]
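As a worked example of these definitions (not from the paper): in the white-noise denoising setting, a product prior yields a multiplicatively separable evidence, since the coordinates of Y are then independent:

```latex
p_Z(z) = \prod_{i=1}^{n} p_{Z_i}(z_i)
\;\Longrightarrow\;
p_Y(y) = \int_{\mathbb{R}^n} p_Z(z)\,\prod_{i=1}^{n}\frac{e^{-\frac{1}{2}(y_i-z_i)^2}}{\sqrt{2\pi}}\,dz
       = \prod_{i=1}^{n}\int_{\mathbb{R}} p_{Z_i}(z_i)\,\frac{e^{-\frac{1}{2}(y_i-z_i)^2}}{\sqrt{2\pi}}\,dz_i
       = \prod_{i=1}^{n} p_{Y_i}(y_i).
```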

74 Even more evidently, when solving optimization problems, dealing with convex functions ensures that many algorithms will provably converge to the global minimizer [6]. [sent-179, score-0.194]

75 As a consequence, it would be interesting to be able to characterize the set of priors for which the MMSE estimate can be expressed as a minimizer of a convex function. [sent-180, score-0.345]

76 For any non-degenerate prior PZ, Theorem 1 says that ∀y ∈ Rn, ψI,I,PZ(y) is the unique global minimum and stationary point of z ↦ (1/2)‖y − Iz‖² + φI,I,PZ(z). [sent-184, score-0.281]

77 φI,I,PZ is convex if and only if pY(y) := (pB ⋆ pZ)(y) is log-concave; [sent-186, score-0.064]

78 φI,I,PZ is additively separable if and only if pY (y) is multiplicatively separable. [sent-187, score-0.324]

79 It also follows that: φI,I,PZ convex ⇔ ∇²φI,I,PZ[ψI,I,PZ(y)] ⪰ 0 ⇔ −J⁻¹[ψI,I,PZ](y) ∇² log pY(y) ⪰ 0. As J[ψI,I,PZ](y) = In + ∇² log pY(y), the matrices ∇² log pY(y), J[ψI,I,PZ](y), and J⁻¹[ψI,I,PZ](y) are simultaneously diagonalisable. [sent-195, score-0.184]

80 Now, as J[ψI,I,PZ](y) is positive definite, we have: −J⁻¹[ψI,I,PZ](y) ∇² log pY(y) ⪰ 0 ⇔ ∇² log pY(y) ⪯ 0. It follows that φI,I,PZ is convex if and only if pY(y) := (pB ⋆ pZ)(y) is log-concave. [sent-197, score-0.144]

81 Using (II.3) in [8], it is clear that: φI,I,PZ is additively separable ⇔ ψI,I,PZ is separable. [sent-200, score-0.239]

82 Using (II.2) in [8], we have: ψI,I,PZ is separable ⇔ ∇ log pY is separable ⇔ log pY is additively separable ⇔ pY is multiplicatively separable. [sent-202, score-0.702]

83 This lemma focuses on the specific case where A = I and white Gaussian noise. [sent-204, score-0.129]

84 However, considering the transformations induced by a shift to an arbitrary invertible matrix A and to an arbitrary non-degenerate covariance matrix Σ, which are depicted throughout Section 2, it is easy to see that the result on convexity carries over. [sent-205, score-0.152]

85 These results provide a precise characterization of conditions on the Bayesian priors so that the MMSE estimator can be expressed as a minimizer of a convex or separable function. [sent-208, score-0.566]

86 Interestingly, those conditions are expressed in terms of the probability density function (pdf for short) of the observations pY, which is sometimes referred to as the evidence. [sent-209, score-0.076]

87 Given that we assume that the noise is Gaussian, its pdf pB is always log-concave. [sent-211, score-0.084]

88 Thanks to a simple property of the convolution of log-concave functions, it is sufficient that the prior on the signal pZ is log-concave to ensure that pY also is. [sent-212, score-0.143]

89 This means that there are some priors pX that are not log-concave such that the associated MMSE estimator can still be expressed as the minimizer of a functional with a convex regularizer. [sent-214, score-0.417]

90 For a more detailed analysis of this last point, in the specific context of Bernoulli-Gaussian priors (which are not log-concave), please refer to the technical report [13]. [sent-215, score-0.137]

91 If the distribution of the observation y is not log-concave, then the MMSE estimate cannot be expressed as the solution of a convex regularization-oriented formulation. [sent-217, score-0.107]

92 This means that, with a quadratic data-fitting term, a convex approach to signal estimation cannot be optimal (in a mean squared error sense). [sent-218, score-0.133]
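To illustrate the log-concavity criterion discussed above, here is a minimal numerical sketch (not from the paper): it convolves the unit-variance Gaussian noise pdf with a Laplacian prior (log-concave) and with a bimodal Gaussian-mixture prior, and inspects the sign of (log pY)'' on a grid; the distributions and grids are arbitrary illustrative choices.

```python
import numpy as np

z = np.linspace(-15.0, 15.0, 3001)
y = np.linspace(-6.0, 6.0, 1201)
gauss_kernel = np.exp(-0.5 * (y[:, None] - z[None, :]) ** 2) / np.sqrt(2 * np.pi)

def evidence(p_z):
    """p_Y = p_B * p_Z: convolution with the unit-variance Gaussian noise pdf."""
    return np.trapz(gauss_kernel * p_z[None, :], z, axis=1)

def max_curvature_of_log(p_y):
    """Largest (log p_Y)'' on the grid: <= 0 (up to numerical error) iff p_Y is log-concave there."""
    d1 = np.gradient(np.log(p_y), y)
    return np.gradient(d1, y).max()

norm_pdf = lambda m, s: np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
laplace_prior = 0.5 * np.exp(-np.abs(z))                              # log-concave prior
mixture_prior = 0.5 * norm_pdf(0.0, 0.1) + 0.5 * norm_pdf(4.0, 1.0)   # not log-concave

print("Laplacian prior, max (log pY)'':", max_curvature_of_log(evidence(laplace_prior)))  # ~ <= 0
print("Mixture prior,   max (log pY)'':", max_curvature_of_log(evidence(mixture_prior)))  # > 0
```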

93 4 Prospects In this paper we have extended a result stating that in the context of linear inverse problems with Gaussian noise, for any Bayesian prior, there exists a regularizer φ such that the MMSE estimator can be expressed as the solution of the regularized regression problem (2). [sent-219, score-0.354]

94 For instance, our proof of the result naturally builds on elementary bricks that combine in a way that is imposed by the definition of the linear inverse problem. [sent-222, score-0.169]

95 Moreover, in the situation where A is not invertible (i.e. [sent-224, score-0.097]

96 the problem is under-determined), there is a large set of admissible regularizers (i.e. [sent-226, score-0.076]

97 But a careful study of the proofs in [8] suggests that there is a peculiar connection between the Gaussian noise model on the one hand and the quadratic loss on the other hand. [sent-234, score-0.08]

98 Finally, we have stated a number of negative results regarding the non-optimality of sparse decoders or of convex formulations for handling observations drawn from a distribution that is not log-concave. [sent-236, score-0.112]

99 It would be interesting to develop a metric in the estimators space in order to quantify, for instance, how “far” one arbitrary estimator is from an optimal one, or, in other words, what is the intrinsic cost of convex relaxations. [sent-237, score-0.27]

100 Should penalized least squares regression be interpreted as maximum a posteriori estimation? [sent-265, score-0.105]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mmse', 0.45), ('py', 0.415), ('pz', 0.363), ('az', 0.167), ('separable', 0.149), ('pb', 0.147), ('rn', 0.136), ('argmin', 0.126), ('priors', 0.111), ('inverse', 0.11), ('estimators', 0.106), ('estimator', 0.1), ('minimizer', 0.099), ('invertible', 0.097), ('additively', 0.09), ('rennes', 0.088), ('bayesian', 0.087), ('multiplicatively', 0.085), ('separability', 0.082), ('prior', 0.079), ('gribonval', 0.078), ('iz', 0.078), ('white', 0.077), ('denoising', 0.077), ('regularizers', 0.076), ('inria', 0.075), ('decoder', 0.074), ('stationary', 0.072), ('penalty', 0.064), ('convex', 0.064), ('regularizer', 0.064), ('gaussian', 0.062), ('unique', 0.062), ('radically', 0.061), ('atlantique', 0.059), ('bretagne', 0.059), ('bricks', 0.059), ('derivating', 0.059), ('machart', 0.059), ('prejudice', 0.059), ('quid', 0.059), ('pierre', 0.057), ('convexity', 0.055), ('centre', 0.055), ('interpretations', 0.052), ('erroneous', 0.052), ('quo', 0.052), ('myth', 0.052), ('additive', 0.052), ('lemma', 0.052), ('noise', 0.049), ('decoders', 0.048), ('rodolphe', 0.048), ('reconciling', 0.048), ('corollary', 0.045), ('readability', 0.045), ('mi', 0.044), ('expressed', 0.043), ('embodied', 0.043), ('promoting', 0.043), ('log', 0.04), ('map', 0.039), ('guillaume', 0.038), ('signal', 0.038), ('connections', 0.037), ('minimum', 0.037), ('shed', 0.037), ('relate', 0.037), ('regression', 0.037), ('regularization', 0.036), ('penalized', 0.036), ('pdf', 0.035), ('regularizing', 0.035), ('jenatton', 0.034), ('referred', 0.033), ('francis', 0.033), ('seemingly', 0.032), ('squares', 0.032), ('rise', 0.032), ('ridge', 0.032), ('quadratic', 0.031), ('global', 0.031), ('px', 0.03), ('linearity', 0.03), ('light', 0.03), ('evidence', 0.03), ('coordinate', 0.03), ('im', 0.03), ('really', 0.029), ('conditional', 0.029), ('sake', 0.029), ('ciency', 0.028), ('laplacian', 0.028), ('penalties', 0.028), ('characterize', 0.028), ('equipped', 0.027), ('descent', 0.027), ('deeper', 0.027), ('convolution', 0.026), ('please', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 265 nips-2013-Reconciling "priors" & "priors" without prejudice?

Author: Remi Gribonval, Pierre Machart

Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1

2 0.18090761 9 nips-2013-A Kernel Test for Three-Variable Interactions

Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma

Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.

3 0.13235639 88 nips-2013-Designed Measurements for Vector Count Data

Author: Liming Wang, David Carlson, Miguel Rodrigues, David Wilcox, Robert Calderbank, Lawrence Carin

Abstract: We consider design of linear projection measurements for a vector Poisson signal model. The projections are performed on the vector Poisson rate, X ∈ Rn+, and the observed data are a vector of counts, Y ∈ Zm+. The projection matrix is designed by maximizing mutual information between Y and X, I(Y ; X). When there is a latent class label C ∈ {1, . . . , L} associated with X, we consider the mutual information with respect to Y and C, I(Y ; C). New analytic expressions for the gradient of I(Y ; X) and I(Y ; C) are presented, with gradient performed with respect to the measurement matrix. Connections are made to the more widely studied Gaussian measurement model. Example results are presented for compressive topic modeling of a document corpora (word counting), and hyperspectral compressive sensing for chemical classification (photon counting). 1

4 0.12545508 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan

Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1

5 0.10691227 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ̂ − θ0‖2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

6 0.094232805 171 nips-2013-Learning with Noisy Labels

7 0.083256155 91 nips-2013-Dirty Statistical Models

8 0.080614984 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

9 0.079932019 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

10 0.075604856 201 nips-2013-Multi-Task Bayesian Optimization

11 0.069925539 173 nips-2013-Least Informative Dimensions

12 0.068646275 321 nips-2013-Supervised Sparse Analysis and Synthesis Operators

13 0.062803254 17 nips-2013-A multi-agent control framework for co-adaptation in brain-computer interfaces

14 0.061894141 5 nips-2013-A Deep Architecture for Matching Short Texts

15 0.059148327 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

16 0.058183532 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization

17 0.057147406 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory

18 0.056011539 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

19 0.054072987 242 nips-2013-PAC-Bayes-Empirical-Bernstein Inequality

20 0.052114975 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.171), (1, 0.059), (2, 0.068), (3, 0.033), (4, -0.014), (5, 0.067), (6, -0.014), (7, 0.031), (8, -0.062), (9, -0.018), (10, 0.066), (11, 0.012), (12, -0.086), (13, -0.115), (14, -0.026), (15, -0.022), (16, 0.032), (17, -0.005), (18, -0.112), (19, -0.016), (20, -0.065), (21, 0.068), (22, -0.024), (23, 0.007), (24, -0.024), (25, 0.005), (26, -0.025), (27, 0.009), (28, 0.058), (29, 0.041), (30, 0.043), (31, 0.028), (32, 0.016), (33, -0.021), (34, 0.004), (35, -0.001), (36, 0.063), (37, -0.011), (38, -0.028), (39, -0.038), (40, -0.04), (41, -0.064), (42, -0.01), (43, 0.014), (44, 0.023), (45, -0.013), (46, -0.081), (47, -0.017), (48, -0.103), (49, -0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9107334 265 nips-2013-Reconciling "priors" & "priors" without prejudice?

Author: Remi Gribonval, Pierre Machart

Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1

2 0.73407102 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan

Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1

3 0.66450638 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

Author: Adel Javanmard, Andrea Montanari

Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1

4 0.65725976 109 nips-2013-Estimating LASSO Risk and Noise Level

Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari

Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ̂ − θ0‖2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature. 1

5 0.64785981 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

Author: Adel Javanmard, Andrea Montanari

Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1

6 0.60926199 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory

7 0.60760117 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements

8 0.57016742 9 nips-2013-A Kernel Test for Three-Variable Interactions

9 0.55832899 88 nips-2013-Designed Measurements for Vector Count Data

10 0.55690968 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

11 0.54305285 111 nips-2013-Estimation, Optimization, and Parallelism when Data is Sparse

12 0.54144239 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions

13 0.53833276 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning

14 0.53587162 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis

15 0.53490251 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

16 0.53291267 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima

17 0.52414095 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures

18 0.51810795 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models

19 0.50120622 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

20 0.48932895 249 nips-2013-Polar Operators for Structured Sparse Estimation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.027), (33, 0.144), (34, 0.115), (41, 0.022), (49, 0.025), (56, 0.125), (70, 0.042), (85, 0.044), (86, 0.258), (89, 0.054), (93, 0.046), (95, 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.8200106 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms

Author: Fabian L. Wauthier, Nebojsa Jojic, Michael Jordan

Abstract: The Lasso is a cornerstone of modern multivariate data analysis, yet its performance suffers in the common situation in which covariates are correlated. This limitation has led to a growing number of Preconditioned Lasso algorithms that pre-multiply X and y by matrices PX , Py prior to running the standard Lasso. A direct comparison of these and similar Lasso-style algorithms to the original Lasso is difficult because the performance of all of these methods depends critically on an auxiliary penalty parameter λ. In this paper we propose an agnostic framework for comparing Preconditioned Lasso algorithms to the Lasso without having to choose λ. We apply our framework to three Preconditioned Lasso instances and highlight cases when they will outperform the Lasso. Additionally, our theory reveals fragilities of these algorithms to which we provide partial solutions. 1

same-paper 2 0.81308174 265 nips-2013-Reconciling "priors" & "priors" without prejudice?

Author: Remi Gribonval, Pierre Machart

Abstract: There are two major routes to address linear inverse problems. Whereas regularization-based approaches build estimators as solutions of penalized regression optimization problems, Bayesian estimators rely on the posterior distribution of the unknown, given some assumed family of priors. While these may seem radically different approaches, recent results have shown that, in the context of additive white Gaussian denoising, the Bayesian conditional mean estimator is always the solution of a penalized regression problem. The contribution of this paper is twofold. First, we extend the additive white Gaussian denoising results to general linear inverse problems with colored Gaussian noise. Second, we characterize conditions under which the penalty function associated to the conditional mean estimator can satisfy certain popular properties such as convexity, separability, and smoothness. This sheds light on some tradeoff between computational efficiency and estimation accuracy in sparse regularization, and draws some connections between Bayesian estimation and proximal optimization. 1

3 0.78321266 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

Author: Ben Shababo, Brooks Paige, Ari Pakman, Liam Paninski

Abstract: With the advent of modern stimulation techniques in neuroscience, the opportunity arises to map neuron to neuron connectivity. In this work, we develop a method for efficiently inferring posterior distributions over synaptic strengths in neural microcircuits. The input to our algorithm is data from experiments in which action potentials from putative presynaptic neurons can be evoked while a subthreshold recording is made from a single postsynaptic neuron. We present a realistic statistical model which accounts for the main sources of variability in this experiment and allows for significant prior information about the connectivity and neuronal cell types to be incorporated if available. Due to the technical challenges and sparsity of these systems, it is important to focus experimental time stimulating the neurons whose synaptic strength is most ambiguous, therefore we also develop an online optimal design algorithm for choosing which neurons to stimulate at each trial. 1

4 0.75992197 348 nips-2013-Variational Policy Search via Trajectory Optimization

Author: Sergey Levine, Vladlen Koltun

Abstract: In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and highdimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present a method that uses trajectory optimization as a powerful exploration strategy that guides the policy search. A variational decomposition of a maximum likelihood policy objective allows us to use standard trajectory optimization algorithms such as differential dynamic programming, interleaved with standard supervised learning for the policy itself. We demonstrate that the resulting algorithm can outperform prior methods on two challenging locomotion tasks. 1

5 0.71878976 23 nips-2013-Active Learning for Probabilistic Hypotheses Using the Maximum Gibbs Error Criterion

Author: Nguyen Viet Cuong, Wee Sun Lee, Nan Ye, Kian Ming A. Chai, Hai Leong Chieu

Abstract: We introduce a new objective function for pool-based Bayesian active learning with probabilistic hypotheses. This objective function, called the policy Gibbs error, is the expected error rate of a random classifier drawn from the prior distribution on the examples adaptively selected by the active learning policy. Exact maximization of the policy Gibbs error is hard, so we propose a greedy strategy that maximizes the Gibbs error at each iteration, where the Gibbs error on an instance is the expected error of a random classifier selected from the posterior label distribution on that instance. We apply this maximum Gibbs error criterion to three active learning scenarios: non-adaptive, adaptive, and batch active learning. In each scenario, we prove that the criterion achieves near-maximal policy Gibbs error when constrained to a fixed budget. For practical implementations, we provide approximations to the maximum Gibbs error criterion for Bayesian conditional random fields and transductive Naive Bayes. Our experimental results on a named entity recognition task and a text classification task show that the maximum Gibbs error criterion is an effective active learning criterion for noisy models. 1

6 0.67598122 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

7 0.67232031 150 nips-2013-Learning Adaptive Value of Information for Structured Prediction

8 0.67223579 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables

9 0.67152399 249 nips-2013-Polar Operators for Structured Sparse Estimation

10 0.67012423 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

11 0.67000514 259 nips-2013-Provable Subspace Clustering: When LRR meets SSC

12 0.66983336 158 nips-2013-Learning Multiple Models via Regularized Weighting

13 0.66937953 233 nips-2013-Online Robust PCA via Stochastic Optimization

14 0.6692099 318 nips-2013-Structured Learning via Logistic Regression

15 0.66878355 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

16 0.66844541 186 nips-2013-Matrix factorization with binary components

17 0.66817194 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

18 0.66799688 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

19 0.667822 173 nips-2013-Least Informative Dimensions

20 0.66772377 99 nips-2013-Dropout Training as Adaptive Regularization