nips nips2013 nips2013-68 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem.
Reference: text
sentIndex sentText sentNum sentScore
1 Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. [sent-6, score-0.307]
2 We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. [sent-7, score-0.406]
3 When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. [sent-9, score-0.425]
4 Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. [sent-10, score-0.145]
5 The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. [sent-11, score-0.107]
6 We test our method on a diabetes prediction problem. [sent-13, score-0.17]
7 A widely applicable approach consists in optimizing a suitably regularized likelihood function. [sent-19, score-0.09]
8 In classical statistics, generic and well accepted procedures are available for characterizing the uncertainty associated with a certain parameter estimate in terms of confidence intervals or p-values [28, 14]. [sent-25, score-0.299]
9 In this paper we develop a computationally efficient procedure for constructing confidence intervals and p-values for a broad class of high-dimensional regression problems. [sent-27, score-0.406]
10 The salient features of our procedure are: (i) Our approach guarantees nearly optimal confidence interval sizes and testing power. [sent-28, score-0.257]
11 (ii) It is the first one that achieves this goal under essentially no assumptions on the population covariance matrix of the parameters, beyond the standard conditions for high-dimensional consistency. [sent-29, score-0.16]
12 Table 1: Unbiased estimator for θ0 in high-dimensional linear regression models. Input: Measurement vector y, design matrix X, parameter γ. [sent-31, score-0.398]
13 1: Set λ = σγ, and let θ^n be the Lasso estimator as per Eq. [sent-33, score-0.152]
14 6: Define the estimator θ^u as follows: θ^u = θ^n(λ) + (1/n) M X^T (Y − X θ^n(λ)) (Eq. 5). (iv) Our method has a natural generalization to non-linear regression models (e. [sent-44, score-0.249]
15 For the sake of clarity, we will focus our presentation on the case of linear regression, deferring the generalization to Section 4. [sent-48, score-0.109]
16 , Yn)^T and denoting by X the design matrix with rows X1^T, . [sent-60, score-0.205]
17 In particular θ is Gaussian with mean θ0 and covariance σ^2 (X^T X)^{-1}. [sent-66, score-0.078]
18 A copious theoretical literature [6, 2, 4] shows that, under suitable assumptions on X, the Lasso is nearly as accurate as if the support S was known a priori. [sent-74, score-0.1]
19 Deriving an exact characterization for the distribution of θn is not tractable in general, and hence there is no simple procedure to construct confidence intervals and p-values. [sent-77, score-0.255]
20 In order to overcome this challenge, we construct a de-biased estimator from the Lasso solution. [sent-78, score-0.195]
21 The de-biased estimator is given by the simple formula θ^u = θ^n + (1/n) M X^T (Y − X θ^n), as in Eq. [sent-79, score-0.152]
22 (In particular, [θ^u_i − 1.96 σ √(Q_{i,i}/n), θ^u_i + 1.96 σ √(Q_{i,i}/n)], with Q = M Σ M^T, is a 95% confidence interval.) We will prove in Section 2 that θ^u is approximately Gaussian, with mean θ0 and covariance σ^2 (M Σ M^T)/n, where Σ = (X^T X/n) is the empirical covariance of the feature vectors. [sent-86, score-0.156]
23 This result allows us to construct confidence intervals and p-values in complete analogy with classical statistics procedures. [sent-87, score-0.255]
24 In practice the noise standard deviation σ is not known, but it can be replaced by any consistent estimator. [sent-94, score-0.152]
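To make the construction above concrete, here is a minimal sketch in Python (numpy, scipy, scikit-learn), assuming the decorrelating matrix M has already been computed, for instance by the convex program discussed in the next sentence and sketched after it. The function name and the plug-in noise estimate sigma_hat are illustrative and not part of the paper; the de-biasing step and the interval follow Eq. (5) and the 95% interval quoted above, under the assumption that the paper's Lasso uses the usual (1/2n) normalization (which matches scikit-learn's objective, so its alpha plays the role of λ = σγ from Table 1).

```python
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Lasso

def debiased_lasso(y, X, M, lam, sigma_hat, alpha=0.05):
    """Lasso fit, de-biasing step of Eq. (5), and two-sided confidence intervals (sketch)."""
    n, p = X.shape
    # scikit-learn's Lasso minimizes (1/2n)||y - X theta||^2 + alpha ||theta||_1,
    # so `alpha` plays the role of lambda (= sigma * gamma in Table 1).
    theta_n = Lasso(alpha=lam, fit_intercept=False).fit(X, y).coef_
    # De-biasing: theta_u = theta_n + (1/n) M X^T (y - X theta_n)
    theta_u = theta_n + M @ X.T @ (y - X @ theta_n) / n
    # Approximate covariance sigma^2 (M Sigma_hat M^T) / n, with Sigma_hat = X^T X / n
    Sigma_hat = X.T @ X / n
    Q = M @ Sigma_hat @ M.T
    z = norm.ppf(1 - alpha / 2)                        # 1.96 for alpha = 0.05
    half_width = z * sigma_hat * np.sqrt(np.diag(Q) / n)
    return theta_u, theta_u - half_width, theta_u + half_width
```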
25 We propose here to construct M by solving a convex program that aims at optimizing two objectives. [sent-96, score-0.086]
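The extract does not reproduce program (4) itself; the sketch below assumes the formulation usually associated with this line of work and with [10] — for each row i, minimize m^T Σ m subject to ‖Σ m − e_i‖_∞ ≤ γ, with Σ the sample covariance — and should be read as an illustrative stand-in rather than a verbatim transcription. It uses cvxpy; the objective is written as (1/n)‖X m‖_2^2, which equals m^T Σ m but is more solver-friendly.

```python
import numpy as np
import cvxpy as cp

def decorrelating_matrix(X, gamma):
    """Assumed form of program (4), solved one row at a time:
    minimize m^T Sigma_hat m  subject to  ||Sigma_hat m - e_i||_inf <= gamma."""
    n, p = X.shape
    Sigma_hat = X.T @ X / n
    M = np.zeros((p, p))
    for i in range(p):
        e_i = np.zeros(p)
        e_i[i] = 1.0
        m = cp.Variable(p)
        # (1/n)||X m||_2^2 equals m^T Sigma_hat m and keeps the problem an explicit QP.
        objective = cp.Minimize(cp.sum_squares(X @ m) / n)
        constraints = [cp.norm(Sigma_hat @ m - e_i, "inf") <= gamma]
        cp.Problem(objective, constraints).solve()
        M[i] = m.value
    return M
```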
26 The idea of constructing a de-biased estimator of the form θ^u = θ^n + (1/n) M X^T (Y − X θ^n) was used by Javanmard and Montanari in [10], which suggested the choice M = c Σ^{-1}, with Σ = E{X1 X1^T} the population covariance matrix and c a positive constant. [sent-100, score-0.327]
27 A simple estimator for Σ was proposed for sparse covariances, but asymptotic validity and optimality were proven only for uncorrelated Gaussian designs (i. [sent-101, score-0.26]
28 These authors prove semi-parametric optimality in a non-asymptotic setting, provided the sample size is at least n = Ω(s0^2 log p). [sent-105, score-0.09]
29 In this paper, we do not assume any sparsity constraint on Σ^{-1}, but still require the sample size scaling n = Ω(s0^2 log p). [sent-106, score-0.095]
30 From a technical point of view, our proof starts from a simple decomposition of the de-biased estimator θu into a Gaussian part and an error term, already used in [25]. [sent-108, score-0.152]
31 However –departing radically from earlier work– we realize that M need not be a good estimator of Σ−1 in order for the de-biasing procedure to work. [sent-109, score-0.201]
32 As a consequence of this choice, our approach applies to general covariance structures Σ. [sent-111, score-0.078]
33 The only assumptions we make on Σ are the standard compatibility conditions required for high-dimensional consistency [4]. [sent-113, score-0.168]
34 Restricting ourselves to linear regression, earlier work investigated prediction error [8], model selection properties [17, 31, 27, 5], ℓ2 consistency [6, 2]. [sent-117, score-0.089]
35 Zhang and Zhang [30], and Bühlmann [3] proposed hypothesis testing procedures under restricted eigenvalue or compatibility conditions [4]. [sent-120, score-0.557]
36 [15] develop a test for the hypothesis that a newly added coefficient along the Lasso regularization path is irrelevant. [sent-125, score-0.216]
37 Finally, resampling methods for hypothesis testing were studied in [29, 18, 19]. [sent-128, score-0.303]
38 2 Preliminaries and notations: We let Σ ≡ X^T X/n be the sample covariance matrix. [sent-130, score-0.078]
39 For a matrix Σ and a set S of size s0, the compatibility condition is met if, for some φ0 > 0 and all θ satisfying ‖θ_{S^c}‖_1 ≤ 3 ‖θ_S‖_1, it holds that ‖θ_S‖_1^2 ≤ (s0/φ0^2) θ^T Σ θ. [sent-135, score-0.213]
40 The sub-gaussian norm of a random variable X, denoted by ‖X‖_{ψ2}, is defined as ‖X‖_{ψ2} = sup_{p≥1} p^{-1/2} (E|X|^p)^{1/p}. [sent-138, score-0.124]
41 For a matrix A and sets of indices I, J, we let A_{I,J} denote the submatrix formed by the rows in I and columns in J. [sent-141, score-0.141]
42 A_{·,I}) denotes the submatrix containing just the rows (resp. [sent-143, score-0.099]
43 We write ‖v‖_p for the standard ℓp norm of a vector v and ‖v‖_0 for the number of nonzero entries of v. [sent-149, score-0.112]
44 In addition, assume the rows of the whitened matrix X Σ^{-1/2} are sub-gaussian, i.e. [sent-167, score-0.098]
45 Let E be the event that the compatibility condition holds for Σ, and max_{i∈[p]} Σ_{i,i} = O(1). [sent-170, score-0.214]
46 Note that compatibility condition (and hence the event E) holds w. [sent-175, score-0.214]
47 In fact [22] shows that under some general assumptions, the compatibility condition on the population covariance implies a similar condition on the sample covariance Σ, w. [sent-179, score-0.214]
48 Hence, θu is an asymptotically unbiased estimator for θ0 . [sent-194, score-0.207]
49 1 is to derive confidence intervals and statistical hypothesis tests for high-dimensional models. [sent-196, score-0.474]
50 Throughout, we make the sparsity assumption s0 = o(√n / log p). [sent-197, score-0.095]
51 Confidence intervals: We first show that the variances of the variables Z_j | X are Ω(1). [sent-199, score-0.212]
52 , mp)^T be the matrix with rows m_i^T obtained by solving the convex program (4). [sent-205, score-0.186]
53 Then [θ^u_i − δ(α, n), θ^u_i + δ(α, n)], with δ(α, n) as in Eq. (7), is an asymptotic two-sided confidence interval for θ_{0,i} with significance α. Notice that the same corollary applies to any other consistent estimator of the noise standard deviation. [sent-223, score-0.276]
54 2 Hypothesis testing: An important advantage of sparse linear regression models is that they provide parsimonious explanations of the data in terms of a small number of covariates. [sent-225, score-0.238]
55 More precisely, we are interested in testing an individual null hypothesis H0,i : θ0,i = 0 versus the alternative HA,i : θ0,i ≠ 0, and assigning p-values for these tests. [sent-228, score-0.365]
56 We construct a p-value Pi for the test H0,i as follows: Pi = 2 (1 − Φ( √n |θ^u_i| / (σ [M Σ M^T]^{1/2}_{i,i}) )) (Eq. 9). [sent-229, score-0.097]
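A direct transcription of the reconstructed Eq. (9): two-sided Gaussian p-values from the studentized de-biased coordinates, together with the natural level-α decision rule (reject H0,i whenever Pi ≤ α). Inputs are the quantities produced by the earlier sketches; all names are illustrative.

```python
import numpy as np
from scipy.stats import norm

def debiased_p_values(theta_u, M, Sigma_hat, sigma_hat, n):
    """P_i = 2 (1 - Phi( sqrt(n) |theta_u_i| / (sigma_hat [M Sigma_hat M^T]_{i,i}^{1/2}) ))."""
    se = sigma_hat * np.sqrt(np.diag(M @ Sigma_hat @ M.T))
    return 2.0 * (1.0 - norm.cdf(np.sqrt(n) * np.abs(theta_u) / se))

def reject(p_values, alpha=0.05):
    """Level-alpha decision rule: reject H_{0,i} whenever P_i <= alpha."""
    return p_values <= alpha
```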
57 We measure the quality of the test Ti,X(y) in terms of its significance level αi and statistical power 1 − βi. [sent-231, score-0.176]
58 Further note that, without further assumption, no nontrivial power can be achieved. [sent-240, score-0.109]
59 We take a minimax perspective and require the test to behave uniformly well over s0 -sparse vectors. [sent-243, score-0.133]
60 Also, Pθ (·) is the induced probability for random design X and noise realization w, given the fixed parameter vector θ. [sent-247, score-0.107]
61 Consider a random design model that satisfies the conditions of Theorem 2. [sent-251, score-0.107]
62 Under the sparsity assumption s0 = o(√n / log p), the following holds true for any fixed sequence of integers i = i(n): lim α_i(n) ≤ α. [sent-253, score-0.184]
63 Moreover, G(α, 0) = α, which is the trivial power obtained by randomly rejecting H0,i with probability α. [sent-256, score-0.125]
64 1 Minimax optimality: The authors of [10] prove an upper bound for the minimax power of tests with a given significance level α, under the Gaussian random design models (see Theorem 2. [sent-261, score-0.342]
65 In the asymptotic regime and under our sparsity assumption s0 = o(√n / log p), the bound of [10] simplifies to lim_{n→∞} [1 − β_i^opt(α; µ)] / G(α, µ/σ_eff) ≤ 1, with σ_eff = σ / (√n η_{Σ,s0}) (Eq. 12). Using the bound of (12) and specializing the result of Theorem 3. [sent-267, score-0.209]
66 3 to Gaussian design X, we obtain that our scheme achieves a near optimal minimax power for a broad class of covariance matrices. [sent-268, score-0.375]
67 We can compare our test to the optimal test by computing how much µ must be increased in order to achieve the minimax optimal power. [sent-269, score-0.187]
68 It follows from the above that µ must be increased to µ̃, with the two differing by a factor: µ̃/µ = (Σ^{-1}_{i,i})^{1/2} η_{Σ,s0} ≤ (Σ^{-1}_{i,i} Σ_{i,i})^{1/2} ≤ (σ_max(Σ)/σ_min(Σ))^{1/2}, since Σ^{-1}_{i,i} ≤ (σ_min(Σ))^{-1}, and Σ_{i|S} ≤ Σ_{i,i} ≤ σ_max(Σ) due to Σ_{S,S} ⪰ 0. [sent-270, score-0.19]
69 General regularized maximum likelihood: In this section, we generalize our results beyond the linear regression model to general regularized maximum likelihood. [sent-271, score-0.277]
70 Formal guarantees can be obtained under suitable restricted strong convexity assumptions [20] and will be the object of a forthcoming publication. [sent-273, score-0.114]
71 We consider the following regularized estimator: θ ≡ arg min_{θ∈R^p} { L(θ) + λ R(θ) } (Eq. 13), where λ is a regularization parameter and R : R^p → R_+ is a norm. [sent-284, score-0.09]
72 Let Ii(θ) be the Fisher information of fθ(Y|Xi), defined as Ii(θ) ≡ E{ ∇_θ log fθ(Y|Xi) ∇_θ log fθ(Y|Xi)^T | Xi } = −E{ ∇²_θ log f(Y|Xi, θ) | Xi }, where the second identity holds under suitable regularity conditions [13], and ∇²_θ denotes the Hessian operator. [sent-286, score-0.193]
73 Finally, the de-biased estimator θ^u is defined by θ^u ≡ θ − M ∇_θ L(θ), with M given again by the solution of the convex program (4), and the definition of Σ provided here. [sent-289, score-0.195]
74 Approximating ∇²_θ L(θ0) ≈ Σ (which amounts to taking expectation with respect to the response variables yi), we get θ^u − θ0 ≈ −M ∇_θ L(θ0) − [M Σ − I](θ − θ0). [sent-293, score-0.151]
75 The bias term −[M Σ − I](θ − θ0 ) can be bounded as in the linear regression case, building on the fact that M is chosen such that |M Σ − I|∞ ≤ γ. [sent-296, score-0.097]
76 Similar to the linear case, an asymptotic two-sided confidence interval for θ0,i (with significance α) is given by Ii = [θ^u_i − δ(α, n), θ^u_i + δ(α, n)], where δ(α, n) = Φ^{-1}(1 − α/2) n^{-1/2} [M Σ M^T]^{1/2}_{i,i}. [sent-297, score-0.124]
77 Moreover, an asymptotically valid p-value Pi for testing the null hypothesis H0,i is constructed as: Pi = 2 (1 − Φ( √n |θ^u_i| / [M Σ M^T]^{1/2}_{i,i} )). [sent-298, score-0.42]
78 It is easy to see that in this case Ii(θ) = qi (1 − qi) Xi Xi^T, with qi = (1 + e^{−⟨θ, Xi⟩})^{-1}, and thus Σ = (1/n) ∑_{i=1}^n qi (1 − qi) Xi Xi^T. [sent-301, score-0.475]
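For the logistic case, these formulas translate into a few lines of numpy. This is a sketch under the assumption that responses are coded y_i ∈ {0, 1} and that L(θ) is the average logistic loss; the matrix M would again come from the convex program (4), applied to this Σ, and the function name is hypothetical.

```python
import numpy as np

def logistic_debias(theta_hat, X, y, M):
    """De-biased estimator for the logistic model (sketch; responses y in {0, 1})."""
    n, p = X.shape
    q = 1.0 / (1.0 + np.exp(-X @ theta_hat))            # q_i = (1 + exp(-<theta, X_i>))^{-1}
    Sigma_hat = (X * (q * (1 - q))[:, None]).T @ X / n  # (1/n) sum_i q_i (1-q_i) X_i X_i^T
    grad = X.T @ (q - y) / n                            # gradient of the average logistic loss
    theta_u = theta_hat - M @ grad                      # theta_u = theta_hat - M grad L(theta_hat)
    return theta_u, Sigma_hat
```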
79 Diabetes data example: We consider the problem of estimating relevant attributes in predicting type-2 diabetes. [sent-302, score-0.113]
80 We evaluate the performance of our hypothesis testing procedure on the Practice Fusion Diabetes dataset [1]. [sent-303, score-0.303]
81 This dataset contains de-identified medical records of 10000 patients, including information on diagnoses, medications, lab results, allergies, immunizations, and vital signs. [sent-304, score-0.115]
82 From this dataset, we extract p numerical attributes resulting in a sparse design matrix Xtot ∈ R^{ntot×p}, with ntot = 10000. [sent-305, score-0.346]
83 [Figure 1 panels: (a) Q-Q plot of Z̃ against standard normal quantiles; (b) normalized histograms of Z̃, for one realization.] [sent-310, score-0.317]
84 Figure 1: Q-Q plot of Z̃ and normalized histograms of Z̃_S (in red) and Z̃_{S^c} (in blue) for one realization. [sent-311, score-0.129]
85 The attributes consist of: (i) Transcript records: year of birth, gender and BMI; (ii) Diagnosis information: 80 binary attributes corresponding to different ICD-9 codes. [sent-316, score-0.226]
86 (iii) Medications: 80 binary attributes indicating the use of different medications. [sent-317, score-0.113]
87 (iv) Lab results: For 70 lab test observations, we include attributes indicating whether the patient was tested, abnormality flags, and the observed values. [sent-318, score-0.34]
88 We also bin the observed values into 10 quantiles and make 10 binary attributes indicating the bin of the corresponding observed value. [sent-319, score-0.333]
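A small sketch of this binning step (function and variable names are hypothetical): quantile cut points are computed from the observed lab values and each observation is mapped to a one-hot indicator of its bin.

```python
import numpy as np

def quantile_bin_indicators(values, n_bins=10):
    """Map each observed lab value to a one-hot indicator of its quantile bin."""
    # Interior quantile cut points, so every value falls into exactly one of n_bins bins.
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(values, edges)     # integer bin index in {0, ..., n_bins - 1}
    return np.eye(n_bins)[bins]           # shape (len(values), n_bins) binary matrix
```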
89 We consider the logistic model as described in the previous section, with a binary response identifying the patients diagnosed with type-2 diabetes. [sent-320, score-0.234]
90 Letting L(θ) be the logistic loss corresponding to the design Xtot and response vector Y ∈ R^{ntot}, we take θ0 as the minimizer of L(θ). [sent-322, score-0.239]
91 Next, we take random subsamples of size n = 500 from the patients, and examine the performance of our testing procedure. [sent-324, score-0.202]
92 The experiment is done using the glmnet package in R, which fits the entire path of the regularized logistic estimator. [sent-325, score-0.149]
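The extract does not spell out how Z̃ is defined; the sketch below assumes the natural choice Z̃_i = √n (θ^u_i − θ_{0,i}) / [M Σ M^T]^{1/2}_{i,i}, computed on one random subsample of n = 500 patients, and substitutes scikit-learn's ℓ1-penalized logistic regression for the glmnet fit. M_builder stands in for the convex program (4) and, like all other names here, is purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def z_statistics_on_subsample(X_tot, y_tot, theta_0, M_builder, n=500, C=1.0, seed=0):
    """One subsampling round: l1-regularized logistic fit, de-biasing, standardized Z (sketch)."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(X_tot.shape[0], size=n, replace=False)
    X, y = X_tot[idx], y_tot[idx]

    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C, fit_intercept=False)
    theta_hat = clf.fit(X, y).coef_.ravel()

    q = 1.0 / (1.0 + np.exp(-X @ theta_hat))
    Sigma_hat = (X * (q * (1 - q))[:, None]).T @ X / n
    M = M_builder(Sigma_hat)                       # e.g. program (4) run on this Sigma_hat
    theta_u = theta_hat - M @ (X.T @ (q - y) / n)  # de-biased estimate

    se = np.sqrt(np.diag(M @ Sigma_hat @ M.T))
    return np.sqrt(n) * (theta_u - theta_0) / se   # compare to N(0, 1), e.g. in a Q-Q plot
```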
93 In Fig. 1(a), sample quantiles of Z̃ are depicted versus the quantiles of a standard normal distribution. [sent-334, score-0.328]
94 In Fig. 1(b), we plot the normalized histograms of Z̃_S (in red) and Z̃_{S^c} (in blue). [sent-338, score-0.129]
95 As the plot showcases, Z̃_{S^c} has roughly a standard normal distribution, and the entries of Z̃_S appear as distinguishable spikes. [sent-339, score-0.146]
96 The entries of Z̃_S with larger magnitudes are easier to be marked off from the normal distribution tail. [sent-340, score-0.094]
97 Hypothesis testing in high-dimensional regression under the Gaussian random design model: Asymptotic theory. [sent-398, score-0.387]
98 A perturbation method for inference on regularized regression estimates. [sent-458, score-0.187]
99 Regularized multivariate regression for identifying master predictors with application to integrative genomics study of breast cancer. [sent-479, score-0.155]
100 On asymptotically optimal confidence regions and tests for high-dimensional models. [sent-503, score-0.102]
wordName wordTfidf (topN-words)
[('cance', 0.238), ('dence', 0.232), ('intervals', 0.212), ('zs', 0.183), ('rp', 0.173), ('qii', 0.167), ('hypothesis', 0.162), ('estimator', 0.152), ('javanmard', 0.145), ('testing', 0.141), ('quantiles', 0.14), ('lasso', 0.137), ('compatibility', 0.128), ('xtot', 0.125), ('xt', 0.123), ('diabetes', 0.116), ('attributes', 0.113), ('design', 0.107), ('patients', 0.102), ('regression', 0.097), ('con', 0.096), ('ii', 0.095), ('qi', 0.095), ('xi', 0.09), ('regularized', 0.09), ('pi', 0.089), ('ols', 0.084), ('medications', 0.084), ('ntot', 0.084), ('rntot', 0.084), ('hlmann', 0.081), ('minimax', 0.079), ('yi', 0.078), ('covariance', 0.078), ('histograms', 0.077), ('diagnoses', 0.074), ('forthcoming', 0.074), ('response', 0.073), ('supp', 0.072), ('lab', 0.071), ('power', 0.069), ('asymptotic', 0.068), ('deferring', 0.068), ('norm', 0.066), ('letting', 0.062), ('null', 0.062), ('stanford', 0.061), ('subsamples', 0.061), ('geer', 0.061), ('ritov', 0.061), ('nearly', 0.06), ('logistic', 0.059), ('genomics', 0.058), ('sup', 0.058), ('rejecting', 0.056), ('interval', 0.056), ('rows', 0.056), ('constructing', 0.055), ('asymptotically', 0.055), ('test', 0.054), ('necessity', 0.054), ('arxiv', 0.054), ('statistical', 0.053), ('plot', 0.052), ('fusion', 0.052), ('lehmann', 0.052), ('montanari', 0.051), ('log', 0.05), ('meinshausen', 0.05), ('earlier', 0.049), ('coef', 0.049), ('normal', 0.048), ('tests', 0.047), ('entries', 0.046), ('lim', 0.046), ('sparsity', 0.045), ('mp', 0.045), ('procedures', 0.045), ('records', 0.044), ('submatrix', 0.043), ('ca', 0.043), ('construct', 0.043), ('event', 0.043), ('holds', 0.043), ('notice', 0.043), ('program', 0.043), ('gaussian', 0.042), ('matrix', 0.042), ('broad', 0.042), ('accepted', 0.042), ('yn', 0.042), ('sake', 0.041), ('theorem', 0.041), ('van', 0.041), ('zi', 0.041), ('optimality', 0.04), ('bin', 0.04), ('selection', 0.04), ('nontrivial', 0.04), ('assumptions', 0.04)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999923 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem.
2 0.2694163 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖_2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
3 0.25157732 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.
4 0.16323155 91 nips-2013-Dirty Statistical Models
Author: Eunho Yang, Pradeep Ravikumar
Abstract: We provide a unified framework for the high-dimensional analysis of “superposition-structured” or “dirty” statistical models: where the model parameters are a superposition of structurally constrained parameters. We allow for any number and types of structures, and any statistical model. We consider the general class of M -estimators that minimize the sum of any loss function, and an instance of what we call a “hybrid” regularization, that is the infimal convolution of weighted regularization functions, one for each structural component. We provide corollaries showcasing our unified framework for varied statistical models such as linear regression, multiple regression and principal component analysis, over varied superposition structures. 1
5 0.15985127 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
6 0.14201462 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
7 0.13745368 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
8 0.13027564 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
10 0.1107255 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
11 0.11009249 188 nips-2013-Memory Limited, Streaming PCA
12 0.10710918 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
13 0.10453013 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
14 0.096949548 206 nips-2013-Near-Optimal Entrywise Sampling for Data Matrices
15 0.096176125 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
16 0.095902167 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
17 0.092875369 269 nips-2013-Regression-tree Tuning in a Streaming Setting
18 0.09082751 106 nips-2013-Eluder Dimension and the Sample Complexity of Optimistic Exploration
19 0.090636469 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
20 0.090349607 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach
topicId topicWeight
[(0, 0.294), (1, 0.076), (2, 0.129), (3, 0.052), (4, -0.023), (5, 0.108), (6, -0.03), (7, 0.052), (8, -0.2), (9, 0.013), (10, 0.139), (11, -0.098), (12, -0.155), (13, -0.113), (14, -0.138), (15, -0.061), (16, 0.082), (17, -0.018), (18, -0.05), (19, 0.004), (20, 0.004), (21, 0.037), (22, -0.069), (23, 0.038), (24, -0.035), (25, 0.072), (26, -0.021), (27, -0.06), (28, -0.058), (29, -0.03), (30, 0.058), (31, -0.023), (32, -0.014), (33, -0.051), (34, -0.015), (35, -0.04), (36, -0.041), (37, -0.01), (38, 0.034), (39, -0.045), (40, -0.027), (41, 0.003), (42, 0.001), (43, 0.051), (44, -0.044), (45, -0.056), (46, -0.026), (47, 0.044), (48, 0.042), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.96436584 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem.
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso (ℓ1-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.
3 0.89269859 109 nips-2013-Estimating LASSO Risk and Noise Level
Author: Mohsen Bayati, Murat A. Erdogdu, Andrea Montanari
Abstract: We study the fundamental problems of variance and risk estimation in high dimensional statistical modeling. In particular, we consider the problem of learning a coefficient vector θ0 ∈ Rp from noisy linear observations y = Xθ0 + w ∈ Rn (p > n) and the popular estimation procedure of solving the ℓ1-penalized least squares objective known as the LASSO or Basis Pursuit DeNoising (BPDN). In this context, we develop new estimators for the ℓ2 estimation risk ‖θ − θ0‖_2 and the variance of the noise when distributions of θ0 and w are unknown. These can be used to select the regularization parameter optimally. Our approach combines Stein’s unbiased risk estimate [Ste81] and the recent results of [BM12a][BM12b] on the analysis of approximate message passing and the risk of LASSO. We establish high-dimensional consistency of our estimators for sequences of matrices X of increasing dimensions, with independent Gaussian entries. We establish validity for a broader class of Gaussian designs, conditional on a certain conjecture from statistical physics. To the best of our knowledge, this result is the first that provides an asymptotically consistent risk estimator for the LASSO solely based on data. In addition, we demonstrate through simulations that our variance estimation outperforms several existing methods in the literature.
4 0.81772894 219 nips-2013-On model selection consistency of penalized M-estimators: a geometric theory
Author: Jason Lee, Yuekai Sun, Jonathan E. Taylor
Abstract: Penalized M-estimators are used in diverse areas of science and engineering to fit high-dimensional models with some low-dimensional structure. Often, the penalties are geometrically decomposable, i.e. can be expressed as a sum of support functions over convex sets. We generalize the notion of irrepresentable to geometrically decomposable penalties and develop a general framework for establishing consistency and model selection consistency of M-estimators with such penalties. We then use this framework to derive results for some special cases of interest in bioinformatics and statistical learning. 1
5 0.78052217 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
Author: Nikhil Rao, Christopher Cox, Rob Nowak, Timothy T. Rogers
Abstract: Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive form of multitask learning, wherein (1) the available features can be organized into subsets according to a notion of similarity and (2) features useful in one task are similar, but not necessarily identical, to the features best suited for other tasks. The main contribution of this paper is a new procedure called Sparse Overlapping Sets (SOS) lasso, a convex optimization that automatically selects similar features for related learning tasks. Error bounds are derived for SOSlasso and its consistency is established for squared error loss. In particular, SOSlasso is motivated by multisubject fMRI studies in which functional activity is classified using brain voxels as features. Experiments with real and synthetic data demonstrate the advantages of SOSlasso compared to the lasso and group lasso. 1
6 0.74762076 4 nips-2013-A Comparative Framework for Preconditioned Lasso Algorithms
7 0.74059641 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
8 0.73984307 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
9 0.73536587 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
10 0.73346817 354 nips-2013-When in Doubt, SWAP: High-Dimensional Sparse Recovery from Correlated Measurements
11 0.69883311 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
12 0.67820096 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression
13 0.66130888 225 nips-2013-One-shot learning and big data with n=2
14 0.65420496 91 nips-2013-Dirty Statistical Models
15 0.62672377 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
16 0.59681469 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
17 0.59194535 110 nips-2013-Estimating the Unseen: Improved Estimators for Entropy and other Properties
18 0.58718091 3 nips-2013-A* Lasso for Learning a Sparse Bayesian Network Structure for Continuous Variables
19 0.56485319 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
20 0.55829859 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
topicId topicWeight
[(2, 0.021), (16, 0.039), (19, 0.18), (33, 0.168), (34, 0.101), (41, 0.019), (49, 0.037), (56, 0.172), (70, 0.027), (85, 0.034), (89, 0.094), (93, 0.037), (95, 0.024)]
simIndex simValue paperId paperTitle
Author: Harish G. Ramaswamy, Shivani Agarwal, Ambuj Tewari
Abstract: The design of convex, calibrated surrogate losses, whose minimization entails consistency with respect to a desired target loss, is an important concept to have emerged in the theory of machine learning in recent years. We give an explicit construction of a convex least-squares type surrogate loss that can be designed to be calibrated for any multiclass learning problem for which the target loss matrix has a low-rank structure; the surrogate loss operates on a surrogate target space of dimension at most the rank of the target loss. We use this result to design convex calibrated surrogates for a variety of subset ranking problems, with target losses including the precision@q, expected rank utility, mean average precision, and pairwise disagreement. 1
2 0.93059796 9 nips-2013-A Kernel Test for Three-Variable Interactions
Author: Dino Sejdinovic, Arthur Gretton, Wicher Bergsma
Abstract: We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
3 0.9006995 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
same-paper 4 0.894023 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance such as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized M-estimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem.
5 0.84869665 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
6 0.84422183 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
7 0.84342945 201 nips-2013-Multi-Task Bayesian Optimization
8 0.83743578 273 nips-2013-Reinforcement Learning in Robust Markov Decision Processes
9 0.82653666 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
10 0.82564992 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
11 0.82486707 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
12 0.82438105 173 nips-2013-Least Informative Dimensions
14 0.82283372 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
15 0.82276511 137 nips-2013-High-Dimensional Gaussian Process Bandits
16 0.82152641 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
17 0.81927586 233 nips-2013-Online Robust PCA via Stochastic Optimization
18 0.81888902 249 nips-2013-Polar Operators for Structured Sparse Estimation
19 0.81875867 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
20 0.81765151 355 nips-2013-Which Space Partitioning Tree to Use for Search?