nips nips2013 nips2013-302 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating the sparse precision matrix of a high dimensional elliptical distribution. The proposed method calibrates the regularization when estimating each column of the precision matrix. Thus it is not only asymptotically tuning-free, but also achieves improved finite-sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The proposed method calibrates the regularization when estimating each column of the precision matrix. [sent-2, score-0.136]
2 Thus it is not only asymptotically tuning-free, but also achieves improved finite-sample performance. [sent-3, score-0.059]
3 Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. [sent-4, score-0.082]
4 We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. [sent-5, score-0.023]
5 1 Introduction We study the precision matrix estimation problem: let X = (X_1, . . . , X_d)^T be a d-dimensional random vector following some distribution with mean µ ∈ R^d and covariance matrix Σ ∈ R^{d×d}, where Σ_{kj} = E[X_k X_j] − E[X_k] E[X_j]. [sent-6, score-0.243] [sent-9, score-0.135]
7 To make the estimation manageable in high dimensions (d/n → ∞), we assume that the precision matrix Ω = Σ^{−1} is sparse. [sent-11, score-0.055]
8 Existing literature in machine learning and statistics usually assumes that X follows a multivariate Gaussian distribution, i.e., X ∼ N(µ, Σ). [sent-13, score-0.049]
9 Such a distributional assumption naturally connects sparse precision matrices with Gaussian graphical models (Dempster, 1972), and has motivated numerous applications (Lauritzen, 1996). [sent-16, score-0.203]
10 To estimate sparse precision matrices for Gaussian distributions, many methods in the past decade have been proposed based on the sample covariance estimator. [sent-17, score-0.25]
11 Let x_1, . . . , x_n ∈ R^d be n independent observations of X; the sample covariance estimator is defined as S = (1/n) ∑_{i=1}^n (x_i − x̄)(x_i − x̄)^T with x̄ = (1/n) ∑_{i=1}^n x_i. [sent-21, score-0.174]
12 (2008) take advantage of the Gaussian likelihood, and propose the graphical lasso (GLASSO) estimator by solving Ω̂ = argmin_Ω − log|Ω| + tr(SΩ) + λ ∑_{j,k} |Ω_{kj}|, where λ > 0 is the regularization parameter. [sent-25, score-0.186]
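As a concrete illustration of the covariance-based pipeline just described, the sketch below computes the sample covariance S and fits a graphical-lasso precision estimate. It assumes numpy and scikit-learn's GraphicalLasso are available; the data matrix X and the regularization level lam are placeholders, not values from the paper.

    import numpy as np
    from sklearn.covariance import GraphicalLasso   # assumes scikit-learn >= 0.20

    def sample_covariance(X):
        # S = (1/n) * sum_i (x_i - xbar)(x_i - xbar)^T for an n x d data matrix X
        Xc = X - X.mean(axis=0)
        return Xc.T @ Xc / X.shape[0]

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 20))      # placeholder data: n = 200, d = 20
    S = sample_covariance(X)

    # GLASSO: minimize over Omega  -log|Omega| + tr(S Omega) + lam * sum_{j,k} |Omega_{kj}|
    lam = 0.1                           # illustrative regularization level
    Omega_glasso = GraphicalLasso(alpha=lam).fit(X).precision_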
13 Scalable software packages for GLASSO have been developed, such as huge (Zhao et al. [sent-26, score-0.043]
14 (2011); Yuan (2010) adopt the pseudo-likelihood approach to estimate the precision matrix. [sent-29, score-0.177]
15 Their estimators follow a column-by-column estimation scheme, and possess better theoretical properties. [sent-30, score-0.124]
16 More specifically, given a matrix A ∈ R^{d×d}, let A_{*j} = (A_{1j}, . . . , A_{dj})^T denote the j-th column of A, ||A_{*j}||_1 = ∑_k |A_{kj}| and ||A_{*j}||_∞ = max_k |A_{kj}|. Cai et al. [sent-31, score-0.052] [sent-34, score-0.072]
18 (2011) obtain the CLIME estimator by solving Ω̂_{*j} = argmin ||Ω_{*j}||_1 s.t. ||SΩ_{*j} − I_{*j}||_∞ ≤ λ, (1.2) where I_{*j} is the j-th column of the identity matrix and λ > 0 is the regularization parameter. [sent-35, score-0.12]
19 Theoretically, let ||A||_1 = max_j ||A_{*j}||_1 be the matrix ℓ1 norm of A, and ||A||_2 be the largest singular value of A (i.e., the spectral norm). Cai et al. [sent-44, score-0.19]
20 (2011) show that if we choose λ ≍ ||Ω||_1 √(log d / n), the CLIME estimator achieves the following rate of convergence under the spectral norm, ||Ω̂ − Ω||_2^2 = O_P( ||Ω||_1^{4−4q} s^2 (log d / n)^{1−q} ), (1.3) where q ∈ [0, 1) and s = max_j ∑_k |Ω_{kj}|^q. [sent-47, score-0.243]
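A minimal column-by-column sketch of the CLIME program (1.2), rewritten as a linear program in the split variables w = w⁺ − w⁻; here e_j denotes the j-th column of the identity matrix. It assumes scipy (with the "highs" solver) and is meant only to illustrate the constrained ℓ1 formulation, not to reproduce the solver used by Cai et al.

    import numpy as np
    from scipy.optimize import linprog

    def clime_column(S, j, lam):
        # Solve  min ||w||_1  s.t.  ||S w - e_j||_inf <= lam  as an LP in (w+, w-).
        d = S.shape[0]
        e_j = np.zeros(d); e_j[j] = 1.0
        c = np.ones(2 * d)                        # objective: sum(w+) + sum(w-) = ||w||_1
        A_ub = np.vstack([np.hstack([ S, -S]),    #  S(w+ - w-) <= lam + e_j
                          np.hstack([-S,  S])])   # -S(w+ - w-) <= lam - e_j
        b_ub = np.concatenate([lam + e_j, lam - e_j])
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
        return res.x[:d] - res.x[d:]

    def clime(S, lam):
        # Column-wise estimates followed by the usual smaller-magnitude symmetrization.
        W = np.column_stack([clime_column(S, j, lam) for j in range(S.shape[0])])
        return np.where(np.abs(W) <= np.abs(W.T), W, W.T)

Note that a single λ is shared by all columns here, which is exactly the lack of calibration that EPIC is designed to remove.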
21 Despite these good properties, the CLIME estimator in (1.2) has three drawbacks: (1) The theoretical justification heavily relies on the subgaussian tail assumption. [sent-50, score-0.091] [sent-51, score-0.041]
23 When this assumption is violated, the inference can be unreliable; (2) All columns are estimated using the same regularization parameter, even though these columns may have different sparseness. [sent-52, score-0.044]
24 As a result, more estimation bias is introduced into the denser columns to compensate for the sparser ones. [sent-53, score-0.055]
25 In other words, the estimation is not calibrated (Liu et al. [sent-54, score-0.128]
26 Thus we have to carefully tune the regularization parameter over a refined grid of potential values in order to get a good finite-sample performance. [sent-57, score-0.044]
27 To overcome the above three drawbacks, we propose a new sparse precision matrix estimation method, named EPIC (Estimating Precision mIatrix with Calibration). [sent-58, score-0.274]
28 To relax the Gaussian assumption, our EPIC method adopts an ensemble of the transformed Kendall’s tau estimator and Catoni’s M-estimator (Kruskal, 1958; Catoni, 2012). [sent-59, score-0.295]
29 Such a semiparametric combination makes EPIC applicable to the elliptical distribution family. [sent-60, score-0.179]
30 The elliptical family (Fang et al., 1990) contains many multivariate distributions such as the Gaussian, multivariate t, Kotz, multivariate Laplace, and Pearson type II and VII distributions. [sent-63, score-0.147]
31 Many of these distributions do not have subgaussian tails, thus the commonly used sample covariance-based sparse precision matrix estimators often fail miserably. [sent-64, score-0.302]
32 Moreover, our EPIC method adopts a calibration framework proposed in Gautier and Tsybakov (2011), which reduces the estimation bias by calibrating the regularization for each column. [sent-65, score-0.196]
33 Meanwhile, the optimal regularization parameter selection under such a calibration framework does not require any prior knowledge of unknown quantities (Belloni et al. [sent-66, score-0.17]
34 Thus our EPIC estimator is asymptotically tuning free (Liu and Wang, 2012). [sent-68, score-0.15]
35 Our theoretical analysis shows that if the underlying distribution has a finite fourth moment, the EPIC estimator achieves the same rates of convergence as (1.3). [sent-69, score-0.118]
36 Numerical experiments on both simulated and real datasets show that EPIC outperforms existing precision matrix estimation methods. [sent-71, score-0.272]
37 2 Background We first introduce some notations used throughout this paper. [sent-72, score-0.029]
38 Given a vector v = (v_1, . . . , v_d)^T ∈ R^d, we define the following vector norms: ||v||_1 = ∑_j |v_j|, ||v||_2^2 = ∑_j v_j^2, ||v||_∞ = max_j |v_j|. [sent-76, score-0.068]
39 Given a matrix A ∈ R^{d×d}, we use A_{*j} = (A_{1j}, . . . , A_{dj})^T to denote its j-th column. [sent-77, score-0.052]
40 We define the following matrix norms: ||A||_1 = max_j ||A_{*j}||_1, ||A||_2 = max_j ψ_j(A), ||A||_F^2 = ∑_{k,j} A_{kj}^2, ||A||_max = max_{k,j} |A_{kj}|, where the ψ_j(A)'s are the singular values of A. [sent-81, score-0.287]
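These norms translate directly into numpy; the matrix A below is a placeholder used only to fix shapes.

    import numpy as np

    A = np.arange(12, dtype=float).reshape(3, 4)   # placeholder matrix

    l1_norm   = np.abs(A).sum(axis=0).max()        # ||A||_1   = max_j sum_k |A_kj|
    spec_norm = np.linalg.norm(A, 2)               # ||A||_2   = largest singular value
    fro_norm  = np.linalg.norm(A, "fro")           # ||A||_F   = sqrt(sum_{k,j} A_kj^2)
    max_norm  = np.abs(A).max()                    # ||A||_max = max_{k,j} |A_kj|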
41 A d-dimensional random vector X = (X_1, . . . , X_d)^T follows an elliptical distribution with parameters µ, Ξ, and β, if X has a stochastic representation X =_d µ + βBU, such that β ≥ 0 is a continuous random variable independent of U, U ∈ S^{r−1} is uniformly distributed on the unit sphere in R^r, and Ξ = BB^T. [sent-90, score-0.116]
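The stochastic representation can be turned into a sampler directly; the sketch below assumes numpy, and the choice of the generating variable β (here the norm of a standard Gaussian vector, which recovers X ~ N(µ, BBᵀ)) is only one illustrative option among the many members of the elliptical family.

    import numpy as np

    def sample_elliptical(n, mu, B, draw_beta, rng):
        # Draw n samples of X = mu + beta * B U with U uniform on the unit sphere S^{r-1}.
        d, r = B.shape
        G = rng.normal(size=(n, r))
        U = G / np.linalg.norm(G, axis=1, keepdims=True)   # rows uniform on the sphere
        beta = draw_beta(n, rng)                           # nonnegative, independent of U
        return mu + (beta[:, None] * U) @ B.T

    rng = np.random.default_rng(0)
    mu, B = np.zeros(3), np.eye(3)
    # beta distributed as ||z|| with z ~ N(0, I_r) makes beta*U standard Gaussian,
    # so this particular call reproduces X ~ N(mu, B B^T).
    X = sample_elliptical(5, mu, B,
                          lambda n, rng: np.linalg.norm(rng.normal(size=(n, 3)), axis=1),
                          rng)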
42 Since we are interested in precision matrix estimation, we assume that max_j E X_j^2 is finite. [sent-91, score-0.055]
43 The representation above is not unique, and existing literature usually imposes the constraint max_j Ξ_{jj} = 1 to make the distribution identifiable (Fang et al. [sent-93, score-0.142]
44 However, such a constraint does not necessarily make Ξ the covariance matrix. [sent-95, score-0.083]
45 2, the distribution is identifiable with Σ defined as the conventional covariance matrix. [sent-103, score-0.083]
46 Σ has the decomposition Σ = ΘZΘ, where Z is the Pearson correlation matrix and Θ = diag(θ_1, . . . , θ_d), with θ_j the standard deviation of X_j. [sent-106, score-0.048]
47 Since Θ is a diagonal matrix, the precision matrix Ω also has a similar decomposition Ω = Θ^{−1}ΓΘ^{−1}, where Γ = Z^{−1} is the inverse correlation matrix. [sent-110, score-0.21]
48 3 Method We propose a three-step method: (1) We first use the transformed Kendall’s tau estimator and Catoni’s M-estimator to obtain Z and Θ respectively. [sent-111, score-0.233]
49 (2) We then plug Z into the calibrated inverse correlation matrix estimation to obtain Γ. [sent-112, score-0.211]
50 3.1 Correlation Matrix and Standard Deviation Estimation To estimate Z, we adopt the transformed Kendall's tau estimator proposed in Liu et al. [sent-115, score-0.317]
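A sketch of this rank-based correlation step, using the sine transform Ẑ_{kj} = sin(π τ̂_{kj}/2) of pairwise Kendall's tau that the cited work employs; scipy's kendalltau is assumed, and no claim is made that this matches the authors' implementation details (e.g., tie handling or fast O(n log n) variants).

    import numpy as np
    from scipy.stats import kendalltau

    def transformed_kendall_correlation(X):
        # Z_hat[k, j] = sin(pi/2 * tau_hat(X_k, X_j)); diagonal entries are 1.
        n, d = X.shape
        Z = np.eye(d)
        for k in range(d):
            for j in range(k + 1, d):
                tau, _ = kendalltau(X[:, k], X[:, j])
                Z[k, j] = Z[j, k] = np.sin(0.5 * np.pi * tau)
        return Z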
51 3.3 Precision Matrix Estimation Once we obtain the estimated inverse correlation matrix Γ, we can recover the precision matrix estimator by the ensemble rule Ω = Θ^{−1}ΓΘ^{−1}. [sent-125, score-0.429]
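The ensemble rule itself is a one-line operation once marginal scale estimates and an inverse correlation estimate are in hand; theta_hat and Gamma_hat below are hypothetical inputs (e.g., from the Catoni-type marginal estimates and the calibrated inverse correlation step), not outputs of code given in the paper.

    import numpy as np

    def ensemble_precision(theta_hat, Gamma_hat):
        # Omega_hat = Theta^{-1} Gamma_hat Theta^{-1} with Theta = diag(theta_hat)
        D_inv = np.diag(1.0 / np.asarray(theta_hat))
        return D_inv @ Gamma_hat @ D_inv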
52 A possible alternative is to directly estimate Ω by plugging a covariance estimator S = ΘZΘ (3.1) instead of Z, but this direct estimation procedure makes the regularization parameter selection sensitive to Var(X_j). [sent-128, score-0.174] [sent-130, score-0.123]
54 We define the following class of sparse symmetric matrices, U_q(s, M) = { Γ ∈ R^{d×d} : Γ ≻ 0, Γ = Γ^T, max_j ∑_k |Γ_{kj}|^q ≤ s, ||Γ||_1 ≤ M }, where q ∈ [0, 1) and (s, d, M) can scale with the sample size n. [sent-132, score-0.068]
55 (A.2) max_j |µ_j| ≤ µ_max, max_j θ_j ≤ θ_max, min_j θ_j ≥ θ_min; (A.3) max_j E X_j^4 ≤ K, where µ_max, K, θ_max, and θ_min are constants. [sent-135, score-0.198] [sent-136, score-0.099]
57 Suppose that X follows an elliptical distribution with mean µ, and covariance Σ = ΘZΘ. [sent-140, score-0.199]
58 Assume that (A.2) and (A.3) hold; given the transformed Kendall's tau estimator and Catoni's M-estimator defined in Section 3, there exist universal constants κ1 and κ2 such that for large enough n, P( max_j |θ̂_j^{−1} − θ_j^{−1}| ≤ κ2 √(log d / n) ) ≥ 1 − 2/d^3 and P( max_{k,j} |Ẑ_{kj} − Z_{kj}| ≤ κ1 √(log d / n) ) ≥ 1 − 1/d^3. [sent-143, score-0.307]
59 Theorem 4.1 implies that both the transformed Kendall's tau estimator and Catoni's M-estimator possess good concentration properties, which enable us to obtain a consistent estimator of Ω. [sent-145, score-0.351]
60 The next theorem presents the rates of convergence under the matrix ℓ1 norm, spectral norm, Frobenius norm, and max norm. [sent-146, score-0.142]
61 RC: CLIME + S as the input covariance matrix (3) CLIME. [sent-168, score-0.135]
62 (3.1) as the input covariance matrix. We consider three different settings for the comparison: (1) d = 100; (2) d = 200; (3) d = 400. [sent-170, score-0.135]
63 We adopt the following three graph generation schemes, as illustrated in Figure 1, to obtain precision matrices. [sent-171, score-0.177]
64 (a) Chain (b) Erdős-Rényi (c) Scale-free. Figure 1: Three different graph patterns. [sent-172, score-0.096]
65 We then generate n = 200 independent samples from the t-distribution with 5 degrees of freedom, mean 0 and covariance Σ = Ω^{−1}. [sent-174, score-0.083]
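One way to reproduce this data-generating step, assuming numpy: draw Gaussian vectors and rescale them by an independent chi-square variable to obtain multivariate t samples with 5 degrees of freedom. Since a t_ν vector with scatter matrix Ψ has covariance νΨ/(ν−2), the scatter matrix is rescaled below so that the covariance equals Σ = Ω⁻¹; whether the authors applied this rescaling is not stated in the excerpt.

    import numpy as np

    def multivariate_t(n, Sigma, df, rng):
        # n samples from a multivariate t with mean 0, df > 2 degrees of freedom, covariance Sigma
        d = Sigma.shape[0]
        scatter = Sigma * (df - 2.0) / df          # ensures Cov(X) = Sigma
        L = np.linalg.cholesky(scatter)
        Z = rng.normal(size=(n, d)) @ L.T          # N(0, scatter) draws
        chi2 = rng.chisquare(df, size=n)
        return Z * np.sqrt(df / chi2)[:, None]

    rng = np.random.default_rng(0)
    Omega = np.eye(10)                             # placeholder precision matrix
    X = multivariate_t(200, np.linalg.inv(Omega), df=5, rng=rng)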
66 To evaluate the performance in parameter estimation, we repeatedly split the data into a training set of n_1 = 160 samples and a validation set of n_2 = 40 samples 10 times. [sent-179, score-0.022]
67 Table 1 summarizes our experimental results averaged over 200 simulations. [sent-181, score-0.031]
68 We see that EPIC outperforms the competing estimators throughout all settings. [sent-182, score-0.139]
69 To evaluate the performance in model selection, we calculate the ROC curve of each obtained regularization path. [sent-183, score-0.044]
70 Figure 2 summarizes ROC curves of all methods averaged over 200 simulations. [sent-184, score-0.031]
71 We see that EPIC also outperforms the competing estimators throughout all settings. [sent-185, score-0.139]
72 6 Real Data Example To illustrate the effectiveness of the proposed EPIC method, we adopt the breast cancer data, which is analyzed in Hess et al. [sent-186, score-0.157]
73 The data set contains 133 subjects with 22,283 gene expression levels. [sent-188, score-0.079]
74 Existing results have shown that the pCR subjects have a higher chance of long-term cancer-free survival than the RD subjects. [sent-190, score-0.044]
75 Figure 2: Average ROC curves of different methods on the chain (a-c), Erdős-Rényi (d-e), and scale-free (f-h) models. [sent-330, score-0.132]
76 We can see that EPIC uniformly outperforms the competing estimators throughout all settings. [sent-331, score-0.139]
77 We assume that the gene expression data in each category is elliptically distributed, and the two categories have the same covariance matrix Σ but different means µ^{(k)}, where k = 0 for RD and k = 1 for pCR. [sent-334, score-0.286]
78 (2011), the sample mean is adopted to estimate µ(k) ’s, and CLIME. [sent-336, score-0.034]
79 In contrast, we adopt Catoni's M-estimator to estimate the µ^{(k)}'s, and EPIC is adopted to estimate Ω. [sent-338, score-0.075]
80 For the tuning parameter selection, we use a 5-fold cross validation on the training data to pick λ with the minimum classification error rate. [sent-342, score-0.056]
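A sketch of this tuning step: 5-fold cross-validation over a grid of λ values, scoring each by classification error. The fit_precision and classify callables are placeholders (fit_precision stands in for the EPIC fit plus mean estimation at a given λ, and classify for whatever discriminant rule is built from µ̂^(0), µ̂^(1), and Ω̂); neither is code from the paper.

    import numpy as np
    from sklearn.model_selection import KFold

    def cv_select_lambda(X, y, lambdas, fit_precision, classify, n_splits=5, seed=0):
        # Pick the lambda with the smallest average validation classification error.
        kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
        errors = np.zeros(len(lambdas))
        for train_idx, val_idx in kf.split(X):
            for i, lam in enumerate(lambdas):
                Omega_hat, mu0, mu1 = fit_precision(X[train_idx], y[train_idx], lam)
                y_pred = classify(X[val_idx], Omega_hat, mu0, mu1)
                errors[i] += np.mean(y_pred != y[val_idx])
        return lambdas[int(np.argmin(errors))]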
81 More specifically, let y_i's and ŷ_i's be the true labels and predicted labels. Table 1: Quantitative comparison of EPIC, GLASSO. [sent-344, score-0.104]
82 We see that EPIC outperforms the competing estimators throughout all settings. [sent-348, score-0.
83 Given the true labels y_i and predicted labels ŷ_i of the testing samples, we define Specificity = TN/(TN + FP), Sensitivity = TP/(TP + FN), and MCC = (TP·TN − FP·FN)/√((TP + FP)(TP + FN)(TN + FP)(TN + FN)), where TP = ∑_i I(y_i = ŷ_i = 1), FP = ∑_i I(y_i = 0, ŷ_i = 1), TN = ∑_i I(y_i = ŷ_i = 0), FN = ∑_i I(y_i = 1, ŷ_i = 0). [sent-499, score-0.208]
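The same metrics in code form, assuming 0/1 label vectors; this is a straightforward numpy transcription of the definitions above, not taken from the authors' scripts.

    import numpy as np

    def classification_metrics(y_true, y_pred):
        # Specificity, sensitivity, and Matthews correlation coefficient for 0/1 labels.
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        tp = np.sum((y_true == 1) & (y_pred == 1))
        tn = np.sum((y_true == 0) & (y_pred == 0))
        fp = np.sum((y_true == 0) & (y_pred == 1))
        fn = np.sum((y_true == 1) & (y_pred == 0))
        specificity = tn / (tn + fp)
        sensitivity = tp / (tp + fn)
        mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return specificity, sensitivity, mcc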
84 Table 2 summarizes the performance of both methods over 100 replications. [sent-500, score-0.031]
85 Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. [sent-524, score-0.187]
86 Square-root lasso: pivotal recovery of sparse signals via conic programming. [sent-530, score-0.031]
87 A constrained ℓ1 minimization approach to sparse precision matrix estimation. [sent-536, score-0.219]
88 Annales de l'Institut Henri Poincaré, Probabilités et Statistiques 48 1148–1185. [sent-547, score-0.043]
89 Pharmacogenomic predictor of sensitivity to preoperative chemotherapy with paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide in breast cancer. [sent-594, score-0.133]
90 Tiger: A tuning-insensitive approach for optimally estimating Gaussian graphical models. [sent-618, score-0.064]
91 High dimensional inverse covariance matrix estimation via linear programming. [sent-640, score-0.216]
92 Model selection and estimation in the Gaussian graphical model. [sent-645, score-0.143]
93 The huge package for high-dimensional undirected graph estimation in R. [sent-653, score-0.055]
94 Positive semidefinite rank-based correlation matrix estimation with application to semiparametric graph estimation. [sent-659, score-0.218]
wordName wordTfidf (topN-words)
[('epic', 0.776), ('catoni', 0.155), ('iu', 0.15), ('precision', 0.136), ('kj', 0.124), ('elliptical', 0.116), ('pcr', 0.115), ('clime', 0.102), ('kendall', 0.101), ('maxj', 0.099), ('nyi', 0.096), ('estimator', 0.091), ('tau', 0.091), ('rd', 0.089), ('mcc', 0.088), ('false', 0.087), ('tp', 0.084), ('covariance', 0.083), ('erd', 0.081), ('uan', 0.075), ('fang', 0.073), ('fp', 0.07), ('cai', 0.069), ('exj', 0.066), ('akj', 0.066), ('tn', 0.066), ('semiparametric', 0.063), ('calibration', 0.059), ('glasso', 0.057), ('estimation', 0.055), ('matrix', 0.052), ('hao', 0.052), ('yi', 0.052), ('transformed', 0.051), ('rate', 0.05), ('adj', 0.05), ('afferty', 0.05), ('bbt', 0.05), ('exk', 0.05), ('preoperative', 0.05), ('quantitive', 0.05), ('zkj', 0.05), ('fn', 0.05), ('multivariate', 0.049), ('correlation', 0.048), ('breast', 0.046), ('regularization', 0.044), ('positive', 0.044), ('subjects', 0.044), ('et', 0.043), ('estimators', 0.042), ('subgaussian', 0.041), ('uq', 0.041), ('adopt', 0.041), ('roc', 0.041), ('norm', 0.039), ('city', 0.039), ('competing', 0.039), ('adopts', 0.038), ('max', 0.037), ('sensitivity', 0.037), ('kotz', 0.036), ('roeder', 0.036), ('graphical', 0.036), ('chain', 0.036), ('gene', 0.035), ('tuning', 0.034), ('adopted', 0.034), ('wasserman', 0.034), ('sparse', 0.031), ('vj', 0.031), ('summarizes', 0.031), ('calibrated', 0.03), ('pearson', 0.03), ('bu', 0.03), ('argmin', 0.029), ('th', 0.029), ('throughout', 0.029), ('outperforms', 0.029), ('gaussian', 0.028), ('drawbacks', 0.028), ('liu', 0.028), ('cancer', 0.027), ('possess', 0.027), ('rates', 0.027), ('banerjee', 0.026), ('wang', 0.026), ('spectral', 0.026), ('inverse', 0.026), ('asymptotically', 0.025), ('ensemble', 0.024), ('selection', 0.024), ('yuan', 0.024), ('numerical', 0.023), ('zhao', 0.023), ('remark', 0.023), ('validation', 0.022), ('graphic', 0.022), ('belloni', 0.022), ('cambanis', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating sparse precision matrix of high dimensional elliptical distribution. The proposed method calibrates regularizations when estimating each column of the precision matrix. Thus it not only is asymptotically tuning free, but also achieves an improved finite sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
2 0.11717467 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
Author: Huahua Wang, Arindam Banerjee, Cho-Jui Hsieh, Pradeep Ravikumar, Inderjit Dhillon
Abstract: We consider the problem of sparse precision matrix estimation in high dimensions using the CLIME estimator, which has several desirable theoretical properties. We present an inexact alternating direction method of multiplier (ADMM) algorithm for CLIME, and establish rates of convergence for both the objective and optimality conditions. Further, we develop a large scale distributed framework for the computations, which scales to millions of dimensions and trillions of parameters, using hundreds of cores. The proposed framework solves CLIME in columnblocks and only involves elementwise operations and parallel matrix multiplications. We evaluate our algorithm on both shared-memory and distributed-memory architectures, which can use block cyclic distribution of data and parameters to achieve load balance and improve the efficiency in the use of memory hierarchies. Experimental results show that our algorithm is substantially more scalable than state-of-the-art methods and scales almost linearly with the number of cores. 1
3 0.109702 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
4 0.084352829 225 nips-2013-One-shot learning and big data with n=2
Author: Lee H. Dicker, Dean P. Foster
Abstract: We model a “one-shot learning” situation, where very few observations y1 , ..., yn ∈ R are available. Associated with each observation yi is a very highdimensional vector xi ∈ Rd , which provides context for yi and enables us to predict subsequent observations, given their own context. One of the salient features of our analysis is that the problems studied here are easier when the dimension of xi is large; in other words, prediction becomes easier when more context is provided. The proposed methodology is a variant of principal component regression (PCR). Our rigorous analysis sheds new light on PCR. For instance, we show that classical PCR estimators may be inconsistent in the specified setting, unless they are multiplied by a scalar c > 1; that is, unless the classical estimator is expanded. This expansion phenomenon appears to be somewhat novel and contrasts with shrinkage methods (c < 1), which are far more common in big data analyses. 1
5 0.077290617 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
6 0.077024437 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
7 0.070404112 109 nips-2013-Estimating LASSO Risk and Noise Level
8 0.065133624 91 nips-2013-Dirty Statistical Models
9 0.058263876 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
10 0.057729542 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
11 0.056461085 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
12 0.047311738 127 nips-2013-Generalized Denoising Auto-Encoders as Generative Models
13 0.045410726 327 nips-2013-The Randomized Dependence Coefficient
14 0.044486597 18 nips-2013-A simple example of Dirichlet process mixture inconsistency for the number of components
15 0.044406109 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
16 0.044254586 185 nips-2013-Matrix Completion From any Given Set of Observations
17 0.044245578 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
18 0.043068789 217 nips-2013-On Poisson Graphical Models
19 0.042449884 269 nips-2013-Regression-tree Tuning in a Streaming Setting
20 0.042315431 123 nips-2013-Flexible sampling of discrete data correlations without the marginal distributions
topicId topicWeight
[(0, 0.126), (1, 0.061), (2, 0.045), (3, 0.043), (4, -0.004), (5, 0.054), (6, -0.006), (7, 0.036), (8, -0.092), (9, 0.006), (10, 0.028), (11, -0.002), (12, -0.039), (13, -0.007), (14, -0.038), (15, -0.012), (16, 0.035), (17, 0.029), (18, -0.034), (19, 0.072), (20, 0.024), (21, 0.028), (22, -0.025), (23, 0.005), (24, 0.062), (25, -0.02), (26, -0.009), (27, 0.004), (28, -0.006), (29, -0.016), (30, 0.034), (31, -0.022), (32, -0.043), (33, -0.014), (34, 0.007), (35, -0.067), (36, -0.038), (37, -0.045), (38, 0.054), (39, 0.085), (40, 0.1), (41, -0.05), (42, 0.042), (43, 0.033), (44, -0.046), (45, 0.072), (46, -0.067), (47, -0.012), (48, -0.008), (49, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.87875515 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating sparse precision matrix of high dimensional elliptical distribution. The proposed method calibrates regularizations when estimating each column of the precision matrix. Thus it not only is asymptotically tuning free, but also achieves an improved finite sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
2 0.79905266 225 nips-2013-One-shot learning and big data with n=2
Author: Lee H. Dicker, Dean P. Foster
Abstract: We model a “one-shot learning” situation, where very few observations y1 , ..., yn ∈ R are available. Associated with each observation yi is a very highdimensional vector xi ∈ Rd , which provides context for yi and enables us to predict subsequent observations, given their own context. One of the salient features of our analysis is that the problems studied here are easier when the dimension of xi is large; in other words, prediction becomes easier when more context is provided. The proposed methodology is a variant of principal component regression (PCR). Our rigorous analysis sheds new light on PCR. For instance, we show that classical PCR estimators may be inconsistent in the specified setting, unless they are multiplied by a scalar c > 1; that is, unless the classical estimator is expanded. This expansion phenomenon appears to be somewhat novel and contrasts with shrinkage methods (c < 1), which are far more common in big data analyses. 1
3 0.67605913 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to crossvalidation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
4 0.66121686 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model
Author: Fang Han, Han Liu
Abstract: In this paper we focus on the principal component regression and its application to high dimension non-Gaussian data. The major contributions are two folds. First, in low dimensions and under the Gaussian model, by borrowing the strength from recent development in minimax optimal principal component estimation, we first time sharply characterize the potential advantage of classical principal component regression over least square estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy tailed and have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients in the optimal parametric rate and therefore is a good alternative to the Gaussian based methods. Experiments on synthetic and real world data are conducted to illustrate the empirical usefulness of the proposed method. 1
5 0.60340732 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized Mestimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
6 0.5990696 109 nips-2013-Estimating LASSO Risk and Noise Level
7 0.59755307 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
8 0.58271599 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
9 0.57651424 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression
10 0.5647583 73 nips-2013-Convex Relaxations for Permutation Problems
11 0.5469771 327 nips-2013-The Randomized Dependence Coefficient
12 0.5393852 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
13 0.53786355 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
14 0.528781 146 nips-2013-Large Scale Distributed Sparse Precision Estimation
15 0.5284152 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions
16 0.52397436 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
17 0.51608604 256 nips-2013-Probabilistic Principal Geodesic Analysis
18 0.51424825 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
19 0.50964504 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
20 0.5045259 284 nips-2013-Robust Spatial Filtering with Beta Divergence
topicId topicWeight
[(16, 0.043), (33, 0.155), (34, 0.065), (41, 0.011), (49, 0.069), (56, 0.068), (70, 0.018), (75, 0.295), (85, 0.023), (89, 0.065), (93, 0.054), (95, 0.018), (99, 0.022)]
simIndex simValue paperId paperTitle
1 0.75339442 21 nips-2013-Action from Still Image Dataset and Inverse Optimal Control to Learn Task Specific Visual Scanpaths
Author: Stefan Mathe, Cristian Sminchisescu
Abstract: Human eye movements provide a rich source of information into the human visual information processing. The complex interplay between the task and the visual stimulus is believed to determine human eye movements, yet it is not fully understood, making it difficult to develop reliable eye movement prediction systems. Our work makes three contributions towards addressing this problem. First, we complement one of the largest and most challenging static computer vision datasets, VOC 2012 Actions, with human eye movement recordings collected under the primary task constraint of action recognition, as well as, separately, for context recognition, in order to analyze the impact of different tasks. Our dataset is unique among the eyetracking datasets of still images in terms of large scale (over 1 million fixations recorded in 9157 images) and different task controls. Second, we propose Markov models to automatically discover areas of interest (AOI) and introduce novel sequential consistency metrics based on them. Our methods can automatically determine the number, the spatial support and the transitions between AOIs, in addition to their locations. Based on such encodings, we quantitatively show that given unconstrained read-world stimuli, task instructions have significant influence on the human visual search patterns and are stable across subjects. Finally, we leverage powerful machine learning techniques and computer vision features in order to learn task-sensitive reward functions from eye movement data within models that allow to effectively predict the human visual search patterns based on inverse optimal control. The methodology achieves state of the art scanpath modeling results. 1
same-paper 2 0.72265971 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating sparse precision matrix of high dimensional elliptical distribution. The proposed method calibrates regularizations when estimating each column of the precision matrix. Thus it not only is asymptotically tuning free, but also achieves an improved finite sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
3 0.68286639 260 nips-2013-RNADE: The real-valued neural autoregressive density-estimator
Author: Benigno Uria, Iain Murray, Hugo Larochelle
Abstract: We introduce RNADE, a new model for joint density estimation of real-valued vectors. Our model calculates the density of a datapoint as the product of onedimensional conditionals modeled using mixture density networks with shared parameters. RNADE learns a distributed representation of the data, while having a tractable expression for the calculation of densities. A tractable likelihood allows direct comparison with other methods and training by standard gradientbased optimizers. We compare the performance of RNADE on several datasets of heterogeneous and perceptual data, finding it outperforms mixture models in all but one case. 1
4 0.67613763 347 nips-2013-Variational Planning for Graph-based MDPs
Author: Qiang Cheng, Qiang Liu, Feng Chen, Alex Ihler
Abstract: Markov Decision Processes (MDPs) are extremely useful for modeling and solving sequential decision making problems. Graph-based MDPs provide a compact representation for MDPs with large numbers of random variables. However, the complexity of exactly solving a graph-based MDP usually grows exponentially in the number of variables, which limits their application. We present a new variational framework to describe and solve the planning problem of MDPs, and derive both exact and approximate planning algorithms. In particular, by exploiting the graph structure of graph-based MDPs, we propose a factored variational value iteration algorithm in which the value function is first approximated by the multiplication of local-scope value functions, then solved by minimizing a Kullback-Leibler (KL) divergence. The KL divergence is optimized using the belief propagation algorithm, with complexity exponential in only the cluster size of the graph. Experimental comparison on different models shows that our algorithm outperforms existing approximation algorithms at finding good policies. 1
5 0.67396349 166 nips-2013-Learning invariant representations and applications to face verification
Author: Qianli Liao, Joel Z. Leibo, Tomaso Poggio
Abstract: One approach to computer object recognition and modeling the brain’s ventral stream involves unsupervised learning of representations that are invariant to common transformations. However, applications of these ideas have usually been limited to 2D affine transformations, e.g., translation and scaling, since they are easiest to solve via convolution. In accord with a recent theory of transformationinvariance [1], we propose a model that, while capturing other common convolutional networks as special cases, can also be used with arbitrary identitypreserving transformations. The model’s wiring can be learned from videos of transforming objects—or any other grouping of images into sets by their depicted object. Through a series of successively more complex empirical tests, we study the invariance/discriminability properties of this model with respect to different transformations. First, we empirically confirm theoretical predictions (from [1]) for the case of 2D affine transformations. Next, we apply the model to non-affine transformations; as expected, it performs well on face verification tasks requiring invariance to the relatively smooth transformations of 3D rotation-in-depth and changes in illumination direction. Surprisingly, it can also tolerate clutter “transformations” which map an image of a face on one background to an image of the same face on a different background. Motivated by these empirical findings, we tested the same model on face verification benchmark tasks from the computer vision literature: Labeled Faces in the Wild, PubFig [2, 3, 4] and a new dataset we gathered—achieving strong performance in these highly unconstrained cases as well. 1
6 0.58811331 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization
8 0.56252134 83 nips-2013-Deep Fisher Networks for Large-Scale Image Classification
9 0.56184071 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
10 0.56138259 331 nips-2013-Top-Down Regularization of Deep Belief Networks
11 0.56020272 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
12 0.55866152 237 nips-2013-Optimal integration of visual speed across different spatiotemporal frequency channels
13 0.55631775 345 nips-2013-Variance Reduction for Stochastic Gradient Optimization
14 0.55542451 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
15 0.55324864 335 nips-2013-Transfer Learning in a Transductive Setting
16 0.55244875 301 nips-2013-Sparse Additive Text Models with Low Rank Background
17 0.55233508 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
18 0.55168569 200 nips-2013-Multi-Prediction Deep Boltzmann Machines
19 0.55047303 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
20 0.54998034 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking