nips nips2013 nips2013-130 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to cross-validation for the regularization of covariance matrices and has appealing consistency properties. [sent-5, score-0.918]
2 We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. [sent-6, score-0.212]
3 We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. [sent-7, score-0.394]
4 In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. [sent-8, score-1.11]
5 Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. [sent-9, score-0.197]
6 1 Introduction. The estimation of covariance matrices is the basis of many machine learning algorithms and estimation procedures in statistics. [sent-10, score-0.192]
7 The standard estimator is the sample covariance matrix: its entries are unbiased and consistent [1]. [sent-11, score-0.237]
8 A well-known shortcoming of the sample covariance is the systematic error in the spectrum. [sent-12, score-0.196]
9 In particular for high dimensional data, where dimensionality p and number of observations n are often of the same order, large eigenvalues are overestimated and small eigenvalues underestimated. [sent-13, score-0.214]
10 A form of regularization which can alleviate this bias is shrinkage [2]: the convex combination of the sample covariance matrix S and a multiple of the identity, T = p^{-1} trace(S) I, C_sh = (1 − λ)S + λT, (1) has potentially lower mean squared error and lower bias in the spectrum [3]. [sent-14, score-0.895]
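To make eq. (1) concrete, here is a minimal numpy sketch that builds the spherical target T = p^{-1} trace(S) I and the convex combination for a given shrinkage strength λ; the function name and the centering convention are our own illustration, not code from the paper.

```python
import numpy as np

def shrink_covariance(X, lam):
    """Eq. (1): C_sh = (1 - lam) * S + lam * T for a given lam in [0, 1].

    X is an (n, p) data matrix whose rows are (already centered) observations.
    """
    n, p = X.shape
    S = X.T @ X / n                          # sample covariance matrix
    T = (np.trace(S) / p) * np.eye(p)        # spherical target T = p^{-1} trace(S) I
    return (1.0 - lam) * S + lam * T
```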
11 The standard procedure for choosing an optimal regularization for shrinkage is cross-validation [4], which is known to be time-consuming. [sent-15, score-0.643]
12 Recently, analytic shrinkage [3], which provides a consistent analytic formula for the above regularization parameter λ, has become increasingly popular. [sent-17, score-1.023]
13 The consistency of analytic shrinkage relies on assumptions which are rarely tested in practice [5]. [sent-19, score-0.988]
14 This paper will therefore aim to render the analytic shrinkage framework more practical and usable for real world data. [sent-20, score-0.931]
15 We contribute in three aspects: first, we derive simple tests for the applicability of the analytic shrinkage framework and observe that for many data sets of practical relevance the assumptions which underlie consistency are not fulfilled. [sent-21, score-0.962]
16 Second, we design assumptions which better fit the statistical properties observed in real world data which typically has a low dimensional structure. [sent-22, score-0.144]
17 Under these new assumptions, we prove consistency of analytic shrinkage. [sent-23, score-0.273]
18 We show a counter-intuitive result: for typical covariance structures, no shrinkage –and therefore no regularization– takes place in the limit of high dimensionality and number of observations. [sent-24, score-0.914]
19 In practice, this leads to weak shrinkage and degrading performance. [sent-25, score-0.643]
20 Therefore, third, we propose an extension of the shrinkage framework: automatic orthogonal complement shrinkage (aoc-shrinkage) takes the covariance structure into account and outperforms standard shrinkage on real world data at a moderate increase in computation time. [sent-26, score-2.447]
21 2 Overview of analytic shrinkage. To derive analytic shrinkage, the expected mean squared error of the shrinkage covariance matrix eq. [sent-28, score-1.889]
22 (1) as an estimator of the true covariance matrix C is minimized: λ* = arg min_λ R(λ) := arg min_λ E‖C − (1 − λ)S − λT‖²  (2)  = arg min_λ Σ_{i,j} { λ² E[(S_ij − T_ij)²] + Var(S_ij) + 2λ [Cov(S_ij, T_ij) − Var(S_ij)] },  (3)  whose minimizer is λ* = Σ_{i,j} [Var(S_ij) − Cov(S_ij, T_ij)] / Σ_{i,j} E[(S_ij − T_ij)²]. [sent-29, score-0.235]
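A minimal plug-in sketch of the resulting analytic estimate λ̂ in the spirit of Ledoit/Wolf [3]: the numerator approximates Σ_ij Var(S_ij) from the rank-one terms x_k x_k^T (the asymptotically negligible Cov(S_ij, T_ij) term is dropped here), the denominator approximates Σ_ij E[(S_ij − T_ij)²]. This is our reading of the estimator for illustration, not a verbatim transcription of the paper's formula.

```python
import numpy as np

def analytic_lambda(X):
    """Plug-in estimate of the optimal shrinkage strength (a sketch).

    X is (n, p) with centered observations as rows.  Memory is O(n p^2),
    which is fine for a small illustrative example.
    """
    n, p = X.shape
    S = X.T @ X / n
    T = (np.trace(S) / p) * np.eye(p)
    denom = np.sum((S - T) ** 2)                   # ~ sum_ij E[(S_ij - T_ij)^2]
    outer = np.einsum('ki,kj->kij', X, X)          # x_k x_k^T, shape (n, p, p)
    numer = np.sum((outer - S) ** 2) / n ** 2      # ~ sum_ij Var(S_ij)
    return min(1.0, numer / denom) if denom > 0 else 1.0
```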
23 Xn denotes a pn × n matrix of n iid observations of pn variables with mean zero and covariance matrix Σn . [sent-31, score-0.306]
24 Y_n = Γ_n^T X_n denotes the same observations in their eigenbasis, having diagonal covariance Λ_n = Γ_n^T Σ_n Γ_n. [sent-32, score-0.194]
25 The main theoretical result on the estimator λ is its consistency in the large n, p limit [3]. [sent-34, score-0.167]
26 A decisive role is played by an assumption on the eighth moments in the eigenbasis: Assumption 2 (A2, Ledoit/Wolf 2004 [3]). [sent-35, score-0.096]
27 i=1 3 Implicit assumptions on the covariance structure From the assumption on the eighth moments in the eigenbasis, we derive requirements on the eigenvalues which facilitate an empirical check: Theorem 1 (largest eigenvalue growth rate). [sent-37, score-0.566]
28 Then, there exists a limit on the growth rate of the largest eigenvalue: γ_1^n = max_i Var(y_i^n) = O(p^{1/4}). [sent-39, score-0.235]
29 Then, there exists a limit on the growth rate of the normalized eigenvalue dispersion: d_n = p^{-1} Σ_i (γ_i^n − p^{-1} Σ_j γ_j^n)² = O(1). [sent-42, score-0.541]
30 Eighth moments arise because Var(S_ij), the variance of the sample covariance entries, is of fourth order and has to converge. [sent-44, score-0.171]
31 Figure 1: Covariance matrices (models A and B) and dependency of the largest eigenvalue max(EV) and the normalized sample dispersion on the dimensionality. [sent-48, score-0.736]
33 The theorems restrict the covariance structure of the sequence of models when the dimensionality increases. [sent-52, score-0.228]
34 To illustrate this, we design two sequences of models A and B indexed by their dimensionality p, in which dimension x_i^p is correlated with a signal s^p: [sent-53, score-0.191]
35 x_i^p = (0.5 + b_i^p) · ε_i^p + α c_i^p s^p with probability P_s^{A/B}(i), and x_i^p = (0.5 + b_i^p) · ε_i^p otherwise, (4) [sent-54, score-0.075]
36 where b_i^p and c_i^p are uniform random from [0, 1], s^p and ε_i^p are standard normal, α = 1, and P_s^B(i) = 0.… [sent-56, score-0.102]
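A sketch of how such model sequences can be simulated. The exact signal probabilities P_s^A(i) and P_s^B(i) are truncated in the extracted text above, so the choices in the usage lines (signal confined to the first dimensions for model A, a constant probability for model B) are illustrative assumptions only.

```python
import numpy as np

def simulate_model(p, n, p_signal, alpha=1.0, rng=None):
    """Draw n observations from the model of eq. (4) (a sketch).

    p_signal maps a dimension index i to the probability that x_i carries
    the common signal s; b_i, c_i ~ U[0, 1], s and eps are standard normal.
    """
    rng = np.random.default_rng() if rng is None else rng
    b = rng.uniform(0.0, 1.0, size=p)
    c = rng.uniform(0.0, 1.0, size=p)
    carries_signal = rng.random(p) < np.array([p_signal(i) for i in range(p)])
    s = rng.standard_normal(n)                     # common signal s^p
    eps = rng.standard_normal((n, p))              # idiosyncratic noise
    return (0.5 + b) * eps + alpha * np.outer(s, c * carries_signal)

# Illustrative (assumed) choices, not the paper's: model A keeps the signal in
# a fixed block of dimensions, model B spreads it with constant probability.
X_A = simulate_model(500, 1000, p_signal=lambda i: 1.0 if i < 50 else 0.0)
X_B = simulate_model(500, 1000, p_signal=lambda i: 0.5)
```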
37 To the left in Figure 1, covariance matrices are shown: for model A, the matrix is dense in the upper left corner; the more dimensions we add, the sparser the matrix becomes. [sent-59, score-0.246]
38 To the right, normalized sample dispersion and largest eigenvalue are shown. [sent-61, score-0.506]
39 For model A, we see the behaviour from the theorems: the dispersion is bounded and the largest eigenvalue grows with the fourth root of the dimensionality. [sent-62, score-0.526]
40 For model B, there is a linear dependency of both dispersion and largest eigenvalue: A2 is violated. [sent-63, score-0.388]
41 For real world data, we measure the dependency of the largest eigenvalue/dispersion on the dimensionality by averaging over random subsets. [sent-64, score-0.237]
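A sketch of this empirical check: for random variable subsets of growing size, compute the largest sample eigenvalue and the normalized dispersion d_n and average over subsets; roughly linear growth of either quantity in p indicates that A2 is violated. Function and parameter names are our own.

```python
import numpy as np

def eig_dispersion_curve(X, dims, n_subsets=10, rng=None):
    """Largest eigenvalue and normalized dispersion vs. subset dimensionality.

    X is (n, p_full); dims is an iterable of subset sizes <= p_full.
    Returns two arrays, each averaged over n_subsets random subsets.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, p_full = X.shape
    largest, dispersion = [], []
    for p in dims:
        ev_max, d = 0.0, 0.0
        for _ in range(n_subsets):
            idx = rng.choice(p_full, size=p, replace=False)
            Xs = X[:, idx] - X[:, idx].mean(axis=0)
            gamma = np.linalg.eigvalsh(Xs.T @ Xs / n)     # sample eigenvalues
            ev_max += gamma[-1]
            d += np.mean((gamma - gamma.mean()) ** 2)      # normalized dispersion
        largest.append(ev_max / n_subsets)
        dispersion.append(d / n_subsets)
    return np.array(largest), np.array(dispersion)
```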
42 Figure 2 shows the results for four data sets: (1) New York Stock Exchange, (2) USPS hand-written digits, (3) ISOLET spoken letters and (4) a Brain Computer Interface EEG data set. [sent-65, score-0.148]
43 The largest eigenvalues and the normalized dispersions (see Figure 2) closely resemble model B; a linear dependence on the dimensionality, which violates A2, is visible. [sent-66, score-0.21]
44 4 Analytic shrinkage for arbitrary covariance structures. We replace A2 by a weaker assumption on the moments in the basis of the observations X which does not impose any constraints on the covariance structure: Assumption 2′ (A2′). [sent-68, score-1.079]
45 Standard assumptions. For the proof of consistency, the relationship between dimensionality and number of observations has to be defined and a weak restriction on the correlation of the products of uncorrelated variables is necessary. [sent-70, score-0.134]
46 Additional Assumptions A1 to A3 subsume a wide range of dispersion and eigenvalue configurations. [sent-76, score-0.391]
47 It will prove essential for the limit behavior of optimal shrinkage and the consistency of analytic shrinkage: Assumption 4 (A4, growth rate of the normalized dispersion). [sent-78, score-1.066]
48 Then, the limit behaviour of the normalized dispersion is parameterized by k: p^{-1} Σ_i (γ_i − p^{-1} Σ_j γ_j)² = Θ(max(1, p^{2k−1})), where Θ is the Landau Theta. [sent-80, score-0.457]
49 For k ≤ 0.5 the normalized dispersion is bounded from above and below, as in model A in the last section. [sent-82, score-0.351]
50 For k > 0.5 the normalized dispersion grows with the dimensionality; for k = 1 it is linear in p, as in model B. [sent-84, score-0.351]
51 Second, we assume that limits on the relation between second, fourth and eighth moments exist: Assumption 6 (A6, moment relation). [sent-88, score-0.142]
52 Figure 3: Illustration of orthogonal complement shrinkage. [sent-90, score-0.198]
53 Theoretical results on limit behaviour and consistency. We are able to derive a novel theorem which shows that under these wider assumptions, shrinkage remains consistent: Theorem 3 (Consistency of Shrinkage). [sent-91, score-0.832]
54 An unexpected caveat accompanying this result is the limit behaviour of the optimal shrinkage strength λ*: Theorem 4 (Limit behaviour). [sent-94, score-0.774]
55 For k ≤ 0.5 there exist bounds b_l, b_u such that ∀n: b_l ≤ λ* ≤ b_u, whereas for k > 0.5, lim_{p→∞} λ* = 0. [sent-98, score-0.292]
56 The theorem shows that there is a fundamental problem with analytic shrinkage: if k is larger than 0.5 (all data sets in the last section had k = 1), there is no shrinkage in the limit. [sent-99, score-0.643]
57 5 Automatic orthogonal complement shrinkage. Orthogonal complement shrinkage. To obtain a finite shrinkage strength, we propose an extension of shrinkage we call oc-shrinkage: it leaves the first eigendirection untouched and performs shrinkage on the orthogonal complement oc of that direction. [sent-100, score-4.028]
58 It shows a three dimensional true covariance matrix with a high dispersion that makes it highly ellipsoidal. [sent-102, score-0.504]
59 The result is a high level of discrepancy between the spherical shrinkage target and the true covariance. [sent-103, score-0.693]
60 The best convex combination of target and sample covariance will put extremely low weight on the target. [sent-104, score-0.196]
61 The situation is different in the orthogonal complement of the first eigendirection of the sample covariance matrix: there, the discrepancy between sample covariance and target is strongly reduced. [sent-105, score-0.66]
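One simple reading of oc-shrinkage, sketched below: diagonalize the sample covariance, keep the leading r eigendirections untouched, project the data onto their orthogonal complement and apply standard analytic shrinkage there (reusing analytic_lambda from the earlier sketch). This illustrates the idea; it is not claimed to be the paper's exact estimator.

```python
import numpy as np

def oc_shrinkage(X, r=1):
    """Orthogonal complement shrinkage (a sketch); assumes 1 <= r < p."""
    n, p = X.shape
    S = X.T @ X / n
    evals, evecs = np.linalg.eigh(S)           # eigenvalues in ascending order
    lead, comp = evecs[:, -r:], evecs[:, :-r]  # leading directions / complement
    X_oc = X @ comp                            # observations in the complement
    lam_oc = analytic_lambda(X_oc)             # analytic shrinkage strength there
    S_oc = X_oc.T @ X_oc / n
    T_oc = (np.trace(S_oc) / (p - r)) * np.eye(p - r)
    C_oc = (1 - lam_oc) * S_oc + lam_oc * T_oc
    # reassemble: untouched leading part + shrunk orthogonal complement
    C = lead @ np.diag(evals[-r:]) @ lead.T + comp @ C_oc @ comp.T
    return C, lam_oc
```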
62 To simplify the theoretical analysis, let us consider the case where there is only a single growing eigenvalue while the remainder stays bounded: Assumption 4′ (A4′, single large eigenvalue). [sent-106, score-0.081]
63 There exist constants F_l and F_u such that F_l ≤ E[z_i^8] ≤ F_u. A recent result from Random Matrix Theory [6] allows us to prove that the projection on the empirical orthogonal complement does not affect the consistency of the estimator λ̂_oc: Theorem 5 (consistency of oc-shrinkage). [sent-108, score-0.539]
64 Then, independently of k, lim_{p→∞} (λ̂_oc − arg min_λ Q_oc(λ))² = 0, where Q denotes the mean squared error (MSE) of the convex combination (cmp. …). [sent-111, score-0.07]
65 Automatic model selection. Orthogonal complement shrinkage only yields an advantage if the first eigenvalue is large enough. [sent-114, score-0.834]
66 Based on eq. (2), we can consistently estimate the error of standard shrinkage and orthogonal complement shrinkage and only use oc-shrinkage when the difference ∆_{R,oc} is positive. [sent-116, score-1.484]
67 It is straightforward to iterate the procedure and thus automatically select the number of retained eigendirections r. [sent-124, score-0.09]
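A schematic sketch of that iteration, shown below. The helper risk_difference is hypothetical: it stands in for the consistent plug-in estimate of ∆_{R,oc} discussed above, whose exact form is not reproduced in this extraction; analytic_lambda and oc_shrinkage refer to the earlier sketches.

```python
import numpy as np

def aoc_shrinkage(X, risk_difference, r_max=10):
    """Automatic oc-shrinkage (schematic sketch).

    risk_difference(X, r) is assumed to return the estimated risk improvement
    of retaining r rather than r-1 eigendirections; the loop stops as soon as
    the estimated improvement is no longer positive.
    """
    r = 0
    while r < r_max and risk_difference(X, r + 1) > 0:
        r += 1                                 # retain one more eigendirection
    if r == 0:                                 # fall back to standard shrinkage
        n, p = X.shape
        S = X.T @ X / n
        T = (np.trace(S) / p) * np.eye(p)
        lam = analytic_lambda(X)
        return (1 - lam) * S + lam * T, 0
    return oc_shrinkage(X, r=r)[0], r
```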
68 The computational cost of aoc-shrinkage is larger than that of standard shrinkage as it additionally requires an eigendecomposition, O(p^3), and some matrix multiplications, O(r̂ p^2). [sent-127, score-0.699]
69 In the applications considered here, this additional cost is negligible: r̂ ≪ p and the eigendecomposition can replace matrix inversions for LDA, QDA or portfolio optimization. [sent-128, score-0.175]
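A brief sketch of that last point: because the oc-shrinkage estimate is diagonal in the sample eigenbasis (leading r eigenvalues untouched, the remaining ones shrunk toward their mean), its inverse is Γ diag(1/d) Γ^T and no separate matrix inversion is needed. This assumes all shrunk eigenvalues end up strictly positive.

```python
import numpy as np

def inverse_from_eigenbasis(X, r, lam_oc):
    """Invert the (sketched) oc-shrinkage estimate via its eigendecomposition."""
    n, p = X.shape
    evals, evecs = np.linalg.eigh(X.T @ X / n)     # ascending order
    d = evals.copy()
    tail = d[:p - r]                               # eigenvalues in the complement
    d[:p - r] = (1 - lam_oc) * tail + lam_oc * tail.mean()
    return evecs @ np.diag(1.0 / d) @ evecs.T      # requires all d > 0
```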
70 Figure 4: Automatic selection of the number of eigendirections. [sent-135, score-0.061]
71 Table 1: Mean absolute deviations ·10³ (mean squared deviations ·10⁶) of the resulting portfolios for the different covariance estimators and markets. [sent-144, score-0.196]
72 † := aoc-shrinkage significantly better than this model at the 5% level, tested by a randomization test. [sent-145, score-0.06]
73 ∗ := significantly better than all compared methods at the 5% level, tested by a randomization test. [sent-183, score-0.06]
74 …07% of standard shrinkage, oc-shrinkage for one to four eigendirections and aoc-shrinkage. [sent-220, score-0.09]
75 Standard shrinkage behaves as predicted by Theorem 4: λ̂ and therefore the PRIAL tend to zero in the large n, p limit. [sent-221, score-0.643]
76 For small dimensionalities eigenvalues are small and therefore there is no advantage for oc-shrinkage. [sent-223, score-0.063]
77 On the contrary, the higher the order of oc-shrinkage, the larger the error by projecting out spurious large eigenvalues which should have been subject to regularization. [sent-224, score-0.063]
78 Real world data I: portfolio optimization. Covariance estimates are needed for the minimization of portfolio risk [7]. [sent-226, score-0.309]
79 Table 1 shows portfolio risk for approximately eight years of daily return data from 1200 US, 600 European and 100 Hong Kong stocks, aggregated from Reuters tick data [8]. [sent-227, score-0.119]
80 Estimation of covariance matrices is based on short time windows (150 days) because of the data’s nonstationarity. [sent-228, score-0.192]
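A sketch of how a covariance estimate can enter a minimum-risk portfolio on such a 150-day window. The closed form w ∝ C^{-1}1 is the textbook unconstrained global minimum-variance solution and is our assumption about the portfolio construction, not a quote from the paper; estimator can be any of the covariance sketches above.

```python
import numpy as np

def min_variance_weights(returns_window, estimator):
    """Global minimum-variance weights from an estimated covariance (sketch).

    returns_window is a (150, p) block of daily returns; estimator maps the
    centered block to a p x p covariance estimate, e.g.
    lambda X: oc_shrinkage(X, r=1)[0].
    """
    X = returns_window - returns_window.mean(axis=0)
    C = estimator(X)
    ones = np.ones(C.shape[0])
    w = np.linalg.solve(C, ones)       # solve C w proportional to 1
    return w / w.sum()
```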
81 Despite the unfavorable ratio of observations to dimensionality, standard shrinkage has very low values of λ̂: the stocks are highly correlated and the spherical target is highly inappropriate. [sent-229, score-0.726]
82 Shrinkage to a financial factor model incorporating the market factor [9] provides a better target; it leads to stronger shrinkage and better portfolios. [sent-230, score-0.643]
83 Our proposed aoc-shrinkage yields even stronger shrinkage and significantly outperforms all compared methods. [sent-231, score-0.643]
84 Figure 5: High variance components responsible for failure of shrinkage in BCI. [sent-271, score-0.643]
85 Table 2 shows that aoc-shrinkage outperforms standard shrinkage for QDA and LDA on both data sets for different training set sizes. [sent-275, score-0.643]
86 Only for LDA and large sample sizes on the relatively low dimensional USPS data is there no difference between standard and aoc-shrinkage: the automatic procedure decides that shrinkage on the whole space is optimal. [sent-276, score-0.727]
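For context, a generic sketch of shrinkage-LDA as used in such comparisons: a pooled within-class covariance is estimated with any of the shrinkage sketches above and plugged into the usual linear discriminant scores. The paper's exact classification pipeline may differ.

```python
import numpy as np

def lda_predict(X_train, y_train, X_test, estimator):
    """LDA with a plug-in (shrinkage) covariance estimate (a sketch)."""
    classes = np.unique(y_train)
    means = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    pooled = np.vstack([X_train[y_train == c] - means[c] for c in classes])
    C = estimator(pooled)                          # pooled within-class covariance
    scores = []
    for c in classes:
        w = np.linalg.solve(C, means[c])           # C^{-1} mu_c
        b = -0.5 * means[c] @ w + np.log(np.mean(y_train == c))
        scores.append(X_test @ w + b)              # linear discriminant score
    return classes[np.argmax(np.vstack(scores), axis=0)]
```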
87 Real world data III: Brain-Computer Interface. The BCI data was recorded in a study in which 11 subjects had to distinguish between noisy and noise-free phonemes [13, 14]. [sent-277, score-0.107]
88 With and without noise, our proposed aoc-shrinkage outperforms standard shrinkage LDA. [sent-280, score-0.643]
89 With injected noise, the number of directions increases to r̂ ≈ 3, as the procedure detects the additional high variance component (to the right in Figure 5) and adapts the shrinkage procedure such that performance remains unaffected. [sent-282, score-0.677]
90 For standard shrinkage, noise affects the analytic regularization and performance degrades as a result. [sent-283, score-0.216]
91 7 Discussion. Analytic shrinkage is a fast and accurate alternative to cross-validation which yields comparable performance, e.g. … [sent-284, score-0.643]
92 This paper has contributed by clarifying the (limited) applicability of the analytic shrinkage formula. [sent-287, score-0.833]
93 In particular we could show that its assumptions are often violated in practice since real world data has complex structured dependencies. [sent-288, score-0.144]
94 We therefore introduced a set of more general assumptions to shrinkage theory, chosen such that the appealing consistency properties of analytic shrinkage are preserved. [sent-289, score-1.605]
95 We have shown that for typical structure in real world data, strong eigendirections adversely affect shrinkage by driving the shrinkage strength to zero. [sent-290, score-1.499]
96 Therefore, finally, we have proposed an algorithm which automatically restricts shrinkage to the orthogonal complement of the strongest eigendirections if appropriate. [sent-291, score-0.931]
97 This leads to improved robustness and significant performance enhancement in simulations and on real world data from the domains of finance, spoken letter and optical character recognition, and neuroscience. [sent-292, score-0.295]
98 A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. [sent-314, score-0.837]
99 Directional Variance Adjustment: Bias reduction in covariance matrices based on factor analysis with an application to portfolio optimization. [sent-324, score-0.311]
100 Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. [sent-327, score-0.351]
wordName wordTfidf (topN-words)
[('shrinkage', 0.643), ('dispersion', 0.31), ('oc', 0.217), ('analytic', 0.19), ('covariance', 0.167), ('sij', 0.159), ('qda', 0.134), ('lda', 0.13), ('spoken', 0.12), ('portfolio', 0.119), ('isolet', 0.119), ('complement', 0.11), ('usps', 0.098), ('tij', 0.093), ('aoc', 0.09), ('eigendirections', 0.09), ('prial', 0.09), ('orthogonal', 0.088), ('consistency', 0.083), ('eigenvalue', 0.081), ('ev', 0.075), ('world', 0.071), ('eighth', 0.068), ('growth', 0.066), ('eigenvalues', 0.063), ('behaviour', 0.063), ('dimensionality', 0.061), ('eigenbasis', 0.059), ('var', 0.058), ('automatic', 0.055), ('anne', 0.055), ('bci', 0.051), ('yi', 0.051), ('korea', 0.049), ('letter', 0.048), ('cov', 0.047), ('moments', 0.047), ('assumptions', 0.046), ('xp', 0.046), ('antons', 0.045), ('eigendirection', 0.045), ('kerstin', 0.045), ('porbadnigk', 0.045), ('schleicher', 0.045), ('untouched', 0.045), ('largest', 0.045), ('qp', 0.043), ('limit', 0.043), ('normalized', 0.041), ('ller', 0.041), ('discriminant', 0.041), ('lim', 0.041), ('estimator', 0.041), ('bartz', 0.04), ('ledoit', 0.04), ('treder', 0.04), ('sp', 0.038), ('stock', 0.038), ('bp', 0.037), ('berlin', 0.037), ('blankertz', 0.036), ('ntrain', 0.036), ('phonemes', 0.036), ('injected', 0.034), ('bl', 0.034), ('gabriel', 0.034), ('randomization', 0.034), ('dependency', 0.033), ('digits', 0.031), ('eeg', 0.031), ('stocks', 0.031), ('fl', 0.031), ('squared', 0.029), ('sample', 0.029), ('nance', 0.029), ('finance', 0.029), ('character', 0.029), ('eigendecomposition', 0.029), ('pn', 0.029), ('letters', 0.028), ('assumption', 0.028), ('matrix', 0.027), ('fu', 0.027), ('tu', 0.027), ('fourth', 0.027), ('real', 0.027), ('observations', 0.027), ('cp', 0.027), ('matthias', 0.027), ('bu', 0.027), ('sh', 0.027), ('tested', 0.026), ('sebastian', 0.026), ('degrades', 0.026), ('olivier', 0.026), ('strength', 0.025), ('discrepancy', 0.025), ('spherical', 0.025), ('robert', 0.025), ('matrices', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to crossvalidation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
2 0.084051982 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
Author: Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, Oliver Stegle
Abstract: Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term in form of a sum of Kronecker products, for which efficient parameter inference and out of sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives. 1
3 0.079926163 284 nips-2013-Robust Spatial Filtering with Beta Divergence
Author: Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe
Abstract: The efficiency of Brain-Computer Interfaces (BCI) largely depends upon a reliable extraction of informative features from the high-dimensional EEG signal. A crucial step in this protocol is the computation of spatial filters. The Common Spatial Patterns (CSP) algorithm computes filters that maximize the difference in band power between two conditions, thus it is tailored to extract the relevant information in motor imagery experiments. However, CSP is highly sensitive to artifacts in the EEG data, i.e. few outliers may alter the estimate drastically and decrease classification performance. Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP. More precisely, we formulate CSP as a divergence maximization problem and utilize the property of a particular type of divergence, namely beta divergence, for robustifying the estimation of spatial filters in the presence of artifacts in the data. We demonstrate the usefulness of our method on toy data and on EEG recordings from 80 subjects. 1
4 0.073683053 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe
Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1
5 0.072367638 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
Author: Adel Javanmard, Andrea Montanari
Abstract: Fitting high-dimensional statistical models often requires the use of non-linear parameter estimation procedures. As a consequence, it is generally impossible to obtain an exact characterization of the probability distribution of the parameter estimates. This in turn implies that it is extremely challenging to quantify the uncertainty associated with a certain parameter estimate. Concretely, no commonly accepted procedure exists for computing classical measures of uncertainty and statistical significance as confidence intervals or p-values. We consider here a broad class of regression problems, and propose an efficient algorithm for constructing confidence intervals and p-values. The resulting confidence intervals have nearly optimal size. When testing for the null hypothesis that a certain parameter is vanishing, our method has nearly optimal power. Our approach is based on constructing a ‘de-biased’ version of regularized Mestimators. The new construction improves over recent work in the field in that it does not assume a special structure on the design matrix. Furthermore, proofs are remarkably simple. We test our method on a diabetes prediction problem. 1
6 0.066680834 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
7 0.066480458 109 nips-2013-Estimating LASSO Risk and Noise Level
8 0.060769737 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
9 0.059231058 188 nips-2013-Memory Limited, Streaming PCA
10 0.058143988 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
11 0.05581414 225 nips-2013-One-shot learning and big data with n=2
12 0.055437531 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
13 0.050953977 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
14 0.050291076 91 nips-2013-Dirty Statistical Models
15 0.047375899 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
16 0.046483424 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
17 0.045641921 326 nips-2013-The Power of Asymmetry in Binary Hashing
18 0.045140568 224 nips-2013-On the Sample Complexity of Subspace Learning
19 0.043867204 201 nips-2013-Multi-Task Bayesian Optimization
20 0.042930521 55 nips-2013-Bellman Error Based Feature Generation using Random Projections on Sparse Spaces
topicId topicWeight
[(0, 0.124), (1, 0.054), (2, 0.035), (3, 0.041), (4, -0.013), (5, 0.018), (6, -0.014), (7, 0.038), (8, -0.045), (9, -0.001), (10, -0.002), (11, 0.037), (12, -0.045), (13, -0.027), (14, -0.013), (15, 0.007), (16, 0.021), (17, 0.003), (18, -0.054), (19, -0.002), (20, -0.011), (21, 0.008), (22, -0.058), (23, 0.032), (24, 0.011), (25, 0.02), (26, -0.002), (27, 0.012), (28, -0.075), (29, -0.067), (30, 0.007), (31, 0.052), (32, 0.002), (33, -0.014), (34, -0.013), (35, -0.032), (36, -0.039), (37, -0.05), (38, 0.081), (39, -0.009), (40, 0.015), (41, -0.038), (42, -0.028), (43, -0.009), (44, -0.016), (45, 0.035), (46, -0.006), (47, -0.021), (48, -0.053), (49, 0.095)]
simIndex simValue paperId paperTitle
same-paper 1 0.93934065 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to crossvalidation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
2 0.68267757 145 nips-2013-It is all in the noise: Efficient multi-task Gaussian process inference with structured residuals
Author: Barbara Rakitsch, Christoph Lippert, Karsten Borgwardt, Oliver Stegle
Abstract: Multi-task prediction methods are widely used to couple regressors or classification models by sharing information across related tasks. We propose a multi-task Gaussian process approach for modeling both the relatedness between regressors and the task correlations in the residuals, in order to more accurately identify true sharing between regressors. The resulting Gaussian model has a covariance term in form of a sum of Kronecker products, for which efficient parameter inference and out of sample prediction are feasible. On both synthetic examples and applications to phenotype prediction in genetics, we find substantial benefits of modeling structured noise compared to established alternatives. 1
3 0.67526037 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration
Author: Tuo Zhao, Han Liu
Abstract: We propose a semiparametric method for estimating sparse precision matrix of high dimensional elliptical distribution. The proposed method calibrates regularizations when estimating each column of the precision matrix. Thus it not only is asymptotically tuning free, but also achieves an improved finite sample performance. Theoretically, we prove that the proposed method achieves the parametric rates of convergence in both parameter estimation and model selection. We present numerical results on both simulated and real datasets to support our theory and illustrate the effectiveness of the proposed estimator. 1
4 0.66960466 209 nips-2013-New Subsampling Algorithms for Fast Least Squares Regression
Author: Paramveer Dhillon, Yichao Lu, Dean P. Foster, Lyle Ungar
Abstract: We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data (n p). We propose three methods which solve the big data problem by subsampling the covariance matrix using either a single or two stage estimation. All three run in the order of size of input i.e. O(np) and our best method, Uluru, gives an error bound of O( p/n) which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design (with Randomized Hadamard preconditioning) as well as sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if observations are i.i.d., sub-Gaussian then one can directly subsample without the expensive Randomized Hadamard preconditioning without loss of accuracy. 1
5 0.65746486 284 nips-2013-Robust Spatial Filtering with Beta Divergence
Author: Wojciech Samek, Duncan Blythe, Klaus-Robert Müller, Motoaki Kawanabe
Abstract: The efficiency of Brain-Computer Interfaces (BCI) largely depends upon a reliable extraction of informative features from the high-dimensional EEG signal. A crucial step in this protocol is the computation of spatial filters. The Common Spatial Patterns (CSP) algorithm computes filters that maximize the difference in band power between two conditions, thus it is tailored to extract the relevant information in motor imagery experiments. However, CSP is highly sensitive to artifacts in the EEG data, i.e. few outliers may alter the estimate drastically and decrease classification performance. Inspired by concepts from the field of information geometry we propose a novel approach for robustifying CSP. More precisely, we formulate CSP as a divergence maximization problem and utilize the property of a particular type of divergence, namely beta divergence, for robustifying the estimation of spatial filters in the presence of artifacts in the data. We demonstrate the usefulness of our method on toy data and on EEG recordings from 80 subjects. 1
6 0.64715958 225 nips-2013-One-shot learning and big data with n=2
7 0.62084168 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
8 0.60422266 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis
9 0.59627694 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
10 0.57965094 327 nips-2013-The Randomized Dependence Coefficient
11 0.57960624 109 nips-2013-Estimating LASSO Risk and Noise Level
12 0.56980562 178 nips-2013-Locally Adaptive Bayesian Multivariate Time Series
13 0.54516435 290 nips-2013-Scoring Workers in Crowdsourcing: How Many Control Questions are Enough?
14 0.54180408 265 nips-2013-Reconciling "priors" & "priors" without prejudice?
15 0.53491354 197 nips-2013-Moment-based Uniform Deviation Bounds for $k$-means and Friends
16 0.5265739 153 nips-2013-Learning Feature Selection Dependencies in Multi-task Learning
17 0.5214712 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
18 0.51243424 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform
19 0.51005191 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
20 0.50980866 297 nips-2013-Sketching Structured Matrices for Faster Nonlinear Regression
topicId topicWeight
[(2, 0.014), (5, 0.313), (16, 0.045), (33, 0.103), (34, 0.077), (41, 0.036), (49, 0.048), (56, 0.098), (70, 0.034), (85, 0.024), (89, 0.059), (93, 0.038), (99, 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.74515873 130 nips-2013-Generalizing Analytic Shrinkage for Arbitrary Covariance Structures
Author: Daniel Bartz, Klaus-Robert Müller
Abstract: Analytic shrinkage is a statistical technique that offers a fast alternative to crossvalidation for the regularization of covariance matrices and has appealing consistency properties. We show that the proof of consistency requires bounds on the growth rates of eigenvalues and their dispersion, which are often violated in data. We prove consistency under assumptions which do not restrict the covariance structure and therefore better match real world data. In addition, we propose an extension of analytic shrinkage –orthogonal complement shrinkage– which adapts to the covariance structure. Finally we demonstrate the superior performance of our novel approach on data from the domains of finance, spoken letter and optical character recognition, and neuroscience. 1
2 0.59369576 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
Author: Miao Xu, Rong Jin, Zhi-Hua Zhou
Abstract: In standard matrix completion theory, it is required to have at least O(n ln2 n) observed entries to perfectly recover a low-rank matrix M of size n × n, leading to a large number of observations when n is large. In many real tasks, side information in addition to the observed entries is often available. In this work, we develop a novel theory of matrix completion that explicitly explore the side information to reduce the requirement on the number of observed entries. We show that, under appropriate conditions, with the assistance of side information matrices, the number of observed entries needed for a perfect recovery of matrix M can be dramatically reduced to O(ln n). We demonstrate the effectiveness of the proposed approach for matrix completion in transductive incomplete multi-label learning. 1
3 0.50292999 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables
Author: Zhuo Wang, Alan Stocker, Daniel Lee
Abstract: In many neural systems, information about stimulus variables is often represented in a distributed manner by means of a population code. It is generally assumed that the responses of the neural population are tuned to the stimulus statistics, and most prior work has investigated the optimal tuning characteristics of one or a small number of stimulus variables. In this work, we investigate the optimal tuning for diffeomorphic representations of high-dimensional stimuli. We analytically derive the solution that minimizes the L2 reconstruction loss. We compared our solution with other well-known criteria such as maximal mutual information. Our solution suggests that the optimal weights do not necessarily decorrelate the inputs, and the optimal nonlinearity differs from the conventional equalization solution. Results illustrating these optimal representations are shown for some input distributions that may be relevant for understanding the coding of perceptual pathways. 1
4 0.50025219 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.
Author: Michel Besserve, Nikos K. Logothetis, Bernhard Schölkopf
Abstract: Many applications require the analysis of complex interactions between time series. These interactions can be non-linear and involve vector valued as well as complex data structures such as graphs or strings. Here we provide a general framework for the statistical analysis of these dependencies when random variables are sampled from stationary time-series of arbitrary objects. To achieve this goal, we study the properties of the Kernel Cross-Spectral Density (KCSD) operator induced by positive definite kernels on arbitrary input domains. This framework enables us to develop an independence test between time series, as well as a similarity measure to compare different types of coupling. The performance of our test is compared to the HSIC test using i.i.d. assumptions, showing improvements in terms of detection errors, as well as the suitability of this approach for testing dependency in complex dynamical systems. This similarity measure enables us to identify different types of interactions in electrophysiological neural time series. 1
5 0.5000388 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking
Author: David Carlson, Vinayak Rao, Joshua T. Vogelstein, Lawrence Carin
Abstract: With simultaneous measurements from ever increasing populations of neurons, there is a growing need for sophisticated tools to recover signals from individual neurons. In electrophysiology experiments, this classically proceeds in a two-step process: (i) threshold the waveforms to detect putative spikes and (ii) cluster the waveforms into single units (neurons). We extend previous Bayesian nonparametric models of neural spiking to jointly detect and cluster neurons using a Gamma process model. Importantly, we develop an online approximate inference scheme enabling real-time analysis, with performance exceeding the previous state-of-theart. Via exploratory data analysis—using data with partial ground truth as well as two novel data sets—we find several features of our model collectively contribute to our improved performance including: (i) accounting for colored noise, (ii) detecting overlapping spikes, (iii) tracking waveform dynamics, and (iv) using multiple channels. We hope to enable novel experiments simultaneously measuring many thousands of neurons and possibly adapting stimuli dynamically to probe ever deeper into the mysteries of the brain. 1
6 0.49923286 121 nips-2013-Firing rate predictions in optimal balanced networks
7 0.49821657 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits
8 0.49817511 304 nips-2013-Sparse nonnegative deconvolution for compressive calcium imaging: algorithms and phase transitions
10 0.49760181 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
11 0.4968414 249 nips-2013-Polar Operators for Structured Sparse Estimation
12 0.49575457 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
13 0.49513263 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
14 0.49470034 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
15 0.49366975 350 nips-2013-Wavelets on Graphs via Deep Learning
16 0.4935801 101 nips-2013-EDML for Learning Parameters in Directed and Undirected Graphical Models
17 0.49347264 252 nips-2013-Predictive PAC Learning and Process Decompositions
18 0.49343255 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis
19 0.49337271 77 nips-2013-Correlations strike back (again): the case of associative memory retrieval
20 0.4931182 22 nips-2013-Action is in the Eye of the Beholder: Eye-gaze Driven Model for Spatio-Temporal Action Localization