
283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model


Source: pdf

Author: Fang Han, Han Liu

Abstract: In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy-tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to the Gaussian-based methods. Experiments on synthetic and real-world data are conducted to illustrate the empirical usefulness of the proposed method.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we focus on the principal component regression and its application to high dimensional non-Gaussian data. [sent-3, score-0.387]

2 First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation. [sent-5, score-0.778]

3 Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. [sent-6, score-0.49]

4 The elliptical distribution is a semiparametric generalization of the Gaussian, including many well known distributions such as multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. [sent-7, score-0.216]

5 It allows the random vector to be heavy tailed and have tail dependence. [sent-8, score-0.115]

6 Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to the Gaussian-based methods. [sent-10, score-0.261]

7 1 Introduction. Principal component regression (PCR) has been widely used in statistics for years (Kendall, 1968). [sent-12, score-0.203]

8 Take the classical linear regression with random design for example. [sent-13, score-0.157]

9 The classical linear regression model and the simple principal component regression model can be elaborated as follows: (Classical linear regression model) Y = Xβ + ε; (Principal component regression model) Y = αXu1 + ε. (1.1) [sent-18, score-0.677]

10 Here X = (x1, . . . , xn)^T ∈ Rn×d, Y ∈ Rn, ui is the i-th leading eigenvector of Σ, ε ∼ Nn(0, σ^2 In) is independent of X, β ∈ Rd, and α ∈ R. [sent-22, score-0.076]

11 The principal component regression can then be conducted in two steps: first, we obtain an estimator û1 of u1; secondly, we project the data in the direction of û1 and solve a simple linear regression to estimate α. [sent-24, score-0.554]

12 From (1.1), it is easy to observe that the principal component regression model is a subset of the general linear regression (LR) model, with the constraint that the regression coefficient β is proportional to u1. [sent-26, score-0.607]

13 There have been many discussions of the advantages of principal component regression over classical linear regression. [sent-27, score-0.434]

14 In low dimensional settings, Massy (1965) pointed out that principal component regression can be much more efficient than linear regression in handling collinearity among predictors. [sent-28, score-0.434]

15 More recently, Cook (2007) and Artemiou and Li (2009) argued that principal component regression has the potential to play a more important role. [sent-29, score-0.387]

16 In particular, letting ûj be the j-th leading eigenvector of the sample covariance matrix Σ̂ of x1, . . . , xn, [sent-30, score-0.101]

17 This indicates, although not rigorously, that there is a possibility that principal component regression can borrow strength from the low-rank structure of Σ, which motivates our work. [sent-34, score-0.402]

18 Even though the statistical performance of principal component regression in low dimensions is not fully understood, there is even less analysis on principal component regression in high dimensions where the dimension d can be even exponentially larger than the sample size n. [sent-35, score-0.774]

19 This is partially due to the fact that estimating the leading eigenvectors of Σ itself has been difficult enough. [sent-36, score-0.061]

20 Very recently, Han and Liu (2013b) relaxed the Gaussian assumption in conducting a scale-invariant version of sparse PCA (i.e., [sent-48, score-0.082]

21 estimating the leading eigenvector of the correlation matrix instead of the covariance matrix). [sent-50, score-0.095]

22 Secondly, in high dimensions where d can increase much faster, even exponentially faster, than n, we propose a robust method for conducting (sparse) principal component regression under a non-Gaussian elliptical model. [sent-55, score-0.614]

23 The elliptical distribution is a semiparametric generalization of the Gaussian, relaxing the light-tail and zero tail dependence constraints but preserving the symmetry property. [sent-56, score-0.223]

24 This distribution family includes many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, logistic, and many others. [sent-59, score-0.071]

25 Under the elliptical model, we exploit the result in Han and Liu (2013a), who showed that by utilizing a robust covariance matrix estimator, the multivariate Kendall’s tau, we can obtain an estimator of u1 that attains the optimal parametric rate shown in Vu and Lei (2012). [sent-60, score-0.279]

26 We then exploit this estimator in conducting principal component regression and show that the obtained estimator β̌ can estimate β at the optimal √(s log d/n) rate. [sent-61, score-0.438]

27 2 Classical Principal Component Regression. This section is devoted to discussing the advantage of classical principal component regression over classical linear regression. [sent-64, score-0.481]

28 We also denote MI,J to be the submatrix of M whose rows are indexed by I and columns are indexed by J. [sent-71, score-0.068]

29 Let MI∗ and M∗J be the submatrix of M with rows indexed by I, and the submatrix of M with columns indexed by J. [sent-72, score-0.084]

30 We suppose that the following principal component regression model holds: Y = αXu1 + ε. (2.1) [sent-96, score-0.387]

31 Here X = [x1, . . . , xn]^T ∈ Rn×d, and we are interested in estimating the regression coefficient β := αu1. [sent-103, score-0.155]

32 We let the classical least squares estimator be the solution obtained without taking into account the information that β is proportional to u1. [sent-108, score-0.137]

33 Under the principal component regression model shown in (2.1), [sent-114, score-0.09]

34 This result reflects the vulnerability of the least squares estimator to collinearity. [sent-117, score-0.09]

35 Based on (2.1), the classical principal component regression estimator can be elaborated as follows. [sent-121, score-0.495]

36 (1) We first estimate u1 by the leading eigenvector û1 of the sample covariance Σ̂ := (1/n) Σ_{i=1}^{n} x_i x_i^T. [sent-122, score-0.076]

37 (2) We then estimate α in (2.1) by standard least squares estimation on the projected data Z := Xû1 ∈ Rn: α̂ := (Z^T Z)^{-1} Z^T Y. The final principal component regression estimator β̂ is then obtained as β̂ = α̂ û1. [sent-124, score-0.477]
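To make this two-step procedure concrete, the following is a minimal NumPy sketch (my own illustration, not the authors' code; the synthetic data, variable names, and dimensions are assumptions):

```python
import numpy as np

def classical_pcr(X, Y):
    """Classical principal component regression, as a two-step sketch.

    Step 1: estimate u1 by the leading eigenvector of the sample covariance.
    Step 2: regress Y on the projected data Z = X @ u1_hat to obtain alpha_hat.
    Returns beta_hat = alpha_hat * u1_hat (the sign of u1_hat is arbitrary).
    """
    n, d = X.shape
    Sigma_hat = X.T @ X / n                    # sample covariance (predictors assumed centered)
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
    u1_hat = eigvecs[:, -1]                    # leading eigenvector (eigh sorts ascending)
    Z = X @ u1_hat                             # projected predictors, shape (n,)
    alpha_hat = (Z @ Y) / (Z @ Z)              # one-dimensional least squares
    return alpha_hat * u1_hat

# Illustrative use on synthetic data drawn from the model Y = alpha * X u1 + eps.
rng = np.random.default_rng(0)
n, d, alpha = 200, 10, 3.0
Sigma = np.diag(np.linspace(5.0, 1.0, d))
u1 = np.linalg.eigh(Sigma)[1][:, -1]
X = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
Y = alpha * X @ u1 + rng.normal(scale=0.5, size=n)
beta_hat = classical_pcr(X, Y)
```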

38 This result provides several important messages on the performance of principal component regression. [sent-135, score-0.277]

39 First, compared to the least squares estimator, the principal component regression estimator β̂ is insensitive to collinearity in the sense that λmin(Σ) plays no role in its rate of convergence. [sent-136, score-0.116]

40 These two observations, combined, illustrate the advantages of classical principal component regression over least squares estimation. [sent-138, score-0.512]

41 These advantages justify the use of principal component regression. [sent-139, score-0.29]

42 The empirical mean squared errors are plotted against 1/λd, λ1, and α separately in (A), (B), and (C). [sent-177, score-0.068]

43 Here the results of classical linear regression and principal component regression are marked by the black solid line and the red dotted line, respectively. [sent-178, score-0.544]

44 3 Robust Sparse Principal Component Regression under Elliptical Model. In this section, we propose a new principal component regression method. [sent-179, score-0.387]

45 We generalize the settings of the classical principal component regression discussed in the last section in two directions: (i) we consider the high dimensional setting where the dimension d can be much larger than the sample size n; (ii) in modeling the predictors x1, . . . , xn, we replace the Gaussian assumption with the more general elliptical family. [sent-180, score-0.455]

46 The elliptical family can capture characteristics such as heavy tails and tail dependence, making it more suitable for analyzing complex datasets in finance, genomics, and biomedical imaging. [sent-184, score-0.256]

47 1 Elliptical Distribution. In this section we define the elliptical distribution and introduce its basic properties. [sent-186, score-0.302]

48 To our knowledge, there are essentially four ways to define the continuous elliptical distribution with density. [sent-189, score-0.151]

49 The most intuitive way is as follows: a random vector X ∈ Rd is said to follow an elliptical distribution ECd(µ, Σ, ξ) if and only if there exists a random variable ξ > 0 (a.s.) such that X =d µ + ξAU, where U is uniformly distributed on the unit sphere in Rq, independent of ξ, and A ∈ Rd×q satisfies AA^T = Σ. [sent-190, score-0.151]

50 Accordingly, the elliptical distribution can be regarded as a semiparametric generalization of the Gaussian distribution, with ξ as the nonparametric part. [sent-195, score-0.177]

51 Because ξ can be very heavy-tailed, X can also be very heavy-tailed. [sent-196, score-0.104]
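To make the stochastic representation concrete, here is a hedged sampling sketch (my own, not from the paper): X =d µ + ξAU with U uniform on the sphere and AA^T = Σ; taking ξ =d χd recovers the Gaussian, while a heavier-tailed ξ yields heavy-tailed elliptical data.

```python
import numpy as np

def sample_elliptical(n, mu, Sigma, xi_sampler, rng):
    """Draw n samples from EC_d(mu, Sigma, xi) via X = mu + xi * A @ U,
    where U is uniform on the unit sphere and A satisfies A @ A.T = Sigma."""
    d = len(mu)
    A = np.linalg.cholesky(Sigma)                      # one valid choice of A
    G = rng.standard_normal((n, d))
    U = G / np.linalg.norm(G, axis=1, keepdims=True)   # uniform directions on the sphere
    xi = xi_sampler(n, rng)                            # nonparametric radial part, xi > 0
    return mu + xi[:, None] * (U @ A.T)

rng = np.random.default_rng(0)
d = 5
Sigma = 0.5 * np.ones((d, d)) + 0.5 * np.eye(d)
mu = np.zeros(d)

# Gaussian: xi distributed as chi_d gives X ~ N(mu, Sigma).
gaussian = sample_elliptical(2000, mu, Sigma, lambda n, r: np.sqrt(r.chisquare(d, n)), rng)

# Heavy-tailed example: a Pareto-like radial variable produces heavy-tailed elliptical data.
heavy = sample_elliptical(2000, mu, Sigma, lambda n, r: r.pareto(2.5, n) + 1.0, rng)
```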

52 We would like to point out that the elliptical family is significantly larger than the Gaussian. [sent-203, score-0.168]

53 In contrast, the elliptical is a semiparametric family, since the elliptical density can be represented as g((x − µ)^T Σ^{-1} (x − µ)), where the function g(·) is completely unspecified. [sent-205, score-0.345]

54 2 Multivariate Kendall’s tau. As an important step in conducting the principal component regression, we need to estimate u1 = Θ1(Cov(X)) = Θ1(Σ) as accurately as possible. [sent-209, score-0.389]

55 Since ξ can be very heavy-tailed, the corresponding elliptically distributed random vector can be heavy-tailed. [sent-211, score-0.255]

56 As has been pointed out before (see, e.g., Han and Liu, 2013b), the leading eigenvector of the sample covariance matrix Σ̂ can be a bad estimator of u1 = Θ1(Σ) under the elliptical distribution. [sent-213, score-0.284]

57 In particular, in this paper we consider using the multivariate Kendall’s tau, proposed by Choi and Marden (1998) and recently studied in depth by Han and Liu (2013a). [sent-215, score-0.1]

58 The population multivariate Kendall’s tau matrix, denoted by K ∈ Rd×d, is defined as K := E[(X − X̃)(X − X̃)^T / ‖X − X̃‖_2^2], where X̃ is an independent copy of X. [sent-218, score-0.1]

59 The sample version of the multivariate Kendall’s tau is accordingly defined as K̂ := (1/(n(n − 1))) Σ_{i≠j} (xi − xj)(xi − xj)^T / ‖xi − xj‖_2^2. [sent-224, score-0.118]
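As an illustration of this estimator, a direct O(n^2 d^2) implementation is sketched below (my own code, written for clarity rather than efficiency):

```python
import numpy as np

def multivariate_kendalls_tau(X):
    """Sample multivariate Kendall's tau:
    K_hat = (1 / (n(n-1))) * sum_{i != j} (x_i - x_j)(x_i - x_j)^T / ||x_i - x_j||_2^2."""
    n, d = X.shape
    K = np.zeros((d, d))
    for i in range(n):
        diffs = X[i] - X                                   # rows are x_i - x_j, shape (n, d)
        sq_norms = np.einsum('ij,ij->i', diffs, diffs)     # ||x_i - x_j||_2^2 for each j
        sq_norms[i] = np.inf                               # the j = i term contributes zero
        K += (diffs / sq_norms[:, None]).T @ diffs
    return K / (n * (n - 1))

# The leading eigenvector of K_hat can then serve as a robust estimate of u1 = Theta_1(Sigma),
# e.g. np.linalg.eigh(multivariate_kendalls_tau(X))[1][:, -1] in non-sparse settings.
```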

60 Let X ∼ ECd(µ, Σ, ξ) follow a continuous elliptical distribution and let K be the population multivariate Kendall’s tau matrix. [sent-231, score-0.1]

61 3 Model and Method. In this section we discuss the model we build and the corresponding proposed method for conducting high dimensional (sparse) principal component regression on non-Gaussian data. [sent-242, score-0.477]

62 Similarly to Section 2, we consider the classical simple principal component regression model: Y = αXu1 + ε = α[x1, . . . , xn]^T u1 + ε. [sent-243, score-0.434]

63 Thus, the formal model of the robust sparse principal component regression considered in this paper is as follows: Md(Y, ε; Σ, ξ, s): Y = αXu1 + ε, where x1, . . . , xn are drawn from the elliptical distribution ECd(µ, Σ, ξ) and the leading eigenvector u1 is s-sparse. [sent-257, score-0.443]

64 The robust sparse principal component regression can then be elaborated as a two-step procedure: (i) inspired by the model Md(Y, ε; Σ, ξ, s) and the proposition above, we first estimate the sparse leading eigenvector u1. [sent-262, score-0.466]

65 Specifically, ǔ1 := argmax_{v ∈ B0(s), ‖v‖_2 = 1} v^T K̂ v, where B0(s) := {v ∈ Rd : ‖v‖_0 ≤ s} and K̂ is the estimated multivariate Kendall’s tau matrix. [sent-264, score-0.1]

66 By the proposition above, ǔ1 is also an estimator of Θ1(Cov(X)), whenever the covariance matrix exists. [sent-267, score-0.064]

67 (ii) We then estimate α by standard least squares estimation on the projected data Z := Xǔ1 ∈ Rn: α̌ := (Z^T Z)^{-1} Z^T Y. The final principal component regression estimator β̌ is then obtained as β̌ = α̌ ǔ1. [sent-269, score-0.477]
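Putting the two steps together, here is a hedged end-to-end sketch that reuses multivariate_kendalls_tau from the sketch above. The l0-constrained eigenvector is approximated by a simple truncated power iteration, which is a substitute of my own choosing, not necessarily the algorithm used in the paper:

```python
import numpy as np

def truncated_power_iteration(K, s, n_iter=200, seed=0):
    """Approximate argmax over {||v||_0 <= s, ||v||_2 = 1} of v^T K v by
    power iterations with hard thresholding to the s largest-magnitude entries."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(K.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        w = K @ v
        w[np.argsort(np.abs(w))[:-s]] = 0.0    # keep only the s largest-magnitude entries
        v = w / np.linalg.norm(w)
    return v

def robust_sparse_pcr(X, Y, s):
    """Two-step robust sparse PCR sketch:
    (i) sparse leading eigenvector of the multivariate Kendall's tau matrix;
    (ii) least squares on the data projected onto that direction."""
    K_hat = multivariate_kendalls_tau(X)       # defined in the earlier sketch
    u1_check = truncated_power_iteration(K_hat, s)
    Z = X @ u1_check
    alpha_check = (Z @ Y) / (Z @ Z)
    return alpha_check * u1_check
```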

68 In Section 2, we show that how accurately u1 is estimated plays an important role in conducting the principal component regression. [sent-272, score-0.328]

69 Under Conditions 1 and 2, we then have the following theorem, which shows that, under certain conditions, ‖β̌ − β‖_2 = OP(√(s log d/n)), which is the optimal parametric rate in estimating the regression coefficient (Ravikumar et al.). [sent-286, score-0.129]

70 Then, under Condition 1 or Condition 2, and for any random vector X such that max_{v ∈ Sd−1, ‖v‖_0 ≤ 2s} |v^T(Σ̂ − Σ)v| = oP(1), the robust principal component regression estimator β̌ satisfies ‖β̌ − β‖_2 = OP(√(s log d/n)). [sent-292, score-0.45]

71 Here n = 100, d = 200, and we are interested in estimating the regression coefficient β. [sent-320, score-0.129]

72 The horizontal axis represents the cardinalities of the estimates’ support sets and the vertical axis represents the empirical mean squared error. [sent-321, score-0.067]

73 Here from the left to the right, the minimum mean square errors for lasso are 0. [sent-322, score-0.091]

74 4 Experiments. In this section we conduct studies on both synthetic and real-world data to investigate the empirical performance of the robust sparse principal component regression proposed in this paper. [sent-325, score-0.443]

75 1 Simulation Study. In this section, we conduct a simulation study to back up the theoretical results and further investigate the empirical performance of the proposed robust sparse principal component regression method. [sent-331, score-0.443]

76 Let u1, . . . , ud be the eigenvectors of Σ, with uj := (uj1, . . . , ujd)^T. [sent-340, score-0.061]

77 The top 2 leading eigenvectors u1, u2 of Σ are specified to be sparse, with sj := ‖uj‖_0 and ujk = 1/√sj for k ∈ [1 + Σ_{i=1}^{j−1} si, Σ_{i=1}^{j} si], and zero for all the others. [sent-344, score-0.098]
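A small sketch of one way to build such a covariance matrix (the spike magnitudes and base level below are placeholder values of my own; only the sparse support layout follows the description above):

```python
import numpy as np

def sparse_spiked_sigma(d, support_sizes, spike_eigenvalues, base=1.0):
    """Sigma = base * I + sum_j (lambda_j - base) u_j u_j^T, where u_j has
    support_sizes[j] nonzero entries equal to 1/sqrt(s_j) on consecutive disjoint blocks."""
    Sigma = base * np.eye(d)
    start = 0
    for s_j, lam in zip(support_sizes, spike_eigenvalues):
        u = np.zeros(d)
        u[start:start + s_j] = 1.0 / np.sqrt(s_j)
        Sigma += (lam - base) * np.outer(u, u)
        start += s_j
    return Sigma

# Two sparse leading eigenvectors with supports of size 10 each (placeholder eigenvalues).
Sigma = sparse_spiked_sigma(d=200, support_sizes=[10, 10], spike_eigenvalues=[5.0, 2.0])
```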

78 With Σ, we then consider the following four different elliptical distributions: (Normal) X ∼ ECd(0, Σ, ζ1) with ζ1 =d χd. [sent-353, score-0.151]

79 To show the estimation accuracy, Figure 2 plots the empirical mean squared error between the estimate and the true regression coefficient β against the number of estimated nonzero entries (defined as ‖ǔ1‖_0), for PCR and RPCR, under different schemes of (n, d), Σ and different distributions. [sent-382, score-0.162]

80 As discussed in Section 2, especially as shown in Figure 1, linear regression and principal component regression have their own advantages in different settings. [sent-385, score-0.51]

81 For example, under the Gaussian settings with n = 100 and d = 200, the lowest mean square error for lasso is 0. [sent-387, score-0.075]

82 Moreover, when the data are indeed normally distributed, there is no obvious difference between RPCR and PCR, indicating that RPCR is a safe alternative to the classical sparse principal component regression. [sent-392, score-0.355]

83 Quantile vs. quantile plot of the log-return values for one stock, “Goldman Sachs”. [sent-402, score-0.08]

84 2 Application to Equity Data. In this section we apply the proposed robust sparse principal component regression and the other two methods to the stock price data from Yahoo! Finance. [sent-406, score-0.484]

85 We collect the daily closing prices for 452 stocks that are consistently in the S&P 500 index from January 1, 2003 through January 1, 2008. [sent-410, score-0.067]

86 Let S = [St,j] denote the matrix of closing prices, where St,j is the closing price of stock j on day t. [sent-412, score-0.086]

87 This is done first by conducting marginal normality tests (Kolmogorov-Smirnov, Shapiro-Wilk, and Lilliefors) on the data. [sent-415, score-0.071]

88 We find that at most 24 out of the 452 stocks would pass any of the three normality tests. [sent-416, score-0.064]

89 With Bonferroni correction, over half of the stocks still fail to pass any of the normality tests. [sent-417, score-0.064]

90 Moreover, to illustrate the heavy-tailed issue, we draw the quantile vs. quantile plot of the log-return values for one stock against the Gaussian quantiles. [sent-418, score-0.144]

91 It can be observed that the log-return values for this stock are heavy-tailed compared to the Gaussian. [sent-420, score-0.133]

92 We are interested in predicting the log-return value on day t for each stock indexed by k (i.e., [sent-424, score-0.089]

93 treating Ft,k as the response) using the log-return values for all the stocks from day t − 1 to day t − 7 as the predictors. [sent-426, score-0.088]

94 For each stock indexed by k, to learn the regression coefficient βk, we use the log-return data Ft over the training days. [sent-430, score-0.177]
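A brief sketch of the data preparation described here: log-returns from closing prices and a seven-day lag design matrix for stock k. The array shapes and names are illustrative assumptions, not the paper's code:

```python
import numpy as np

def make_lagged_design(F, k, lags=7):
    """Given a (T, p) matrix F of daily log-returns, build regression data for stock k:
    response y_t = F[t, k]; predictors are F[t-1], ..., F[t-lags] stacked into one row."""
    T, p = F.shape
    X = np.column_stack([F[lags - l: T - l] for l in range(1, lags + 1)])  # (T - lags, lags * p)
    y = F[lags:, k]
    return X, y

# Log-returns from closing prices S of shape (T + 1, p): F[t, j] = log(S[t+1, j] / S[t, j]).
rng = np.random.default_rng(0)
S = np.exp(np.cumsum(0.01 * rng.standard_normal((1001, 452)), axis=0))  # synthetic prices
F = np.log(S[1:] / S[:-1])
X, y = make_lagged_design(F, k=0, lags=7)
# beta_k could then be fit with, e.g., the robust sparse PCR sketch above: robust_sparse_pcr(X, y, s=10)
```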

95 On principal components and regression: a statistical explanation of a natural phenomenon. [sent-448, score-0.184]

96 On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. [sent-453, score-0.079]

97 Sign and rank covariance matrices: statistical properties and application to principal components analysis. [sent-477, score-0.225]

98 Optimal sparse principal component analysis in high dimensional elliptical model. [sent-498, score-0.48]

99 On consistency and sparsity for principal components analysis in high dimensions. [sent-511, score-0.184]

100 Estimating the tail dependence function of an elliptical distribution. [sent-521, score-0.174]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('qq', 0.894), ('principal', 0.184), ('elliptical', 0.151), ('rpcr', 0.125), ('regression', 0.11), ('pcr', 0.104), ('component', 0.093), ('ecd', 0.083), ('kendall', 0.077), ('han', 0.062), ('tau', 0.061), ('square', 0.052), ('heavy', 0.052), ('conducting', 0.051), ('classical', 0.047), ('qqq', 0.046), ('stocks', 0.044), ('stock', 0.041), ('tailed', 0.04), ('multivariate', 0.039), ('cov', 0.039), ('op', 0.039), ('estimator', 0.038), ('md', 0.035), ('oja', 0.032), ('artemiou', 0.031), ('sparse', 0.031), ('fang', 0.03), ('rd', 0.03), ('sd', 0.029), ('proposition', 0.027), ('semiparametric', 0.026), ('quantile', 0.026), ('xn', 0.026), ('covariance', 0.026), ('liu', 0.026), ('indexed', 0.026), ('leading', 0.026), ('elliptically', 0.026), ('collinearity', 0.026), ('vershynin', 0.026), ('robust', 0.025), ('uj', 0.025), ('lr', 0.025), ('eigenvector', 0.024), ('secondly', 0.024), ('tail', 0.023), ('lasso', 0.023), ('closing', 0.023), ('elaborated', 0.023), ('gaussian', 0.022), ('day', 0.022), ('pca', 0.022), ('dimensional', 0.021), ('lei', 0.021), ('averagely', 0.021), ('bunea', 0.021), ('kjk', 0.021), ('marden', 0.021), ('massy', 0.021), ('maxjk', 0.021), ('moghaddam', 0.021), ('ppelberg', 0.021), ('sachs', 0.021), ('truncated', 0.021), ('normality', 0.02), ('ud', 0.02), ('vu', 0.02), ('estimating', 0.019), ('xtj', 0.018), ('equity', 0.018), ('croux', 0.018), ('statistic', 0.018), ('accordingly', 0.018), ('averaged', 0.017), ('vec', 0.017), ('family', 0.017), ('coef', 0.017), ('goldman', 0.017), ('tyler', 0.017), ('eigenvectors', 0.016), ('errors', 0.016), ('submatrix', 0.016), ('lounici', 0.016), ('rn', 0.015), ('cardinalities', 0.015), ('rank', 0.015), ('borrowing', 0.015), ('cook', 0.015), ('quantiles', 0.014), ('aspremont', 0.014), ('hardest', 0.014), ('nance', 0.014), ('arxiv', 0.013), ('advantages', 0.013), ('ft', 0.013), ('illustrate', 0.013), ('plot', 0.013), ('biomedical', 0.013), ('financial', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

Author: Fang Han, Han Liu

Abstract: In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy-tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to the Gaussian-based methods. Experiments on synthetic and real-world data are conducted to illustrate the empirical usefulness of the proposed method.

2 0.47977567 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

Author: Leonid Boytsov, Bilegsaikhan Naidan

Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1

3 0.14666294 161 nips-2013-Learning Stochastic Inverses

Author: Andreas Stuhlmüller, Jacob Taylor, Noah Goodman

Abstract: We describe a class of algorithms for amortized inference in Bayesian networks. In this setting, we invest computation upfront to support rapid online inference for a wide range of queries. Our approach is based on learning an inverse factorization of a model’s joint distribution: a factorization that turns observations into root nodes. Our algorithms accumulate information to estimate the local conditional distributions that constitute such a factorization. These stochastic inverses can be used to invert each of the computation steps leading to an observation, sampling backwards in order to quickly find a likely explanation. We show that estimated inverses converge asymptotically in number of (prior or posterior) training samples. To make use of inverses before convergence, we describe the Inverse MCMC algorithm, which uses stochastic inverses to make block proposals for a Metropolis-Hastings sampler. We explore the efficiency of this sampler for a variety of parameter regimes and Bayes nets. 1

4 0.13598032 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

Author: Vincent Q. Vu, Juhee Cho, Jing Lei, Karl Rohe

Abstract: We propose a novel convex relaxation of sparse principal subspace estimation based on the convex hull of rank-d projection matrices (the Fantope). The convex problem can be solved efficiently using alternating direction method of multipliers (ADMM). We establish a near-optimal convergence rate, in terms of the sparsity, ambient dimension, and sample size, for estimation of the principal subspace of a general covariance matrix without assuming the spiked covariance model. In the special case of d = 1, our result implies the near-optimality of DSPCA (d’Aspremont et al. [1]) even when the solution is not rank 1. We also provide a general theoretical framework for analyzing the statistical properties of the method for arbitrary input matrices that extends the applicability and provable guarantees to a wide array of settings. We demonstrate this with an application to Kendall’s tau correlation matrices and transelliptical component analysis. 1

5 0.11977096 47 nips-2013-Bayesian Hierarchical Community Discovery

Author: Charles Blundell, Yee Whye Teh

Abstract: We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model. In the worst case, Our algorithms scale quadratically in the number of vertices of the network, but independent of the number of nested communities. In practice, the run time of our algorithms are two orders of magnitude faster than the Infinite Relational Model, achieving comparable or better accuracy. 1

6 0.093265004 225 nips-2013-One-shot learning and big data with n=2

7 0.088527113 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics

8 0.077290617 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

9 0.06697841 188 nips-2013-Memory Limited, Streaming PCA

10 0.064374819 91 nips-2013-Dirty Statistical Models

11 0.056468528 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models

12 0.054281533 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints

13 0.046931341 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

14 0.045867864 256 nips-2013-Probabilistic Principal Geodesic Analysis

15 0.045414612 217 nips-2013-On Poisson Graphical Models

16 0.043795999 109 nips-2013-Estimating LASSO Risk and Noise Level

17 0.041831665 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

18 0.040334113 232 nips-2013-Online PCA for Contaminated Data

19 0.039697587 324 nips-2013-The Fast Convergence of Incremental PCA

20 0.038720742 120 nips-2013-Faster Ridge Regression via the Subsampled Randomized Hadamard Transform


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.101), (1, 0.049), (2, 0.041), (3, 0.031), (4, 0.002), (5, 0.063), (6, -0.001), (7, 0.034), (8, -0.113), (9, 0.029), (10, -0.014), (11, 0.039), (12, 0.055), (13, 0.058), (14, -0.003), (15, 0.022), (16, -0.001), (17, 0.034), (18, -0.025), (19, 0.165), (20, 0.149), (21, 0.033), (22, -0.251), (23, 0.236), (24, 0.179), (25, -0.21), (26, -0.094), (27, -0.028), (28, 0.142), (29, 0.081), (30, 0.211), (31, 0.004), (32, -0.191), (33, -0.053), (34, -0.019), (35, -0.023), (36, -0.028), (37, -0.201), (38, 0.056), (39, 0.237), (40, 0.042), (41, 0.017), (42, -0.026), (43, -0.031), (44, -0.08), (45, -0.028), (46, -0.03), (47, -0.05), (48, 0.008), (49, -0.162)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94194376 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

Author: Fang Han, Han Liu

Abstract: In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy-tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to the Gaussian-based methods. Experiments on synthetic and real-world data are conducted to illustrate the empirical usefulness of the proposed method.

2 0.78565687 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

Author: Leonid Boytsov, Bilegsaikhan Naidan

Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1

3 0.43992665 225 nips-2013-One-shot learning and big data with n=2

Author: Lee H. Dicker, Dean P. Foster

Abstract: We model a “one-shot learning” situation, where very few observations y1 , ..., yn ∈ R are available. Associated with each observation yi is a very highdimensional vector xi ∈ Rd , which provides context for yi and enables us to predict subsequent observations, given their own context. One of the salient features of our analysis is that the problems studied here are easier when the dimension of xi is large; in other words, prediction becomes easier when more context is provided. The proposed methodology is a variant of principal component regression (PCR). Our rigorous analysis sheds new light on PCR. For instance, we show that classical PCR estimators may be inconsistent in the specified setting, unless they are multiplied by a scalar c > 1; that is, unless the classical estimator is expanded. This expansion phenomenon appears to be somewhat novel and contrasts with shrinkage methods (c < 1), which are far more common in big data analyses. 1

4 0.43772179 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics

Author: Özgür1 Şimşek

Abstract: Several attempts to understand the success of simple decision heuristics have examined heuristics as an approximation to a linear decision rule. This research has identified three environmental structures that aid heuristics: dominance, cumulative dominance, and noncompensatoriness. This paper develops these ideas further and examines their empirical relevance in 51 natural environments. The results show that all three structures are prevalent, making it possible for simple rules to reach, and occasionally exceed, the accuracy of the linear decision rule, using less information and less computation. 1

5 0.42032632 47 nips-2013-Bayesian Hierarchical Community Discovery

Author: Charles Blundell, Yee Whye Teh

Abstract: We propose an efficient Bayesian nonparametric model for discovering hierarchical community structure in social networks. Our model is a tree-structured mixture of potentially exponentially many stochastic blockmodels. We describe a family of greedy agglomerative model selection algorithms that take just one pass through the data to learn a fully probabilistic, hierarchical community model. In the worst case, Our algorithms scale quadratically in the number of vertices of the network, but independent of the number of nested communities. In practice, the run time of our algorithms are two orders of magnitude faster than the Infinite Relational Model, achieving comparable or better accuracy. 1

6 0.37696573 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

7 0.35497105 161 nips-2013-Learning Stochastic Inverses

8 0.28136814 256 nips-2013-Probabilistic Principal Geodesic Analysis

9 0.28053159 73 nips-2013-Convex Relaxations for Permutation Problems

10 0.27234733 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

11 0.27006158 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

12 0.25848073 117 nips-2013-Fast Algorithms for Gaussian Noise Invariant Independent Component Analysis

13 0.24827676 300 nips-2013-Solving the multi-way matching problem by permutation synchronization

14 0.24571083 294 nips-2013-Similarity Component Analysis

15 0.24154438 326 nips-2013-The Power of Asymmetry in Binary Hashing

16 0.22896788 244 nips-2013-Parametric Task Learning

17 0.22873908 355 nips-2013-Which Space Partitioning Tree to Use for Search?

18 0.22588287 131 nips-2013-Geometric optimisation on positive definite matrices for elliptically contoured distributions

19 0.22037467 76 nips-2013-Correlated random features for fast semi-supervised learning

20 0.21631964 217 nips-2013-On Poisson Graphical Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.042), (33, 0.131), (34, 0.102), (36, 0.014), (41, 0.022), (47, 0.021), (49, 0.053), (56, 0.082), (70, 0.014), (85, 0.041), (89, 0.043), (93, 0.023), (95, 0.01), (99, 0.27)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84336168 128 nips-2013-Generalized Method-of-Moments for Rank Aggregation

Author: Hossein Azari Soufiani, William Chen, David C. Parkes, Lirong Xia

Abstract: In this paper we propose a class of efficient Generalized Method-of-Moments (GMM) algorithms for computing parameters of the Plackett-Luce model, where the data consists of full rankings over alternatives. Our technique is based on breaking the full rankings into pairwise comparisons, and then computing parameters that satisfy a set of generalized moment conditions. We identify conditions for the output of GMM to be unique, and identify a general class of consistent and inconsistent breakings. We then show by theory and experiments that our algorithms run significantly faster than the classical Minorize-Maximization (MM) algorithm, while achieving competitive statistical efficiency. 1

2 0.84268844 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar

Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1

3 0.78522289 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

Author: Leonid Boytsov, Bilegsaikhan Naidan

Abstract: Our focus is on approximate nearest neighbor retrieval in metric and non-metric spaces. We employ a VP-tree and explore two simple yet effective learning-toprune approaches: density estimation through sampling and “stretching” of the triangle inequality. Both methods are evaluated using data sets with metric (Euclidean) and non-metric (KL-divergence and Itakura-Saito) distance functions. Conditions on spaces where the VP-tree is applicable are discussed. The VP-tree with a learned pruner is compared against the recently proposed state-of-the-art approaches: the bbtree, the multi-probe locality sensitive hashing (LSH), and permutation methods. Our method was competitive against state-of-the-art methods and, in most cases, was more efficient for the same rank approximation quality. 1

same-paper 4 0.76577067 283 nips-2013-Robust Sparse Principal Component Regression under the High Dimensional Elliptical Model

Author: Fang Han, Han Liu

Abstract: In this paper we focus on principal component regression and its application to high dimensional non-Gaussian data. The major contributions are twofold. First, in low dimensions and under the Gaussian model, by borrowing strength from recent developments in minimax optimal principal component estimation, we sharply characterize, for the first time, the potential advantage of classical principal component regression over least squares estimation. Secondly, we propose and analyze a new robust sparse principal component regression on high dimensional elliptically distributed data. The elliptical distribution is a semiparametric generalization of the Gaussian, including many well-known distributions such as the multivariate Gaussian, rank-deficient Gaussian, t, Cauchy, and logistic. It allows the random vector to be heavy-tailed and to have tail dependence. These extra flexibilities make it very suitable for modeling finance and biomedical imaging data. Under the elliptical model, we prove that our method can estimate the regression coefficients at the optimal parametric rate and is therefore a good alternative to the Gaussian-based methods. Experiments on synthetic and real-world data are conducted to illustrate the empirical usefulness of the proposed method.

5 0.75664884 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

Author: Trevor Campbell, Miao Liu, Brian Kulis, Jonathan P. How, Lawrence Carin

Abstract: This paper presents a novel algorithm, based upon the dependent Dirichlet process mixture model (DDPMM), for clustering batch-sequential data containing an unknown number of evolving clusters. The algorithm is derived via a lowvariance asymptotic analysis of the Gibbs sampling algorithm for the DDPMM, and provides a hard clustering with convergence guarantees similar to those of the k-means algorithm. Empirical results from a synthetic test with moving Gaussian clusters and a test with real ADS-B aircraft trajectory data demonstrate that the algorithm requires orders of magnitude less computational time than contemporary probabilistic and hard clustering algorithms, while providing higher accuracy on the examined datasets. 1

6 0.72800612 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

7 0.60005432 262 nips-2013-Real-Time Inference for a Gamma Process Model of Neural Spiking

8 0.5945974 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

9 0.59205163 303 nips-2013-Sparse Overlapping Sets Lasso for Multitask Learning and its Application to fMRI Analysis

10 0.59145248 302 nips-2013-Sparse Inverse Covariance Estimation with Calibration

11 0.59131497 175 nips-2013-Linear Convergence with Condition Number Independent Access of Full Gradients

12 0.59109616 173 nips-2013-Least Informative Dimensions

13 0.59016323 49 nips-2013-Bayesian Inference and Online Experimental Design for Mapping Neural Microcircuits

14 0.58931381 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

15 0.5890379 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

16 0.58878565 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search

17 0.58840781 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity

18 0.58787811 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

19 0.58787417 236 nips-2013-Optimal Neural Population Codes for High-dimensional Stimulus Variables

20 0.58784389 350 nips-2013-Wavelets on Graphs via Deep Learning