nips nips2012 nips2012-310 knowledge-graph by maker-knowledge-mining

310 nips-2012-Semiparametric Principal Component Analysis


Source: pdf

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We propose two new principal component analysis methods in this paper utilizing a semiparametric model. [sent-3, score-0.084]

2 The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. [sent-5, score-0.025]

3 The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. [sent-6, score-0.087]

4 The robust nonparametric rank-based correlation coefficient estimator, Spearman’s rho, is exploited in estimation. [sent-7, score-0.026]

5 We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012). [sent-10, score-0.03]

6 Given a random vector $X \in \mathbb{R}^d$ with covariance matrix Σ and n independent observations of X, PCA reduces the dimension of the data by projecting the data onto a linear subspace spanned by the k leading eigenvectors of Σ, such that the principal modes of variation are preserved. [sent-12, score-0.103]

7 In practice, Σ is unknown and replaced by the sample covariance S. [sent-13, score-0.014]

8 With eigenvalues $\omega_1 \geq \cdots \geq \omega_d$ and the corresponding orthonormal eigenvectors $u_1, \ldots, u_d$. [sent-17, score-0.018]

9 PCA aims at recovering the first k eigenvectors $u_1, \ldots, u_k$. [sent-21, score-0.018]

10 [5] show that if X is multivariate Gaussian, then the distribution is centered about the principal component axes and is therefore “self-consistent” [8]. [sent-27, score-0.07]

11 Given $\hat{u}_1$, the dominant eigenvector of S, [9] show that the angle between $\hat{u}_1$ and $u_1$ will not converge to 0, i.e., [sent-30, score-0.03]

12 $\liminf_{n \to \infty} \mathbb{E}\,\angle(\hat{u}_1, u_1) > 0$, where we denote by $\angle(\hat{u}_1, u_1)$ the angle between the estimated and the true leading eigenvectors. [sent-32, score-0.014]

13 The resulting estimator $\hat{u}_1$ is: $\hat{u}_1 = \arg\max_{v} v^T S v$, subject to $\|v\|_2 = 1$, $\mathrm{card}(\mathrm{supp}(v)) \leq s$. [sent-35, score-0.014]

14 To solve this optimization problem, a variety of algorithms have been proposed: greedy algorithms [3], lasso-type methods including SCoTLASS [11], SPCA [25] and sPCA-rSVD [19], a number of power methods [12, 23, 16], the biconvex algorithm PMD [21] and the semidefinite relaxation DSPCA [4]. [sent-38, score-0.015]

15 In this paper, we first explore the use of PCA conducted on the correlation matrix $\Sigma^0$ instead of the covariance matrix Σ, and then propose a high dimensional semiparametric scale-invariant principal component analysis method, named Copula Component Analysis (COCA). [sent-41, score-0.157]

16 In this paper, the population version of the scale-invariant PCA is built as the estimator of the leading eigenvector of the population correlation matrix Σ0 . [sent-42, score-0.08]

17 $X = (X_1, \ldots, X_d)^T$ belongs to the Nonparanormal family if and only if there exists a set of univariate monotone functions $\{f_j^0\}_{j=1}^d$ such that $(f_1^0(X_1), \ldots, f_d^0(X_d))^T$ is multivariate Gaussian. [sent-47, score-0.019]

18 Thirdly, to estimate $\Sigma^0$ robustly and efficiently, instead of estimating the normal score transformation functions $\{f_j^0\}_{j=1}^d$ as [15] did, realizing that $\{f_j^0\}_{j=1}^d$ preserve the ranks of the data, we utilize the nonparametric correlation coefficient estimator, Spearman's rho, to estimate $\Sigma^0$. [sent-52, score-0.05]

19 In theory, we analyze the general case in which X follows the Nonparanormal and $\theta_1$ is weakly sparse, where $\theta_1$ is the leading eigenvector of $\Sigma^0$. [sent-54, score-0.032]

20 We obtain the estimation consistency of the COCA estimator for $\theta_1$ using the Spearman's rho correlation coefficient matrix. [sent-55, score-0.089]

21 We prove that the estimation consistency rates are close to the parametric rate under the Gaussian assumption, and that feature selection consistency can be achieved when d is nearly exponentially large in the sample size. [sent-56, score-0.032]

22 The Copula PCA estimates the leading eigenvector of the latent covariance matrix Σ. [sent-58, score-0.062]

23 To estimate the leading eigenvectors of Σ, instead of $\Sigma^0$, at a fast rate, we prove that extra conditions are required on the transformation functions. [sent-59, score-0.047]

24 The Models of PCA and Scale-invariant PCA: let $\Sigma^0$ be the correlation matrix of Σ, and by spectral decomposition, $\Sigma = \sum_{j=1}^d \omega_j u_j u_j^T$ and $\Sigma^0 = \sum_{j=1}^d \lambda_j \theta_j \theta_j^T$. [sent-77, score-0.034]

25 $\{\hat{u}_1, \ldots, \hat{u}_d\}$ and $\{\hat{\theta}_1, \ldots, \hat{\theta}_d\}$, the eigenvectors of the sample covariance and correlation matrices S and $S^0$, are the MLEs of $\{u_1, \ldots, u_d\}$ and $\{\theta_1, \ldots, \theta_d\}$. [sent-97, score-0.058]

26 Let $x_1, \ldots, x_n \sim N(\mu, \Sigma)$ and $\Sigma^0$ be the correlation matrix of Σ. [sent-108, score-0.04]

27 The estimators of PCA, $\{\hat{u}_1, \ldots, \hat{u}_d\}$, and the estimators of the scale-invariant PCA, $\{\hat{\theta}_1, \ldots, \hat{\theta}_d\}$. [sent-112, score-0.02]

28 $X = (X_1, \ldots, X_d)^T$ is said to follow a Nonparanormal distribution $NPN_d(\mu, \Sigma, f)$ if and only if there exists a set of univariate monotone transformations $f = \{f_j\}_{j=1}^d$ such that $f(X) = (f_1(X_1), \ldots, f_d(X_d))^T \sim N(\mu, \Sigma)$. [sent-153, score-0.019]

29 Let $f^0 = \{f_j^0\}_{j=1}^d$ be a set of monotone univariate functions and $\Sigma^0 \in \mathbb{R}^{d \times d}$ be a positive definite correlation matrix with $\mathrm{diag}(\Sigma^0) = 1$. [sent-162, score-0.053]

30 There exist $\mu = (\mu_1, \ldots, \mu_d)^T$ and $\Sigma = [\Sigma_{jk}] \in \mathbb{R}^{d \times d}$ such that for any $1 \leq j, k \leq d$, $\mathbb{E}(X_j) = \mu_j$, $\mathrm{Var}(X_j) = \Sigma_{jj}$ and $\Sigma^0_{jk} = \Sigma_{jk}/\sqrt{\Sigma_{jj}\Sigma_{kk}}$, and a set of monotone univariate functions $f = \{f_j\}_{j=1}^d$ such that $X \sim NPN_d(\mu, \Sigma, f)$. [sent-178, score-0.03]

31 Using the connection that $f_j(x) = \mu_j + \sigma_j f_j^0(x)$, for $j \in \{1, 2, \ldots, d\}$. [sent-180, score-0.042]

32 The latter formulation is more appealing because it emphasizes the correlation and hence matches the spirit of the copula. [sent-187, score-0.026]

33 Spearman's rho Correlation and Covariance Matrices: given n data points $x_1, \ldots, x_n$. [sent-191, score-0.042]

34 Because the Nonparanormal distribution preserves the rank of the data, it is natural to use the nonparametric rank-based correlation coefficient estimator, Spearman's rho, to estimate the latent correlation. [sent-198, score-0.034]

35 Let $r_{ij}$ denote the rank of $x_{ij}$ among $x_{1j}, \ldots, x_{nj}$ and $\bar{r}_j := \frac{1}{n}\sum_{i=1}^n r_{ij} = \frac{n+1}{2}$; we consider the following statistic: $\hat{\rho}_{jk} = \frac{\sum_{i=1}^n (r_{ij} - \bar{r}_j)(r_{ik} - \bar{r}_k)}{\sqrt{\sum_{i=1}^n (r_{ij} - \bar{r}_j)^2 \cdot \sum_{i=1}^n (r_{ik} - \bar{r}_k)^2}}$, and the correlation matrix estimator: $\hat{R}_{jk} = 2\sin\left(\frac{\pi}{6}\hat{\rho}_{jk}\right)$. [sent-202, score-0.035]

36 We denote by $\hat{R} := [\hat{R}_{jk}]$ the Spearman's rho correlation coefficient matrix. [sent-214, score-0.068]

37 In the following, let $\hat{S} := [\hat{S}_{jk}] = [\hat{\sigma}_j \hat{\sigma}_k \hat{R}_{jk}]$ be the Spearman's rho covariance matrix. [sent-215, score-0.056]
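The plug-in construction above is simple enough to sketch directly. Below is a minimal Python illustration (not the authors' released code) of the rank-based estimators: Spearman's rho for each column pair, the $2\sin(\pi\hat{\rho}/6)$ correction that yields the latent correlation estimator $\hat{R}$, and the rescaled covariance version $\hat{S}$. Using sample standard deviations as the marginal scale estimates $\hat{\sigma}_j$ is an assumption here, not a detail stated in this summary.

```python
import numpy as np
from scipy.stats import rankdata

def spearman_matrices(X):
    """X: (n, d) data matrix; returns the estimators (R, S)."""
    # Column-wise ranks; ties receive average ranks.
    ranks = np.apply_along_axis(rankdata, 0, X)
    # Pearson correlation of the ranks is Spearman's rho.
    rho = np.corrcoef(ranks, rowvar=False)
    # The sine transform makes the rank statistic consistent for the
    # latent Gaussian correlation rather than for population rho itself.
    R = 2.0 * np.sin(np.pi * rho / 6.0)
    np.fill_diagonal(R, 1.0)
    # Plug in marginal scale estimates (assumed here to be sample
    # standard deviations): S_jk = sigma_j * sigma_k * R_jk.
    sigma = X.std(axis=0, ddof=1)
    S = np.outer(sigma, sigma) * R
    return R, S
```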

38 COCA Model: we first present the model of the Copula Component Analysis (COCA) method, where the idea of scale-invariant PCA is exploited and we wish to estimate the leading eigenvector of the latent correlation matrix. [sent-227, score-0.066]

39 The transformation functions have the following forms: (A) $f_1^0(x) = x^3$ and $f_2^0(x) = x^{1/3}$; (B) $f_1^0(x) = \mathrm{sign}(x)\,x^2$ and $f_2^0(x) = x^3$; (C) $f_1^0(x) = f_2^0(x) = \Phi^{-1}(x)$. [sent-262, score-0.015]

40 where $\theta_1$ is the leading eigenvector of the latent correlation matrix $\Sigma^0$ we are interested in estimating, $0 \leq q \leq 1$, and the $\ell_q$ ball $B_q(R_q)$ is defined as: when $q = 0$, $B_0(R_0) := \{v \in \mathbb{R}^d : \mathrm{card}(\mathrm{supp}(v)) \leq R_0\}$; when $0 < q \leq 1$, $B_q(R_q) := \{v \in \mathbb{R}^d : \|v\|_q^q \leq R_q\}$. [sent-263, score-0.074]

41 Inspired by the model $M_0(q, R_q, \Sigma^0, f^0)$, we consider the following COCA estimator $\hat{\theta}_1$, which maximizes the objective under the constraint $\hat{\theta}_1 \in B_q(R_q)$ for some $0 \leq q \leq 1$: $\hat{\theta}_1 = \arg\max_{v \in \mathbb{R}^d} v^T \hat{R} v$, subject to $v \in \mathbb{S}^{d-1} \cap B_q(R_q)$. [sent-266, score-0.021]

42 Here $\hat{R}$ is the estimated Spearman's rho correlation coefficient matrix. [sent-268, score-0.068]

43 The corresponding COCA estimator $\hat{\theta}_1$ can be considered a nonlinear dimension reduction procedure and has the potential to gain more flexibility compared with classical PCA. [sent-269, score-0.024]

44 In Section 4 we will establish the theoretical results on the COCA estimator and will show that it can estimate the latent true dominant eigenvector $\theta_1$ at a fast rate and can achieve feature selection consistency. [sent-270, score-0.059]

45 Copula PCA Model: in contrast, we provide another model, inspired by the classical PCA method, where we wish to estimate the leading eigenvector of the latent covariance matrix. [sent-273, score-0.054]

46 $u_1 \in \mathbb{S}^{d-1} \cap B_q(R_q)$, where $u_1$ is the leading eigenvector of the covariance matrix Σ and is what we are interested in estimating. [sent-280, score-0.054]

47 The Copula PCA estimator is $\hat{u}_1 = \arg\max_{v \in \mathbb{R}^d} v^T \hat{S} v$, subject to $v \in \mathbb{S}^{d-1} \cap B_q(R_q)$, where $\hat{S}$ is the Spearman's rho covariance matrix. [sent-282, score-0.056]

48 Algorithms: in this section we describe three sparse PCA algorithms, into which the Spearman's rho correlation and covariance matrices $\hat{R}$ and $\hat{S}$ can be directly plugged to obtain sparse estimators. [sent-286, score-0.1]

49 [21] suggest using the first leading eigenvector of Γ as the initial value of v. [sent-294, score-0.018]

50 The main idea of the SPCA algorithm is to exploit a regression approach to PCA and then utilize the lasso and elastic net [24] to calculate a sparse estimator of the leading eigenvector. [sent-299, score-0.046]

51 [25] suggest using the first leading eigenvector of Γ as the initial value of v. [sent-307, score-0.032]

52 The main idea is to utilize the power method, but truncate the vector to an $\ell_0$ ball in each iteration. [sent-312, score-0.017]

53 In detail, we utilize the classical power method, but in each iteration t we project the intermediate vector $x_t$ onto the intersection of the sphere $\mathbb{S}^{d-1}$ and the $\ell_q$ ball with radius $R_q^{1/q}$. [sent-320, score-0.017]
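As a concrete illustration of this truncation step, here is a minimal Python sketch of the power iteration with hard-sparsity projection (the q = 0 case, where the projection keeps the s largest entries in magnitude and renormalizes). The function name, initialization, and iteration count are illustrative assumptions, not the paper's exact algorithm or tuning; plugging the Spearman's rho matrix $\hat{R}$ (or $\hat{S}$) in place of the sample covariance yields the COCA (or Copula PCA) estimator described above.

```python
import numpy as np

def truncated_power_method(M, s, n_iter=200, seed=0):
    """M: symmetric plug-in matrix (e.g., the Spearman's rho correlation R);
    s: sparsity level; returns a sparse unit-norm leading eigenvector."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        v = M @ v                           # power step
        keep = np.argsort(np.abs(v))[-s:]   # indices of the s largest entries
        truncated = np.zeros_like(v)
        truncated[keep] = v[keep]           # project onto the l0 ball
        v = truncated / np.linalg.norm(truncated)
    return v
```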

54 In particular, we establish results on the convergence rates, in the max norm, of the Spearman's rho correlation and covariance matrices to $\Sigma^0$ and Σ. [sent-326, score-0.087]

55 Suppose $x_1, \ldots, x_n \sim NPN_d(\mu, \Sigma, f)$, that $0 < 1/c_0 < \min_j\{\sigma_j^2\} < \max_j\{\sigma_j^2\} < c_0 < \infty$ for some constant $c_0$, and that $g := \{g_j = f_j^{-1}\}_{j=1}^d$ satisfies the stated transformation conditions for all $j = 1, \ldots, d$. [sent-344, score-0.021]

56 This theorem claims that, under certain constraints on the transformation functions, the latent covariance matrix Σ can be recovered using the Spearman's rho covariance matrix. [sent-352, score-0.108]

57 For any two vectors $v_1, v_2 \in \mathbb{S}^{d-1}$, let $|\sin\angle(v_1, v_2)| = \sqrt{1 - (v_1^T v_2)^2}$; then we have, for any $n \geq 21\log d + 2$, $P\left(\sin^2\angle(\hat{\theta}_1, \theta_1) \leq \gamma_q^2 R_q \left(\frac{64\pi^2}{(\lambda_1 - \lambda_2)^2} \cdot \frac{\log d}{n}\right)^{\frac{2-q}{2}}\right) \geq 1 - 1/d^2$. [sent-361, score-0.016]

58 Generally, when $R_q$ and $\lambda_1, \lambda_2$ do not scale with $(n, d)$, the rate is $O_P\left((\frac{\log d}{n})^{1-q/2}\right)$, which is the parametric rate that [16, 20, 18] obtain. [sent-366, score-0.02]

59 When $(n, d)$ goes to infinity, the two dominant eigenvalues $\lambda_1$ and $\lambda_2$ will typically go to infinity, and will at least be bounded away from zero. [sent-367, score-0.018]

60 Similarly, we can give an upper bound on the estimation rate of Copula PCA for the true leading eigenvector $u_1$ of the latent covariance matrix Σ. [sent-380, score-0.051]

61 If $g := \{g_j = f_j^{-1}\}_{j=1}^d$ satisfies $g_j \in TF(K)$ for all $1 \leq j \leq d$, $0 < 1/c_0 < \min_j\{\sigma_j^2\} < \max_j\{\sigma_j^2\} < c_0 < \infty$, and we further have $\min_{j \in \Theta}|u_{1j}| \geq \frac{4\sqrt{2R_0}\,c_1}{\omega_1 - \omega_2}\sqrt{\frac{\log d}{n}}$, then for any $n \geq 21\log d + 2$, $P(\hat{\Theta} = \Theta) \geq 1 - \frac{1}{d^2}$. [sent-393, score-0.042]

62 A covariance matrix Σ is first synthesized through eigenvalue decomposition, where the first two eigenvalues are given and the corresponding eigenvectors are pre-specified to be sparse. [sent-404, score-0.046]

63 In detail, we suppose that the first two dominant eigenvectors of Σ, $u_1$ and $u_2$, are sparse in the sense that only the first s = 10 entries of $u_1$ and the second s = 10 entries of $u_2$ are nonzero, each set to $1/\sqrt{10}$. [sent-405, score-0.039]

64 The correlation matrix Σ0 is accordingly generated from Σ, with λ1 = 4, λ2 = 2. [sent-411, score-0.041]

65 $\lambda_3, \ldots, \lambda_d \leq 1$, with the two dominant eigenvectors sparse. [sent-415, score-0.03]

66 To sample data from the Nonparanormal, we also need the transformation functions: $f^0 = \{f_j^0\}_{j=1}^d$. [sent-416, score-0.015]

67 Here two types of transformation functions are considered: (1) linear transformation (or no transformation): $f^0_{\mathrm{linear}} = \{h_0, h_0, \ldots, h_0\}$, [sent-417, score-0.041]

where $h_0(x) := x$; (2) nonlinear transformation: there exist five univariate monotone functions $h_1, h_2, \ldots, h_5$. [sent-420, score-0.019]

69 We then generate n = 100, 200 or 500 data points from: [Scheme 1] $X \sim NPN_d(\Sigma^0, f^0_{\mathrm{linear}})$ where $f^0_{\mathrm{linear}} = \{h_0, h_0, \ldots, h_0\}$.

70 [Scheme 2] $X \sim NPN_d(\Sigma^0, f^0_{\mathrm{nonlinear}})$ where $f^0_{\mathrm{nonlinear}} = \{h_1, h_2, h_3, h_4, h_5, \ldots\}$.
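A minimal sketch of how data from these schemes can be generated: draw latent Gaussian vectors with correlation $\Sigma^0$ and push each coordinate through a monotone map. The specific maps below merely echo the power-type transformations listed earlier; the paper's exact choices of $h_1, \ldots, h_5$ are not recoverable from this summary and are assumptions here.

```python
import numpy as np

def sample_nonparanormal(n, Sigma0, transforms, seed=0):
    """Draw n samples whose latent Gaussian copula has correlation Sigma0;
    transforms[j] is a monotone map applied to the j-th latent coordinate."""
    rng = np.random.default_rng(seed)
    Z = rng.multivariate_normal(np.zeros(Sigma0.shape[0]), Sigma0, size=n)
    return np.column_stack([g(Z[:, j]) for j, g in enumerate(transforms)])

d = 100
Sigma0 = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))  # toy correlation matrix
identity = lambda z: z        # Scheme 1: no transformation
cube = lambda z: z ** 3       # a monotone map in the spirit of Scheme 2
X_linear = sample_nonparanormal(200, Sigma0, [identity] * d)
X_nonlinear = sample_nonparanormal(200, Sigma0, [cube] * d)
```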

71 The PMD, SPCA and TPower algorithms are then employed on X to compute the estimated leading eigenvector $\hat{\theta}_1$. [sent-447, score-0.032]

72 Figure 2: ROC curves for the PMD, SPCA and Truncated Power method (the left two, the middle two, the right two) with linear (no) and nonlinear transformation (top, bottom) and data contamination at different levels (r = 0, 0.…). [sent-604, score-0.025]

73 The raw data contain 20,248 probes and 13,182 samples belonging to 2,711 tissue types (e.g., …). [sent-610, score-0.035]

74 There are at most 1,599 samples and at least 1 sample belonging to each tissue type. [sent-614, score-0.029]

75 We utilize the Truncated Power method proposed by [23] to obtain the sparse estimated dominant eigenvectors. [sent-619, score-0.03]

76 We then explore several tissue types with the largest sample size: (1) Breast tumor, 1,599 samples; (2) B cell lymphoma, 213 samples; (3) Prostate tumor, 148 samples; (4) Wilms tumor, 143 samples. [sent-624, score-0.027]

77 B cell lymphoma, breast tumor, prostate tumor and Wilms tumor are explored (from left to right). [sent-627, score-0.102]

78 Each black point represents a sample and each red point represents a sample belonging to the corresponding tissue type. [sent-628, score-0.029]

79 For each tissue type listed above, we apply the COCA (Spearman) and the classic high dimensional PCA (Pearson) on the data belonging to this specific tissue type and obtain the first two dominant sparse eigenvectors. [sent-629, score-0.087]

80 For COCA, we do a normal score transformation on the original dataset. [sent-631, score-0.015]

81 We subsequently project the whole dataset to the first two principal components using the obtained eigenvectors. [sent-632, score-0.049]

82 In Figure 3 each black point represents a sample and each red point represents a sample belonging to the corresponding tissue type. [sent-634, score-0.029]

83 The first phenomenon indicates that the COCA has the potential to preserve more common information shared by samples from the same tissue type. [sent-636, score-0.022]

84 The second phenomenon indicates that the COCA has the potential to differentiate samples from different tissue types more efficiently. [sent-637, score-0.022]

85 Though both papers are working on principal component analysis, the core ideas are quite different: Firstly, the analysis in [7] is based on a different distribution family called transelliptical, while COCA and Copula PCA are based on the Nonparanormal family. [sent-639, score-0.07]

86 Secondly, while [7] improves the modeling flexibility, it does not admit a scale-variant version since it is hard to quantify the transformation functions. [sent-640, score-0.015]

87 In contrast, by introducing the subgaussian transformation function family, the current paper provides sufficient conditions for Copula PCA to achieve parametric rates. [sent-641, score-0.029]

88 Thirdly, the method in [7] cannot explicitly conduct data visualization, due to the fact that the latent elliptical distribution is unspecified and accordingly they cannot accurately estimate the marginal transformations. [sent-642, score-0.022]

89 Moreover, via quantifying a sharp convergence rate in estimating the marginal transformations, we can provide the convergence rates in estimating the principal components. [sent-644, score-0.068]

90 Finally, we recommend using Spearman's rho instead of Kendall's tau in estimating the correlation coefficients, provided that the Nonparanormal model holds. [sent-646, score-0.076]

91 This is because Spearman's rho is statistically more efficient than Kendall's tau within the Nonparanormal family. [sent-647, score-0.042]

92 High-dimensional analysis of semidefinite relaxations for sparse principal components. [sent-654, score-0.058]

93 TCA: Transelliptical principal component analysis for high dimensional non-Gaussian data. [sent-692, score-0.03]

94 On consistency and sparsity for principal components analysis in high dimensions. [sent-704, score-0.056]

95 A modified principal component technique based on the lasso. [sent-717, score-0.07]

96 Generalized power method for sparse principal component analysis. [sent-724, score-0.087]

97 Augmented sparse principal component analysis for high dimensional data. [sent-759, score-0.089]

98 Sparse principal component analysis via regularized low rank matrix approximation. [sent-766, score-0.078]

99 Minimax rates of estimation for sparse PCA in high dimensions. [sent-772, score-0.08]

100 A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. [sent-780, score-0.092]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('qq', 0.965), ('coca', 0.117), ('spearman', 0.085), ('copula', 0.069), ('qqq', 0.067), ('pca', 0.066), ('nonparanormal', 0.059), ('spca', 0.051), ('principal', 0.049), ('pmd', 0.047), ('rq', 0.047), ('pearson', 0.042), ('rho', 0.042), ('tpr', 0.039), ('tumor', 0.036), ('fpr', 0.036), ('correlation', 0.026), ('supp', 0.023), ('tpower', 0.023), ('tissue', 0.022), ('component', 0.021), ('fj', 0.021), ('oracle', 0.018), ('eigenvector', 0.018), ('bq', 0.018), ('eigenvectors', 0.018), ('sd', 0.017), ('prostate', 0.016), ('transformation', 0.015), ('lymphoma', 0.015), ('wilms', 0.015), ('estimator', 0.014), ('ud', 0.014), ('semiparametric', 0.014), ('covariance', 0.014), ('leading', 0.014), ('gj', 0.013), ('dominant', 0.012), ('jk', 0.011), ('han', 0.011), ('monotone', 0.011), ('xd', 0.011), ('flinear', 0.011), ('fnonlinear', 0.011), ('rd', 0.011), ('dimensional', 0.01), ('contamination', 0.01), ('breast', 0.009), ('utilize', 0.009), ('truncated', 0.009), ('rij', 0.009), ('transelliptical', 0.009), ('rstly', 0.009), ('sparse', 0.009), ('card', 0.009), ('subgaussian', 0.008), ('rjk', 0.008), ('power', 0.008), ('univariate', 0.008), ('dt', 0.008), ('latent', 0.008), ('arxiv', 0.008), ('matrix', 0.008), ('minj', 0.008), ('tau', 0.008), ('sin', 0.008), ('ut', 0.008), ('belonging', 0.007), ('biconvex', 0.007), ('mles', 0.007), ('qtpm', 0.007), ('consistency', 0.007), ('named', 0.007), ('accordingly', 0.007), ('jj', 0.007), ('equation', 0.007), ('coef', 0.007), ('mij', 0.007), ('claims', 0.007), ('fd', 0.007), ('kendall', 0.007), ('marginal', 0.007), ('rate', 0.007), ('tf', 0.006), ('sjk', 0.006), ('eigenvalues', 0.006), ('parametric', 0.006), ('submatrix', 0.006), ('rik', 0.006), ('probes', 0.006), ('preprint', 0.006), ('xn', 0.006), ('genes', 0.006), ('estimators', 0.006), ('biostatistics', 0.006), ('unspeci', 0.006), ('gaussian', 0.005), ('listed', 0.005), ('rates', 0.005), ('cell', 0.005)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).

2 0.89288235 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

3 0.58297366 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions.

4 0.39949682 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy.

5 0.075194478 351 nips-2012-Transelliptical Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose a high dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions like the multivariate Gaussian, t and logistic, and it is extended to the meta-elliptical by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that TCA can obtain a near-optimal $\sqrt{s \log d/n}$ estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.

6 0.055739984 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

7 0.05569075 211 nips-2012-Meta-Gaussian Information Bottleneck

8 0.05217829 352 nips-2012-Transelliptical Graphical Models

9 0.033575911 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

10 0.026957877 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

11 0.026220024 254 nips-2012-On the Sample Complexity of Robust PCA

12 0.020567171 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

13 0.019063398 237 nips-2012-Near-optimal Differentially Private Principal Components

14 0.017813275 247 nips-2012-Nonparametric Reduced Rank Regression

15 0.015961634 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning

16 0.015212669 325 nips-2012-Stochastic optimization and sparse statistical recovery: Optimal algorithms for high dimensions

17 0.014458696 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

18 0.014398525 289 nips-2012-Recognizing Activities by Attribute Dynamics

19 0.0143839 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

20 0.014090434 199 nips-2012-Link Prediction in Graphs with Autoregressive Features


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.051), (1, 0.034), (2, 0.031), (3, -0.024), (4, -0.008), (5, 0.004), (6, 0.913), (7, -0.065), (8, -0.048), (9, 0.024), (10, 0.07), (11, -0.041), (12, -0.042), (13, 0.036), (14, 0.022), (15, 0.031), (16, 0.016), (17, -0.009), (18, 0.014), (19, -0.011), (20, 0.019), (21, -0.015), (22, 0.011), (23, -0.028), (24, -0.004), (25, -0.004), (26, -0.008), (27, -0.005), (28, 0.014), (29, 0.015), (30, 0.003), (31, -0.025), (32, -0.016), (33, 0.022), (34, -0.002), (35, 0.011), (36, 0.001), (37, 0.012), (38, -0.007), (39, -0.002), (40, 0.018), (41, -0.003), (42, -0.005), (43, 0.006), (44, -0.017), (45, -0.006), (46, 0.003), (47, -0.005), (48, 0.016), (49, -0.004)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99634516 308 nips-2012-Semi-Supervised Domain Adaptation with Non-Parametric Copulas

Author: David Lopez-paz, Jose M. Hernández-lobato, Bernhard Schölkopf

Abstract: A new framework based on the theory of copulas is proposed to address semi-supervised domain adaptation problems. The presented method factorizes any multivariate density into a product of marginal distributions and bivariate copula functions. Therefore, changes in each of these factors can be detected and corrected to adapt a density model across different learning domains. Importantly, we introduce a novel vine copula model, which allows for this factorization in a non-parametric manner. Experimental results on regression problems with real-world data illustrate the efficacy of the proposed approach when compared to state-of-the-art techniques.

same-paper 2 0.99191564 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).

3 0.79227382 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning

Author: Liping Liu, Thomas G. Dietterich

Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic Stick-Breaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSB-CMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions.

4 0.7871244 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

Author: Amadou Ba, Mathieu Sinn, Yannig Goude, Pascal Pompey

Abstract: This paper proposes an efficient online learning algorithm to track the smoothing functions of Additive Models. The key idea is to combine the linear representation of Additive Models with a Recursive Least Squares (RLS) filter. In order to quickly track changes in the model and put more weight on recent data, the RLS filter uses a forgetting factor which exponentially weights down observations by the order of their arrival. The tracking behaviour is further enhanced by using an adaptive forgetting factor which is updated based on the gradient of the a priori errors. Using results from Lyapunov stability theory, upper bounds for the learning rate are analyzed. The proposed algorithm is applied to 5 years of electricity load data provided by the French utility company Electricité de France (EDF). Compared to state-of-the-art methods, it achieves a superior performance in terms of model tracking and prediction accuracy.

5 0.12264585 211 nips-2012-Meta-Gaussian Information Bottleneck

Author: Melanie Rey, Volker Roth

Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possible applications of IB to continuous data and provides a solution more robust to outliers.

6 0.11629491 351 nips-2012-Transelliptical Component Analysis

7 0.1123152 130 nips-2012-Feature-aware Label Space Dimension Reduction for Multi-label Classification

8 0.094729386 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

9 0.093679689 352 nips-2012-Transelliptical Graphical Models

10 0.07530944 262 nips-2012-Optimal Neural Tuning Curves for Arbitrary Stimulus Distributions: Discrimax, Infomax and Minimum $L p$ Loss

11 0.061535705 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

12 0.060683832 280 nips-2012-Proper losses for learning from partial labels

13 0.055212285 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent

14 0.055035941 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

15 0.054963231 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression

16 0.053098783 192 nips-2012-Learning the Dependency Structure of Latent Factors

17 0.053010952 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

18 0.052938677 256 nips-2012-On the connections between saliency and tracking

19 0.052439217 228 nips-2012-Multilabel Classification using Bayesian Compressed Sensing

20 0.050665677 169 nips-2012-Label Ranking with Partial Abstention based on Thresholded Probabilistic Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.024), (11, 0.014), (21, 0.014), (24, 0.348), (38, 0.068), (39, 0.103), (42, 0.037), (54, 0.013), (55, 0.015), (64, 0.014), (74, 0.018), (76, 0.086), (80, 0.033), (92, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.73927277 310 nips-2012-Semiparametric Principal Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose two new principal component analysis methods in this paper utilizing a semiparametric model. The corresponding methods are named Copula Component Analysis (COCA) and Copula PCA. The semiparametric model assumes that, after unspecified marginally monotone transformations, the distributions are multivariate Gaussian. The COCA and Copula PCA accordingly estimate the leading eigenvectors of the correlation and covariance matrices of the latent Gaussian distribution. The robust nonparametric rank-based correlation coefficient estimator, Spearman's rho, is exploited in estimation. We prove that, under suitable conditions, although the marginal distributions can be arbitrarily continuous, the COCA and Copula PCA estimators obtain fast estimation rates and are feature selection consistent in the setting where the dimension is nearly exponentially large relative to the sample size. Careful numerical experiments on synthetic and real data are conducted to back up the theoretical results. We also discuss the relationship with the transelliptical component analysis proposed by Han and Liu (2012).

2 0.52235854 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

Author: Tuo Zhao, Kathryn Roeder, Han Liu

Abstract: We introduce a new learning algorithm, named smooth-projected neighborhood pursuit, for estimating high dimensional undirected graphs. In particular, we focus on the nonparanormal graphical model and provide theoretical guarantees for graph estimation consistency. In addition to new computational and theoretical analysis, we also provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Numerical results on both synthetic and real datasets are provided to support our theory.

3 0.41935351 352 nips-2012-Transelliptical Graphical Models

Author: Han Liu, Fang Han, Cun-hui Zhang

Abstract: We advocate the use of a new distribution family—the transelliptical—for robust inference of high dimensional graphical models. The transelliptical family is an extension of the nonparanormal family proposed by Liu et al. (2009). Just as the nonparanormal extends the normal by transforming the variables using univariate functions, the transelliptical extends the elliptical family in the same way. We propose a nonparametric rank-based regularization estimator which achieves the parametric rates of convergence for both graph recovery and parameter estimation. Such a result suggests that the extra robustness and flexibility obtained by the semiparametric transelliptical modeling incurs almost no efficiency loss. We also discuss the relationship of this work with the transelliptical component analysis proposed by Han and Liu (2012).

4 0.41063625 351 nips-2012-Transelliptical Component Analysis

Author: Fang Han, Han Liu

Abstract: We propose a high dimensional semiparametric scale-invariant principal component analysis, named TCA, by utilizing the natural connection between the elliptical distribution family and principal component analysis. The elliptical distribution family includes many well-known multivariate distributions like the multivariate Gaussian, t and logistic, and it is extended to the meta-elliptical by Fang et al. (2002) using copula techniques. In this paper we extend the meta-elliptical distribution family to an even larger family, called transelliptical. We prove that TCA can obtain a near-optimal $\sqrt{s \log d/n}$ estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and large-scale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost.

5 0.40917829 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models

Author: Chong Wang, David M. Blei

Abstract: We present a truncation-free stochastic variational inference algorithm for Bayesian nonparametric models. While traditional variational inference algorithms require truncations for the model or the variational distribution, our method adapts model complexity on the fly. We studied our method with Dirichlet process mixture models and hierarchical Dirichlet process topic models on two large data sets. Our method performs better than previous stochastic variational inference algorithms.

6 0.39825782 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

7 0.39607534 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

8 0.39526877 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

9 0.38758326 249 nips-2012-Nyström Method vs Random Fourier Features: A Theoretical and Empirical Comparison

10 0.37692007 323 nips-2012-Statistical Consistency of Ranking Methods in A Rank-Differentiable Probability Space

11 0.33767667 163 nips-2012-Isotropic Hashing

12 0.33607733 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

13 0.3330259 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

14 0.33228084 363 nips-2012-Wavelet based multi-scale shape features on arbitrary surfaces for cortical thickness discrimination

15 0.33141884 221 nips-2012-Multi-Stage Multi-Task Feature Learning

16 0.33113101 75 nips-2012-Collaborative Ranking With 17 Parameters

17 0.33057749 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

18 0.32976553 74 nips-2012-Collaborative Gaussian Processes for Preference Learning

19 0.32941273 147 nips-2012-Graphical Models via Generalized Linear Models

20 0.32787478 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)