jmlr jmlr2013 jmlr2013-20 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Fang Han, Tuo Zhao, Han Liu
Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA. Editor: Tong Zhang. Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). [sent-4, score-0.153]
2 The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. [sent-5, score-0.319]
3 To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. [sent-7, score-0.431]
4 In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. [sent-8, score-0.48]
5 These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. [sent-10, score-0.427]
6 Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics 1. [sent-11, score-0.772]
7 Introduction. High dimensional classification is of great interest to both computer scientists and statisticians. [sent-12, score-0.106]
8 Bickel and Levina (2004) show that the classical low dimensional normal-based linear discriminant analysis (LDA) is asymptotically equivalent to random guessing when the dimension d increases fast relative to the sample size n, even if the Gaussian assumption is correct. [sent-13, score-0.398]
9 To handle this problem, a sparsity condition is commonly added, resulting in many follow-up works in recent years. [sent-14, score-0.032]
10 A variety of methods in sparse linear discriminant analysis, including the nearest shrunken centroids (Tibshirani et al. [sent-15, score-0.353]
11 , 2002; Wang and Zhu, 2007) and feature annealed independence rules (Fan and Fan, 2008), are based on a working independence assumption. [sent-16, score-0.119]
12 Recently, numerous alternative approaches have been proposed by taking more complex covariance matrix structures into consideration (Fan et al. [sent-17, score-0.11]
13 It is well known that the Bayes rule classifies a new data point x to the second class if and only if log ψ1 (x) − log ψ0 (x) + log(π1 /π0 ) > 0. [sent-28, score-0.032]
14 It is well known that for any linear discriminant rule with respect to w ∈ R^d,
g_w(X) := I((X − µ_a)^T w > 0),   (2)
the corresponding misclassification error is
C(g_w) = 1 − Φ( w^T µ_d / √(w^T Σ w) ),   (3)
where Φ(·) is the cumulative distribution function of the standard Gaussian. [sent-30, score-0.54]
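As a quick, hypothetical illustration of Equation (3) (not code from the paper), the sketch below evaluates the misclassification error of a linear rule for a given direction w, shared covariance Σ, and mean-difference vector µ_d, and compares the Bayes direction Σ^{-1} µ_d with a working-independence direction.

```python
import numpy as np
from scipy.stats import norm

def lda_misclassification_error(w, mu_d, Sigma):
    """Evaluate Equation (3): C(g_w) = 1 - Phi(w' mu_d / sqrt(w' Sigma w))."""
    w = np.asarray(w, dtype=float)
    margin = w @ mu_d / np.sqrt(w @ Sigma @ w)
    return 1.0 - norm.cdf(margin)

# Toy comparison: the Bayes direction Sigma^{-1} mu_d attains the smallest error.
Sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 1.0]])
mu_d = np.array([1.0, 0.5, 0.0])
w_bayes = np.linalg.solve(Sigma, mu_d)       # Bayes direction
w_indep = mu_d / np.diag(Sigma)              # working-independence direction
print(lda_misclassification_error(w_bayes, mu_d, Sigma))
print(lda_misclassification_error(w_indep, mu_d, Sigma))
```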
15 In exploring discriminant rules with a similar form as Equation (2), both Tibshirani et al. [sent-32, score-0.319]
16 (2002) and Fan and Fan (2008) assume a working independence structure for Σ. [sent-33, score-0.058]
17 Fan et al. (2010) propose the Regularized Optimal Affine Discriminant (ROAD) approach. [sent-36, score-0.037]
18 Let Σ̂ and µ̂_d be consistent estimators of Σ and µ_d. [sent-37, score-0.056]
19 The ROAD estimator minimizes w^T Σ̂ w over directions satisfying the normalization w^T µ̂_d = 1, with an ℓ1 constraint to induce sparsity. Later, Cai and Liu (2012) propose another version of the sparse LDA, which tries to make w close to the Bayes rule's linear term Σ^{-1} µ_d in the ℓ∞ norm (detailed definitions are provided in the next section), that is,
min_w ||w||_1 subject to ||Σ̂ w − µ̂_d||_∞ ≤ λ_n.   (4) [sent-39, score-0.064]
20 Equation (4) turns out to be a linear programming problem highly related to the Dantzig selector (Candes and Tao, 2007; Yuan, 2010; Cai et al. [sent-40, score-0.027]
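Equation (4) can indeed be cast as a standard linear program by splitting w into its positive and negative parts. The sketch below is only an illustrative reformulation (not the authors' implementation); Sigma_hat, mu_d_hat and lam stand in for the estimates and the tuning parameter λ_n.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_lda_lp(Sigma_hat, mu_d_hat, lam):
    """Solve min ||w||_1 subject to ||Sigma_hat @ w - mu_d_hat||_inf <= lam (Equation (4)).

    Write w = u - v with u, v >= 0, so ||w||_1 = sum(u) + sum(v) at the optimum,
    and the sup-norm constraint becomes two blocks of linear inequalities.
    """
    d = len(mu_d_hat)
    c = np.ones(2 * d)                               # objective: sum(u) + sum(v)
    A_ub = np.block([[Sigma_hat, -Sigma_hat],        #  Sigma (u - v) - mu_d <= lam
                     [-Sigma_hat, Sigma_hat]])       # -Sigma (u - v) + mu_d <= lam
    b_ub = np.concatenate([lam + mu_d_hat, lam - mu_d_hat])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:d], res.x[d:]
    return u - v
```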
21 (2012) propose another version of the sparse linear discriminant analysis based on an equivalent least square formulation of the LDA. [sent-43, score-0.356]
22 In brief, to avoid the "curse of dimensionality", an ℓ1 penalty is added in all three methods to encourage a sparsity pattern of w, and hence nice theoretical properties can be obtained under certain regularity conditions. [sent-45, score-0.063]
23 There are three issues with regard to high dimensional linear discriminant analysis: (1) How to estimate Σ and Σ−1 accurately and efficiently (Rothman et al. [sent-47, score-0.398]
24 , 2010); (2) How to incorporate the covariance estimator into classification (Fan et al. [sent-51, score-0.151]
25 In this paper, we propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA), which addresses all the above three questions. [sent-55, score-0.153]
26 To handle non-Gaussian data, we extend the underlying conditional distributions of (X|Y = 0) and (X|Y = 1) from Gaussian to the larger nonparanormal family (Liu et al. [sent-56, score-0.374]
27 A random vector X = (X_1, ..., X_d)^T belongs to a nonparanormal family if and only if there exists a set of univariate strictly increasing functions {f_j}_{j=1}^d such that (f_1(X_1), ..., f_d(X_d))^T follows a multivariate Gaussian distribution. [sent-61, score-0.409]
28 To estimate Σ and Σ^{-1} robustly and efficiently, instead of estimating the transformation functions {f_j}_{j=1}^d as Liu et al. [sent-65, score-0.122]
29 (2009) did, we exploit the nonparametric rank-based correlation coefficient estimators, including Spearman's rho and Kendall's tau, which are invariant to the strictly increasing functions f_j. [sent-66, score-0.303]
30 They have been shown to enjoy the optimal parametric rate in estimating the correlation matrix (Liu et al. [sent-67, score-0.18]
31 To incorporate the estimated covariance matrix into high dimensional classification, we show that the ROAD (Fan et al. [sent-70, score-0.224]
32 Unlike the parametric cases, one new challenge for the CODA is that the rank-based covariance matrix estimator may not be positive semidefinite, which makes the objective function nonconvex. [sent-73, score-0.197]
33 To solve this problem, we first project the estimated covariance matrix onto the cone of positive semidefinite matrices (using the elementwise sup-norm). [sent-74, score-0.165]
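The paper's projection is taken with respect to the elementwise sup-norm; as a rough, illustrative stand-in, the sketch below uses the simpler Frobenius-norm projection onto the PSD cone obtained by clipping negative eigenvalues.

```python
import numpy as np

def project_to_psd(S, eps=1e-8):
    """Nearest (in Frobenius norm) positive semidefinite matrix to the symmetric matrix S."""
    S = (S + S.T) / 2.0                          # symmetrize against numerical noise
    eigval, eigvec = np.linalg.eigh(S)
    eigval = np.clip(eigval, eps, None)          # clip negative eigenvalues
    return (eigvec * eigval) @ eigvec.T
```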
34 Finally, to show that the expected misclassification error is consistent to the Bayes risk, we quantify the difference between the CODA classifier g_npn and the Bayes rule g*. [sent-76, score-0.12]
35 To this end, we measure the convergence rate of the estimated transformation functions {f̂_j}_{j=1}^d to the true transformation functions {f_j}_{j=1}^d. [sent-77, score-0.122]
36 Under certain regularity conditions, we show that sup |f̂_j − f_j| = O_P(n^{−γ/2}), ∀ j ∈ {1, 2, ..., d}. [sent-78, score-0.031]
37 In the next section, we briefly review the nonparanormal estimators (Liu et al. [sent-89, score-0.43]
38 We give a theoretical analysis of the CODA estimator in Section 4, with more detailed proofs collected in the appendix. [sent-92, score-0.037]
39 Let v = (v_1, ..., v_d)^T, and let M_{−j,−k} be the matrix with M's j-th row and k-th column removed, M_{j,−k} be M's j-th row with the k-th column removed, and M_{−j,k} be M's k-th column with the j-th row removed. [sent-107, score-0.099]
40 Moreover, v's subvector with entries indexed by I is denoted by v_I, M's submatrix with rows indexed by I and columns indexed by J is denoted by M_{IJ}, M's submatrix with all rows and the columns indexed by J is denoted by M_{·J}, and M's submatrix with both rows and columns indexed by J is denoted by M_J. [sent-108, score-0.788]
41 For 0 < q < ∞, we define ||v||_0 := card(support(v)), ||v||_q := (∑_{i=1}^d |v_i|^q)^{1/q}, and ||v||_∞ := max_{1≤i≤d} |v_i|. [sent-109, score-0.029]
42 We define the matrix ℓ_max norm as the elementwise maximum value, ||M||_max := max_{i,j} |M_{ij}|, and the ℓ_∞ norm as ||M||_∞ := max_i ∑_j |M_{ij}|. [sent-110, score-0.169]
43 We define the matrix operator norm as ||M||op := λmax (M). [sent-112, score-0.058]
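As a small illustrative sketch (not from the paper), these vector and matrix norms can be computed directly:

```python
import numpy as np

def vector_norms(v, q=2):
    v = np.asarray(v, dtype=float)
    l0 = np.count_nonzero(v)                     # ||v||_0 = card(support(v))
    lq = np.sum(np.abs(v) ** q) ** (1.0 / q)     # ||v||_q
    linf = np.max(np.abs(v))                     # ||v||_inf
    return l0, lq, linf

def matrix_norms(M):
    M = np.asarray(M, dtype=float)
    l_max = np.max(np.abs(M))                    # ||M||_max, elementwise maximum
    l_inf = np.max(np.sum(np.abs(M), axis=1))    # ||M||_inf, maximum absolute row sum
    l_op = np.max(np.linalg.eigvalsh((M + M.T) / 2.0))  # ||M||_op = lambda_max (symmetric M)
    return l_max, l_inf, l_op
```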
44 A random vector X = (X_1, ..., X_d)^T is said to follow a nonparanormal distribution if and only if there exists a set of univariate strictly increasing transformations f = {f_j}_{j=1}^d such that f(X) = (f_1(X_1), ..., f_d(X_d))^T =: Z is multivariate Gaussian. [sent-117, score-0.409]
45 Here µ = (µ_1, ..., µ_d)^T, Σ = [Σ_{jk}], Ω = Σ^{-1}, and Σ^0 = [Σ^0_{jk}] are the mean, covariance, concentration and correlation matrices of the Gaussian distribution Z. [sent-123, score-0.179]
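To make the definition concrete, the sketch below draws nonparanormal data by pushing a latent Gaussian vector through strictly increasing maps; the particular transformations are made-up examples, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 3
Sigma0 = np.array([[1.0, 0.5, 0.2],
                   [0.5, 1.0, 0.4],
                   [0.2, 0.4, 1.0]])

# Strictly increasing maps g_j = f_j^{-1}; any monotone choice yields a nonparanormal.
transforms = [np.exp, np.cbrt, lambda z: z + 0.3 * z ** 3]

Z = rng.multivariate_normal(np.zeros(d), Sigma0, size=500)     # latent Gaussian
X = np.column_stack([g(Z[:, j]) for j, g in enumerate(transforms)])
# X is nonparanormal: applying f_j = g_j^{-1} coordinate-wise recovers the Gaussian Z.
```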
46 (2009) prove that the nonparanormal is highly related to the Gaussian Copula (Clemen and Reilly, 1999; Klaassen and Wellner, 1997). [sent-129, score-0.374]
47 (2009) suggest a normal-score based correlation coefficient matrix to estimate Σ^0. [sent-132, score-0.132]
48 Define F̂_j(t; x_1, ..., x_n) := T_{δ_n}(F_j(t; x_1, ..., x_n))   (5) to be the winsorized empirical cumulative distribution function of X_j, where T_{δ_n} truncates its argument to the interval [δ_n, 1 − δ_n]. [sent-143, score-0.085]
49 In particular, the empirical cumulative distribution function is F_j(t; x_1, ..., x_n) := (1/n) ∑_{i=1}^n I(x_{ij} ≤ t). [sent-145, score-0.054]
50 Let Φ^{-1}(·) be the quantile function of the standard Gaussian. We define f̂_j(t) = Φ^{-1}(F̂_j(t)), and the corresponding sample correlation estimator R^{ns} = [R^{ns}_{jk}] to be
R^{ns}_{jk} := ( (1/n) ∑_{i=1}^n f̂_j(x_{ij}) f̂_k(x_{ik}) ) / √( (1/n) ∑_{i=1}^n f̂_j^2(x_{ij}) · (1/n) ∑_{i=1}^n f̂_k^2(x_{ik}) ). Liu et al. [sent-152, score-0.323]
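A direct implementation of the winsorized normal-score estimator R^{ns} might look as follows; this is only an illustrative sketch, and the winsorization level delta_n is left as a user-supplied parameter rather than the specific choice of Liu et al. (2009).

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_score_correlation(X, delta_n):
    """Winsorized normal-score correlation estimator R^ns (cf. Equation (5) and the display above).

    X is an (n, d) data matrix; delta_n in (0, 1/2) is the winsorization level.
    """
    n, _ = X.shape
    F_hat = rankdata(X, axis=0) / n                    # empirical CDF evaluated at the data
    F_win = np.clip(F_hat, delta_n, 1.0 - delta_n)     # winsorize into [delta_n, 1 - delta_n]
    f_hat = norm.ppf(F_win)                            # normal scores Phi^{-1}(F)
    second_moments = (f_hat.T @ f_hat) / n             # uncentered, as in the display above
    scale = np.sqrt(np.diag(second_moments))
    return second_moments / np.outer(scale, scale)
```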
51 (2009) suggest a specific choice of the truncation level δ_n and establish a rate of convergence for ||R^{ns} − Σ^0||_max under it. [sent-153, score-0.029]
52 (2012) propose a different approach for estimating the correlations, called the Nonparanormal SKEPTIC. [sent-156, score-0.064]
53 The Nonparanormal SKEPTIC exploits the Spearman’s rho and Kendall’s tau to directly estimate the unknown correlation matrix. [sent-157, score-0.397]
54 Specifically, let r_{ij} be the rank of x_{ij} among x_{1j}, ..., x_{nj}, and let r̄_j := (1/n) ∑_{i=1}^n r_{ij}. [sent-158, score-0.091]
55 We consider the following two statistics:
(Spearman's rho)   ρ̂_{jk} = ∑_{i=1}^n (r_{ij} − r̄_j)(r_{ik} − r̄_k) / √( ∑_{i=1}^n (r_{ij} − r̄_j)^2 · ∑_{i=1}^n (r_{ik} − r̄_k)^2 ),
(Kendall's tau)    τ̂_{jk} = 2 / (n(n − 1)) ∑_{i<i′} sign( (x_{ij} − x_{i′j})(x_{ik} − x_{i′k}) ). [sent-162, score-0.243]
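Both rank statistics are available in scipy, and the standard sine transforms sin(πτ/2) and 2 sin(πρ/6) map them to the latent Gaussian correlation under the nonparanormal; the sketch below (illustrative, not the paper's code) assembles the resulting correlation matrix estimate, which may still require a positive semidefinite projection as discussed earlier.

```python
import numpy as np
from scipy.stats import kendalltau, rankdata

def rank_correlation(X, method="kendall"):
    """Rank-based estimate of the latent correlation matrix Sigma^0.

    Uses the transforms sin(pi/2 * tau) for Kendall's tau and 2*sin(pi/6 * rho)
    for Spearman's rho; the result is symmetric but not necessarily PSD.
    """
    _, d = X.shape
    if method == "spearman":
        ranks = rankdata(X, axis=0)
        rho = np.corrcoef(ranks, rowvar=False)          # Spearman's rho = Pearson corr of ranks
        R = 2.0 * np.sin(np.pi / 6.0 * rho)
    else:
        R = np.eye(d)
        for j in range(d):
            for k in range(j + 1, d):
                tau, _ = kendalltau(X[:, j], X[:, k])
                R[j, k] = R[k, j] = np.sin(np.pi / 2.0 * tau)
    np.fill_diagonal(R, 1.0)
    return R
```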
wordName wordTfidf (topN-words)
[('coda', 0.412), ('nonparanormal', 0.374), ('discriminant', 0.319), ('copula', 0.236), ('liu', 0.182), ('rho', 0.175), ('rns', 0.175), ('fj', 0.15), ('tau', 0.15), ('fan', 0.135), ('gw', 0.135), ('mai', 0.131), ('kendall', 0.124), ('spearman', 0.124), ('xik', 0.112), ('han', 0.11), ('jk', 0.107), ('xd', 0.088), ('gnpn', 0.088), ('iscriminant', 0.088), ('jeon', 0.088), ('opula', 0.088), ('shao', 0.088), ('tuo', 0.088), ('wt', 0.084), ('cai', 0.081), ('indexed', 0.08), ('covariance', 0.079), ('dimensional', 0.079), ('road', 0.078), ('correlation', 0.072), ('vd', 0.068), ('hao', 0.067), ('rd', 0.063), ('ri', 0.062), ('baltimore', 0.062), ('transformation', 0.061), ('tibshirani', 0.061), ('fd', 0.058), ('johns', 0.058), ('misclassi', 0.057), ('submatrix', 0.057), ('bayes', 0.057), ('estimators', 0.056), ('elementwise', 0.055), ('op', 0.055), ('fang', 0.055), ('cumulative', 0.054), ('hopkins', 0.052), ('nalysis', 0.052), ('parametric', 0.05), ('princeton', 0.048), ('iu', 0.047), ('lda', 0.047), ('md', 0.044), ('classi', 0.039), ('zhao', 0.038), ('levina', 0.037), ('npn', 0.037), ('reilly', 0.037), ('rik', 0.037), ('rothman', 0.037), ('wellner', 0.037), ('xid', 0.037), ('estimator', 0.037), ('named', 0.037), ('propose', 0.037), ('incorporate', 0.035), ('univariate', 0.035), ('rows', 0.034), ('card', 0.034), ('centroids', 0.034), ('robustly', 0.034), ('witten', 0.034), ('sparsity', 0.032), ('semide', 0.032), ('rule', 0.032), ('xn', 0.031), ('gaussian', 0.031), ('regularity', 0.031), ('annealed', 0.031), ('mj', 0.031), ('scheinberg', 0.031), ('subvector', 0.031), ('matrix', 0.031), ('vi', 0.03), ('independence', 0.03), ('mi', 0.03), ('max', 0.029), ('estimations', 0.029), ('xi', 0.029), ('suggest', 0.029), ('var', 0.029), ('working', 0.028), ('removed', 0.028), ('columns', 0.028), ('estimating', 0.027), ('scientists', 0.027), ('selector', 0.027), ('norm', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
Author: Fang Han, Tuo Zhao, Han Liu
Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics
2 0.19606315 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
Author: Naftali Harris, Mathias Drton
Abstract: The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. Analyzing the error propagation from marginal to partial correlations, we prove that high-dimensional consistency carries over to a broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graph sequences with bounded degree, our consistency result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal data, all the while incurring a negligible increase of computation time. While our interest is in the PC algorithm, the presented analysis of error propagation could be applied to other algorithms that test the vanishing of low-order partial correlations. Keywords: Gaussian copula, graphical model, model selection, multivariate normal distribution, nonparanormal distribution
3 0.079401754 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
4 0.063187093 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
Author: Tony Cai, Jianqing Fan, Tiefeng Jiang
Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere
5 0.061193813 32 jmlr-2013-Differential Privacy for Functions and Functional Data
Author: Rob Hall, Alessandro Rinaldo, Larry Wasserman
Abstract: Differential privacy is a rigorous cryptographically-motivated characterization of data privacy which may be applied when releasing summaries of a database. Previous work has focused mainly on methods for which the output is a finite dimensional vector, or an element of some discrete set. We develop methods for releasing functions while preserving differential privacy. Specifically, we show that adding an appropriate Gaussian process to the function of interest yields differential privacy. When the functions lie in the reproducing kernel Hilbert space (RKHS) generated by the covariance kernel of the Gaussian process, then the correct noise level is established by measuring the “sensitivity” of the function in the RKHS norm. As examples we consider kernel density estimation, kernel support vector machines, and functions in RKHSs. Keywords: differential privacy, density estimation, Gaussian processes, reproducing kernel Hilbert space
6 0.046304107 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
7 0.045848977 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
8 0.041307107 22 jmlr-2013-Classifying With Confidence From Incomplete Information
9 0.040986307 90 jmlr-2013-Quasi-Newton Method: A New Direction
10 0.037571859 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
11 0.035933349 73 jmlr-2013-Multicategory Large-Margin Unified Machines
12 0.033934847 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
13 0.033579431 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
14 0.03329904 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
15 0.032035973 95 jmlr-2013-Ranking Forests
16 0.029853133 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
17 0.02831633 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
18 0.028173886 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
19 0.028063724 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
20 0.028010409 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
topicId topicWeight
[(0, -0.166), (1, 0.02), (2, 0.046), (3, 0.043), (4, 0.133), (5, -0.017), (6, 0.029), (7, 0.118), (8, 0.121), (9, 0.079), (10, -0.067), (11, 0.002), (12, -0.007), (13, -0.191), (14, -0.161), (15, -0.048), (16, 0.276), (17, -0.124), (18, -0.132), (19, -0.173), (20, 0.038), (21, 0.063), (22, 0.158), (23, -0.024), (24, 0.142), (25, -0.047), (26, -0.206), (27, -0.108), (28, -0.169), (29, 0.106), (30, 0.106), (31, -0.105), (32, -0.084), (33, -0.094), (34, -0.055), (35, 0.19), (36, -0.114), (37, 0.011), (38, 0.109), (39, -0.133), (40, -0.103), (41, 0.036), (42, 0.046), (43, -0.148), (44, -0.036), (45, -0.213), (46, 0.026), (47, 0.015), (48, 0.003), (49, -0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.94492054 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
Author: Fang Han, Tuo Zhao, Han Liu
Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics
2 0.60594171 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
Author: Naftali Harris, Mathias Drton
Abstract: The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. Analyzing the error propagation from marginal to partial correlations, we prove that high-dimensional consistency carries over to a broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graph sequences with bounded degree, our consistency result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal data, all the while incurring a negligible increase of computation time. While our interest is in the PC algorithm, the presented analysis of error propagation could be applied to other algorithms that test the vanishing of low-order partial correlations. Keywords: Gaussian copula, graphical model, model selection, multivariate normal distribution, nonparanormal distribution
3 0.33546937 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
Author: Tony Cai, Jianqing Fan, Tiefeng Jiang
Abstract: This paper studies the asymptotic behaviors of the pairwise angles among n randomly and uniformly distributed unit vectors in R p as the number of points n → ∞, while the dimension p is either fixed or growing with n. For both settings, we derive the limiting empirical distribution of the random angles and the limiting distributions of the extreme angles. The results reveal interesting differences in the two settings and provide a precise characterization of the folklore that “all high-dimensional random vectors are almost always nearly orthogonal to each other”. Applications to statistics and machine learning and connections with some open problems in physics and mathematics are also discussed. Keywords: random angle, uniform distribution on sphere, empirical law, maximum of random variables, minimum of random variables, extreme-value distribution, packing on sphere
4 0.27585796 73 jmlr-2013-Multicategory Large-Margin Unified Machines
Author: Chong Zhang, Yufeng Liu
Abstract: Hard and soft classifiers are two important groups of techniques for classification problems. Logistic regression and Support Vector Machines are typical examples of soft and hard classifiers respectively. The essential difference between these two groups is whether one needs to estimate the class conditional probability for the classification task or not. In particular, soft classifiers predict the label based on the obtained class conditional probabilities, while hard classifiers bypass the estimation of probabilities and focus on the decision boundary. In practice, for the goal of accurate classification, it is unclear which one to use in a given situation. To tackle this problem, the Largemargin Unified Machine (LUM) was recently proposed as a unified family to embrace both groups. The LUM family enables one to study the behavior change from soft to hard binary classifiers. For multicategory cases, however, the concept of soft and hard classification becomes less clear. In that case, class probability estimation becomes more involved as it requires estimation of a probability vector. In this paper, we propose a new Multicategory LUM (MLUM) framework to investigate the behavior of soft versus hard classification under multicategory settings. Our theoretical and numerical results help to shed some light on the nature of multicategory classification and its transition behavior from soft to hard classifiers. The numerical results suggest that the proposed tuned MLUM yields very competitive performance. Keywords: hard classification, large-margin, soft classification, support vector machine
5 0.26795682 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
Author: Tingni Sun, Cun-Hui Zhang
Abstract: We propose a new method of learning a sparse nonnegative-definite target matrix. Our primary example of the target matrix is the inverse of a population covariance or correlation matrix. The algorithm first estimates each column of the target matrix by the scaled Lasso and then adjusts the matrix estimator to be symmetric. The penalty level of the scaled Lasso for each column is completely determined by data via convex minimization, without using cross-validation. We prove that this scaled Lasso method guarantees the fastest proven rate of convergence in the spectrum norm under conditions of weaker form than those in the existing analyses of other ℓ1 regularized algorithms, and has faster guaranteed rate of convergence when the ratio of the ℓ1 and spectrum norms of the target inverse matrix diverges to infinity. A simulation study demonstrates the computational feasibility and superb performance of the proposed method. Our analysis also provides new performance bounds for the Lasso and scaled Lasso to guarantee higher concentration of the error at a smaller threshold level than previous analyses, and to allow the use of the union bound in column-by-column applications of the scaled Lasso without an adjustment of the penalty level. In addition, the least squares estimation after the scaled Lasso selection is considered and proven to guarantee performance bounds similar to that of the scaled Lasso. Keywords: precision matrix, concentration matrix, inverse matrix, graphical model, scaled Lasso, linear regression, spectrum norm
6 0.25176013 22 jmlr-2013-Classifying With Confidence From Incomplete Information
7 0.23030518 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
8 0.21461174 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
9 0.20877922 90 jmlr-2013-Quasi-Newton Method: A New Direction
10 0.18082906 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference
11 0.17997003 32 jmlr-2013-Differential Privacy for Functions and Functional Data
12 0.17097168 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java
13 0.15600753 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
14 0.15062556 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models
15 0.14242151 51 jmlr-2013-Greedy Sparsity-Constrained Optimization
16 0.14046714 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
17 0.14015126 60 jmlr-2013-Learning Bilinear Model for Matching Queries and Documents
18 0.13887799 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
19 0.13792148 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components
20 0.13203348 34 jmlr-2013-Distance Preserving Embeddings for General n-Dimensional Manifolds
topicId topicWeight
[(0, 0.018), (5, 0.18), (6, 0.023), (10, 0.046), (20, 0.019), (23, 0.025), (30, 0.412), (68, 0.017), (70, 0.026), (75, 0.061), (85, 0.037), (87, 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 0.69599473 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
Author: Fang Han, Tuo Zhao, Han Liu
Abstract: We propose a high dimensional classification method, named the Copula Discriminant Analysis (CODA). The CODA generalizes the normal-based linear discriminant analysis to the larger Gaussian Copula models (or the nonparanormal) as proposed by Liu et al. (2009). To simultaneously achieve estimation efficiency and robustness, the nonparametric rank-based methods including the Spearman’s rho and Kendall’s tau are exploited in estimating the covariance matrix. In high dimensional settings, we prove that the sparsity pattern of the discriminant features can be consistently recovered with the parametric rate, and the expected misclassification error is consistent to the Bayes risk. Our theory is backed up by careful numerical experiments, which show that the extra flexibility gained by the CODA method incurs little efficiency loss even when the data are truly Gaussian. These results suggest that the CODA method can be an alternative choice besides the normal-based high dimensional linear discriminant analysis. Keywords: high dimensional statistics, sparse nonlinear discriminant analysis, Gaussian copula, nonparanormal distribution, rank-based statistics
2 0.43323275 15 jmlr-2013-Bayesian Canonical Correlation Analysis
Author: Arto Klami, Seppo Virtanen, Samuel Kaski
Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation
3 0.4331775 114 jmlr-2013-The Rate of Convergence of AdaBoost
Author: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire
Abstract: The AdaBoost algorithm was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss.” Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that the exponential loss of AdaBoost’s computed parameter vector will be at most ε more than that of any parameter vector of ℓ1 -norm bounded by B in a number of rounds that is at most a polynomial in B and 1/ε. We also provide lower bounds showing that a polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors, that is, at least Ω(1/ε) rounds are necessary to achieve within ε of the optimal exponential loss. Keywords: AdaBoost, optimization, coordinate descent, convergence rate
4 0.43296573 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright
Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves √ mean-squared error (MSE) that decays as O (N −1 + (N/m)−2 ). Whenever m ≤ N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O (N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O (N −1 + (N/m)−3/2 ), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 108 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling
5 0.43103731 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Author: Tony Cai, Wen-Xin Zhou
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed. Keywords: 1-bit matrix completion, low-rank matrix, max-norm, trace-norm, constrained optimization, maximum likelihood estimate, optimal rate of convergence
6 0.42753601 73 jmlr-2013-Multicategory Large-Margin Unified Machines
7 0.42619675 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
8 0.42611304 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
9 0.42401114 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
10 0.42322218 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
11 0.42204034 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
12 0.42147228 87 jmlr-2013-Performance Bounds for λ Policy Iteration and Application to the Game of Tetris
13 0.42033654 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
14 0.41997072 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
15 0.4199267 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
17 0.41889557 8 jmlr-2013-A Theory of Multiclass Boosting
18 0.41768619 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
19 0.41678655 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
20 0.41658583 68 jmlr-2013-Machine Learning with Operational Costs