nips nips2012 nips2012-211 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Melanie Rey, Volker Roth
Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possible applications of IB to continuous data and provides a solution more robust to outliers. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. [sent-5, score-0.804]
2 Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. [sent-6, score-0.889]
3 This opens new possible applications of IB to continuous data and provides a solution more robust to outliers. [sent-7, score-0.077]
4 1 Introduction The information bottleneck method (IB) [1] considers the concept of relevant information in the data compression problem, and takes a new perspective to signal compression which was classically treated using rate distortion theory. [sent-8, score-0.438]
5 The IB method formalizes the idea of relevance, or meaningful information, by introducing a relevance variable Y . [sent-9, score-0.101]
6 The problem is then to obtain an optimal compression T of the data X which preserves a maximum of information about Y . [sent-10, score-0.198]
7 Although the IB method beautifully formalizes the compression problem under relevance constraints, the practical solution of this problem remains difficult, particularly in high dimensions, since the mutual informations I(X; T ), I(Y ; T ) must be estimated. [sent-11, score-0.393]
8 The IB optimization problem has no available analytical solution in the general case. [sent-12, score-0.097]
9 In the continuous case, estimation of multivariate densities becomes arduous and can be a major impediment to the practical application of IB. [sent-15, score-0.137]
10 A notable exception is the case of joint Gaussian (X, Y ) for which an analytical solution for the optimal representation T exists [3]. [sent-16, score-0.146]
11 The optimal T is jointly Gaussian with (X, Y ) [4] and takes the form of a noisy linear projection to eigenvectors of the normalised conditional covariance matrix. [sent-17, score-0.165]
12 The existence of an analytical solution opens new application possibilities and IB becomes practically feasible in higher dimensions [5]. [sent-18, score-0.143]
13 The practical usefulness of the Gaussian IB (GIB), on the other hand, suffers from its lack of flexibility and from the statistical problem of finding a robust estimate of the joint covariance matrix of (X, Y) in high-dimensional spaces. [sent-20, score-0.094]
14 Compression and relevance in IB are defined in terms of mutual information (MI) of two random vectors V and W , which is defined as the reduction in the entropy of V by the conditional entropy of V given W . [sent-21, score-0.193]
15 MI bears an interesting relationship to copulas: mutual information equals negative copula entropy [6]. [sent-22, score-0.71]
16 In this work we reformulate the IB problem for continuous variables in terms of copulas and show that IB is completely independent of the marginal distributions of X, Y. [sent-24, score-0.202]
17 The IB problem in the continuous case is in fact to find the optimal copula (or dependence structure) of T and X, knowing the copula of X and the relevance variable Y . [sent-25, score-1.412]
18 We focus on the case of Gaussian copula and on the consequences of the IB reformulation for the Gaussian IB. [sent-26, score-0.658]
19 We show that the analytical solution available for GIB can naturally be extended to multivariate distributions with Gaussian copula and arbitrary marginal densities, also called meta-Gaussian densities. [sent-27, score-0.829]
20 Moreover, we show that the GIB solution depends only on a correlation matrix, and not on the variances. [sent-28, score-0.094]
21 This allows us to use robust rank correlation estimators instead of unstable covariance estimators, and gives a robust version of GIB. [sent-29, score-0.168]
22 No analytical solution is available for the general problem defined by (1) and this joint distribution must be calculated with an iterative procedure. [sent-38, score-0.127]
23 However, when X and Y are jointly multivariate Gaussian distributed, this problem becomes analytically tractable. [sent-43, score-0.097]
24 Consider two jointly Gaussian random vectors (rv) X and Y with zero mean: (X, Y) ∼ N(0_{p+q}, Σ) with Σ = [Σ_x, Σ_xy; Σ_xy^T, Σ_y], (2) where p is the dimension of X, q is the dimension of Y and 0_{p+q} is the zero vector of dimension p + q. [sent-47, score-0.109]
25 In [4] it is proved that the optimal compression T is also jointly Gaussian with X and Y . [sent-48, score-0.234]
26 The minimization in (4) is over A and Σ_ξ. For a given trade-off parameter β, the optimal compression is given by T ∼ N(0_p, Σ_t) with Σ_t = AΣ_x A^T + Σ_ξ, and the noise covariance can be fixed to the identity matrix Σ_ξ = I_p, as shown in [3]. [sent-51, score-0.226]
27 We can see from equation (5) that the optimal projection of X is a combination of weighted eigenvectors of Σ_{x|y} Σ_x^{-1}. [sent-73, score-0.113]
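As a rough illustration of how such a weighted-eigenvector projection is typically computed, consider the following Python sketch. Equation (5) is not reproduced in the extract above, so the exact weights and critical-beta thresholds below follow the published Gaussian IB solution (reference [3]) from memory; treat the formula and the helper name gib_projection as assumptions of this sketch, not as the paper's notation.

```python
import numpy as np

def gib_projection(cond_cov, Sx, beta):
    """Sketch of the GIB projection: rows of A are scaled left eigenvectors of
    Sigma_{x|y} Sigma_x^{-1}; eigenvectors whose critical beta exceeds the given
    beta get weight zero (weight formula assumed from the GIB solution [3])."""
    M = cond_cov @ np.linalg.inv(Sx)
    eigvals, eigvecs = np.linalg.eig(M.T)   # left eigenvectors of M
    rows = []
    for lam, v in sorted(zip(eigvals.real, eigvecs.real.T), key=lambda p: p[0]):
        if 0.0 < lam < 1.0 and beta > 1.0 / (1.0 - lam):   # eigenvector switched on
            r = v @ Sx @ v
            alpha = np.sqrt((beta * (1.0 - lam) - 1.0) / (lam * r))
            rows.append(alpha * v)
        else:
            rows.append(np.zeros_like(v))
    return np.vstack(rows)
```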
28 A multivariate distribution consists of univariate random variables related to each other by a dependence mechanism. [sent-77, score-0.15]
29 Copulas provide a framework to separate the dependence structure from the marginal distributions. [sent-78, score-0.094]
30 Formally, a d-dimensional copula is a multivariate distribution function C : [0, 1]^d → [0, 1] with standard uniform margins. [sent-79, score-0.681]
31 Sklar’s theorem [7] states the relationship between copulas and multivariate distributions. [sent-80, score-0.183]
32 Any joint distribution function F can be represented using its marginal univariate distribution functions and a copula: F(z_1, ..., z_d) = C(F_1(z_1), ..., F_d(z_d)). (6) [sent-81, score-0.108]
33 If the margins are continuous, then this copula is unique. [sent-88, score-0.148]
34 Conversely, if C is a copula and F_1, ..., F_d are univariate distribution functions, then F defined as in (6) is a valid multivariate distribution function with margins F_1, ..., F_d. [sent-92, score-0.253]
35 Assuming that C has d-th order partial derivatives we can define the copula density function: c(u_1, ..., u_d) = ∂^d C(u_1, ..., u_d) / (∂u_1 ... ∂u_d). [sent-96, score-0.677]
36 The density corresponding to (6) can then be rewritten as a product of the marginal densities and the copula density function: f(z_1, ..., z_d) = c(F_1(z_1), ..., F_d(z_d)) ∏_{i=1}^d f_i(z_i). [sent-108, score-0.743]
37 As the copula is invariant under strictly increasing transformations of the margins (see, e.g., [8]), the copula of N_d(µ, Σ) is the same as the copula of N_d(0, P), where P is the correlation matrix corresponding to the covariance matrix Σ. [sent-118, score-1.406]
38 Thus a Gaussian copula is uniquely determined by a correlation matrix P and we denote a Gaussian copula by CP . [sent-119, score-1.342]
39 Using equation (6) with CP , we can construct multivariate distributions with arbitrary margins and a Gaussian dependence structure. [sent-120, score-0.291]
40 Gaussian copulas conveniently have a copula density function: c_P(u) = |P|^{-1/2} exp( -(1/2) Φ^{-1}(u)^T (P^{-1} - I) Φ^{-1}(u) ), (7) where Φ^{-1}(u) is a short notation for the univariate Gaussian quantile function applied to each component: Φ^{-1}(u) = (Φ^{-1}(u_1), ..., Φ^{-1}(u_d)). [sent-122, score-0.843]
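A small numerical sketch of equation (7), combined with the density factorization from (6), may make the meta-Gaussian construction concrete. The exponential margins and the function names are purely illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.stats import norm, expon

def gaussian_copula_density(u, P):
    """Gaussian copula density c_P(u) of equation (7); u is a vector in (0,1)^d."""
    z = norm.ppf(u)                                  # componentwise Gaussian quantiles
    A = np.linalg.inv(P) - np.eye(len(u))
    return np.linalg.det(P) ** -0.5 * np.exp(-0.5 * z @ A @ z)

def meta_gaussian_density(z, P, margins):
    """Density of a meta-Gaussian vector: product of marginal densities times c_P
    evaluated at the marginal CDFs (factorization following equation (6))."""
    u = np.array([m.cdf(zi) for m, zi in zip(margins, z)])
    marg = np.prod([m.pdf(zi) for m, zi in zip(margins, z)])
    return gaussian_copula_density(u, P) * marg

# Illustrative use: bivariate meta-Gaussian with exponential margins (hypothetical example)
P = np.array([[1.0, 0.6], [0.6, 1.0]])
print(meta_gaussian_density(np.array([0.5, 1.2]), P, [expon(), expon()]))
```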
41 At the heart of the copula formulation of IB is the following identity: for a continuous random vector Z = (Z_1, ..., Z_d) with density f(z) and copula density c_Z(u), the multivariate mutual information, or multi-information, is the negative differential entropy of the copula density: [sent-128, score-0.649]
42 I(Z) ≡ D_kl(f(z) || f_0(z)) = ∫_{[0,1]^d} c_Z(u) log c_Z(u) du = −H(c_Z), (8) where u = (u_1, ..., u_d) and f_0(z) is the product of the marginal densities of Z. [sent-131, score-1.522]
43 For continuous multivariate X, Y and T, equation (8) implies that: I(X; T) = D_kl(f(x, t) || f_0(x, t)) − D_kl(f(x) || f_0(x)) − D_kl(f(t) || f_0(t)) = −H(c_XT) + H(c_X) + H(c_T), and I(Y; T) = −H(c_YT) + H(c_Y) + H(c_T), (9) where c_XT is the copula density of the joint vector (X, T), c_X that of X, and c_T that of T. [sent-138, score-0.787]
44 The minimization problem defined in (1) is solved under the assumption that the joint distribution of (X, Y) is known; this now translates into the assumption that the copula density c_XY (and thus c_X) is known. [sent-149, score-1.327]
45 The density cT is entirely determined by cXT , and using the conditional independence structure it is clear that cY T is also determined by cXT when cXY is known. [sent-150, score-0.072]
46 We can finally rewrite the copula density of (Y, T) as: c_YT(u_y, u_t) = ∫ c_XYT(u_x, u_y, u_t) du_x = ∫ c_XT(u_x, u_t) c_XY(u_x, u_y) / c_X(u_x) du_x. (13) [sent-152, score-1.075]
47 The IB optimization problem thus reduces to finding an optimal copula density c_XT. [sent-153, score-0.696]
48 This implies that in order to construct the compression variable T , the only relevant aspect is the copula dependence structure between X, T and Y . [sent-154, score-0.859]
49 The only known case for which a simple analytical solution to the IB problem exists is when (X, Y ) are joint Gaussians. [sent-158, score-0.127]
50 Equation (9) shows that an optimal solution does not depend on the margins but only on the copula density c_XY. [sent-159, score-0.864]
51 From this observation the idea naturally follows that an analytical solution should also exist for any joint distribution of (X, Y) which has a Gaussian copula, regardless of its margins. [sent-160, score-0.127]
52 Define the normal scores componentwise as X̃_i = Φ^{-1}(F_{X_i}(X_i)) and Ỹ_j = Φ^{-1}(F_{Y_j}(Y_j)). (14) Since copulas are invariant to strictly increasing transformations, the normal scores have the same copulas as the original variables X and Y. [sent-167, score-0.331]
53 Optimality of meta-Gaussian IB: consider rv X, Y with a Gaussian dependence structure and arbitrary margins: F_{X,Y}(x, y) = C_P(F_{X_1}(x_1), ..., F_{X_p}(x_p), F_{Y_1}(y_1), ..., F_{Y_q}(y_q)), (15) [sent-170, score-0.093]
54 where F_{X_i}, F_{Y_i} are the marginal distributions of X, Y and C_P is a Gaussian copula parametrized by a correlation matrix P. [sent-176, score-0.773]
55 Then the optimum of the minimization problem (1) is obtained for T ∈ T , where T is the set of all rv T such that (X, Y, T ) has a Gaussian copula and T has Gaussian margins. [sent-177, score-0.653]
56 If T ∈ T then (X, Y, T) has a Gaussian copula, which implies that (X̃, Ỹ, T) also has a Gaussian copula. [sent-185, score-0.62]
57 Since X̃, Ỹ, T all have normally distributed margins it follows that (X̃, Ỹ, T) has a joint Gaussian distribution. [sent-186, score-0.194]
58 If (X̃, Ỹ, T) is jointly Gaussian then (X̃, Ỹ, T) has a Gaussian copula, which implies that (X, Y, T) again has a Gaussian copula. [sent-188, score-0.656]
59 Assume there exists T* ∉ T such that: L(X, Y, T*) := I(X; T*) − βI(Y; T*) < min_{p(t|x), T ∈ T} I(X; T) − βI(T; Y). (16) Since (X̃, Ỹ, T̃) has the same copula as (X, Y, T), we have that I(X̃; T̃) = I(X; T) and I(Ỹ; T̃) = I(Y; T). [sent-194, score-0.62]
60 This is in contradiction with the optimality of Gaussian information bottleneck, which states that the optimal T is jointly Gaussian with (X, Y ). [sent-198, score-0.072]
61 Thus the optimum for meta-Gaussian (X, Y ) is attained for T with normal margins such that (X, Y, T ) also is meta-Gaussian. [sent-199, score-0.188]
62 The optimal projection T^o obtained for (X̃, Ỹ) is also optimal for (X, Y). [sent-202, score-0.075]
63 By the above we know that an optimal compression for (X̃, Ỹ) can be obtained in the set of variables T̃ such that (X̃, Ỹ, T̃) is jointly Gaussian; since L̃ = L it is clear that T^o is also optimal for (X, Y). [sent-204, score-0.253]
64 As a consequence, for any random vector (X, Y) having a Gaussian copula dependence structure, an optimal projection T can be obtained by first calculating the vector of the normal scores (X̃, Ỹ) and then computing T = AX̃ + ξ. [sent-206, score-0.808]
65 A is here entirely determined by the covariance matrix of the vector (X̃, Ỹ), which equals its correlation matrix (the normal scores have unit variance by definition), and thus equals the correlation matrix P parametrizing the Gaussian copula C_P. [sent-207, score-1.043]
66 In practice the problem is reduced to the estimation of the Gaussian copula of (X, Y). [sent-208, score-0.635]
67 The multi-information of a meta-Gaussian random vector Z = (Z_1, ..., Z_d) with copula C_{P_z} is: I(Z) = I(Z̃) = −(1/2) log|cov(Z̃)| = −(1/2) log|Σ_z̃| = −(1/2) log|corr(Z̃)| = −(1/2) log|P_z|, (18) where |·| denotes the determinant. [sent-215, score-0.688]
68 The mutual information between X and Y is then I(X; Y) = −(1/2) log|P| + (1/2) log|P_x| + (1/2) log|P_y|, where P = [P_x, P_xy; P_yx, P_y]. [sent-218, score-0.117]
69 It is obvious that the formula for the meta-Gaussian case is similar to the formula for the Gaussian case, I_Gauss(X; Y) = −(1/2) log|Σ| + (1/2) log|Σ_x| + (1/2) log|Σ_y|, but uses the correlation matrix parametrizing the copula instead of the data covariance matrix. [sent-219, score-0.877]
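Formula (18) makes the mutual information of a meta-Gaussian pair a simple function of the copula correlation matrix. A minimal sketch follows; the function name and the convention that the first p rows/columns of P correspond to X are assumptions of this sketch.

```python
import numpy as np

def meta_gaussian_mi(P, p):
    """I(X;Y) = -1/2 log|P| + 1/2 log|P_x| + 1/2 log|P_y| for a (p+q)-dimensional
    copula correlation matrix P; the sign returned by slogdet is ignored since P
    is positive definite."""
    _, logdet_P = np.linalg.slogdet(P)
    _, logdet_Px = np.linalg.slogdet(P[:p, :p])
    _, logdet_Py = np.linalg.slogdet(P[p:, p:])
    return -0.5 * logdet_P + 0.5 * logdet_Px + 0.5 * logdet_Py
```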
70 Semi-parametric copula estimation has been studied in [10], [11] and [12]. [sent-223, score-0.635]
71 The main idea is to combine non-parametric estimation of the margins with a parametric copula model, in our case the Gaussian copula family. [sent-224, score-0.905]
72 If the margins F_1, ..., F_d of a random vector Z are known, P can be estimated by the matrix P̂ with elements given by: P̂_{(k,l)} = [(1/n) Σ_{i=1}^n Φ^{-1}(F_k(z_ik)) Φ^{-1}(F_l(z_il))] / {[(1/n) Σ_{i=1}^n (Φ^{-1}(F_k(z_ik)))^2] [(1/n) Σ_{i=1}^n (Φ^{-1}(F_l(z_il)))^2]}^{1/2}, (19) where z_ik denotes the i-th observation of dimension k. [sent-228, score-0.126]
73 If the margins are unknown we can instead use the rescaled empirical cumulative distribution functions: F̂_j(t) = (n/(n+1)) (1/n) Σ_{i=1}^n 1{z_ij ≤ t}. [sent-230, score-0.165]
74 The normal scores rank correlation coefficient is the matrix P̂^n with elements: P̂^n_{(k,l)} = [Σ_{i=1}^n Φ^{-1}(R(z_ik)/(n+1)) Φ^{-1}(R(z_il)/(n+1))] / [Σ_{i=1}^n Φ^{-1}(i/(n+1))^2], (21) where R(z_ik) denotes the rank of the i-th observation for dimension k. [sent-234, score-0.285]
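Estimator (21) is straightforward to implement. The sketch below assumes a data matrix with one observation per row and uses average ranks in case of ties; the function name is ours, not the paper's.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores_rank_correlation(Z):
    """Normal scores rank correlation matrix P^n of equation (21).
    Z is an (n, d) array with one observation per row."""
    n, _ = Z.shape
    ranks = np.apply_along_axis(rankdata, 0, Z)          # R(z_ik), column by column
    scores = norm.ppf(ranks / (n + 1.0))                 # Phi^{-1}(R(z_ik)/(n+1))
    denom = np.sum(norm.ppf(np.arange(1, n + 1) / (n + 1.0)) ** 2)
    return scores.T @ scores / denom
```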
75 Using (21) we compute an estimate of the correlation matrix P parametrizing cXY and obtain the transformation matrix A as detailed in Algorithm 1. [sent-236, score-0.221]
76 Compute the normal scores rank correlation estimate P̂^n of the correlation matrix P parametrizing c_XY, for k, l = 1, ..., p + q. [sent-238, score-0.371]
77 Compute the estimated conditional covariance matrix of the normal scores: Σ̂_{x̃|ỹ} = P̂^n_x − P̂^n_xy (P̂^n_y)^{-1} P̂^n_yx. [sent-243, score-0.104]
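Putting the pieces together, a hedged sketch of this algorithm might look as follows. It reuses normal_scores_rank_correlation and gib_projection from the earlier sketches; the exact weighting inside gib_projection is assumed from the GIB solution [3], and the Gaussian noise ξ with identity covariance follows the GIB result quoted above.

```python
import numpy as np
from scipy.stats import norm, rankdata

def meta_gaussian_ib(X, Y, beta):
    """Sketch of meta-Gaussian IB: estimate the copula correlation by rank
    correlation (eq. 21), form the conditional covariance of the normal scores,
    compute the GIB projection A, and apply it to the normal scores of X."""
    n, p = X.shape
    P = normal_scores_rank_correlation(np.hstack([X, Y]))     # eq. (21)
    Px, Py, Pxy = P[:p, :p], P[p:, p:], P[:p, p:]
    cond = Px - Pxy @ np.linalg.inv(Py) @ Pxy.T               # Sigma_{x~|y~}
    A = gib_projection(cond, Px, beta)                        # weighted eigenvectors
    X_tilde = norm.ppf(np.apply_along_axis(rankdata, 0, X) / (n + 1.0))  # normal scores of X
    xi = np.random.default_rng(0).standard_normal((n, p))     # xi ~ N(0, I)
    T = X_tilde @ A.T + xi                                    # T = A x~ + xi
    return T, A
```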
78 Results: Simulations. We tested meta-Gaussian IB (MGIB) in two different settings: first when the data is Gaussian but contains outliers, and second when the data has a Gaussian copula but non-Gaussian margins. [sent-249, score-0.62]
79 A covariance matrix was drawn from a Wishart distribution centered at a correlation matrix populated with a few high correlation values to ensure some dependency between X and Y . [sent-251, score-0.24]
80 This matrix was then scaled to obtain the correlation matrix parametrizing the copula. [sent-252, score-0.198]
81 For each training sample two projection matrices A_G and A_C were computed: A_G was calculated based on the sample covariance Σ̂_n and A_C was obtained using the normal scores rank correlation P̂^n. [sent-257, score-0.274]
82 The compression quality of the projection was then tested on a test sample of n = 10 000 observations generated independently from the same distribution (without outliers). [sent-258, score-0.216]
83 The mutual informations I(X; T) and I(Y; T) can be reliably estimated on the test sample using (18) and (21). [sent-262, score-0.093]
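On held-out data these two information terms can be estimated by combining (21) and (18), as in this short sketch (reusing the helper functions from the sketches above; the names and test-set variables are ours):

```python
import numpy as np

def estimate_mi(U, V):
    """Plug-in estimate of I(U;V): rank-correlation estimate of the copula
    correlation matrix (eq. 21) followed by the log-determinant formula (18)."""
    P_hat = normal_scores_rank_correlation(np.hstack([U, V]))
    return meta_gaussian_mi(P_hat, U.shape[1])

# e.g. I_xt = estimate_mi(X_test, T_test); I_yt = estimate_mi(Y_test, T_test)
```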
84 The best information curves are situated in the upper left corner of the figure, since for a fixed compression value I(X; T) we want to achieve the highest relevant information content I(Y; T). [sent-264, score-0.218]
85 We clearly see in Figure 2 that MGIB consistently outperforms GIB in that it achieves higher compression rates. [sent-265, score-0.179]
86 Since GIB suffers from a model mismatch problem when the margins are not Gaussian, the curves saturate for smaller values of I(Y ; T ). [sent-270, score-0.213]
87 In a pre-processing step we selected the dx = 10 dimensions with the strongest absolute rank correlation to one of the relevance variables. [sent-275, score-0.211]
88 To still give a graphical representation of our results we show in Figure 3 non-parametric density estimates of the one dimensional compression T split in 5 groups according to corresponding values of the first relevance variable. [sent-277, score-0.315]
89 It is obvious from Figure 3 that the one-dimensional MGIB compression nicely separates the different target classes, whereas the GIB and PCA projections seem to contain much less information about the target variable. [sent-280, score-0.179]
90 We conclude that similar to our synthetic examples above, the MGIB compression contains more information about the relevance variable than GIB at the same compression rate. [sent-281, score-0.437]
91 Figure 3: Parzen density estimates of the univariate projection of X split in 5 groups according to values of the first relevance variable (panel axis labels: first component of compression T, first component of compression T, first PCA projection). [sent-288, score-0.612]
92 We see more separation between groups for MGIB than for GIB or PCA, which indicates that the projection is more informative about the relevance variable. [sent-289, score-0.116]
93 6 Conclusion We present a reformulation of the IB problem in terms of copula which gives new insights into data compression with relevance constraints and opens new possible applications of IB for continuous multivariate data. [sent-290, score-1.034]
94 Meta-Gaussian IB naturally extends the analytical solution of Gaussian IB to multivariate distributions with Gaussian copula and arbitrary marginal density. [sent-291, score-0.829]
95 It can be applied to any type of continuous data, provided the assumption of a Gaussian dependence structure is reasonable, in which case the optimal compression can easily be obtained by semi-parametric copula estimation. [sent-292, score-0.907]
96 Simulated experiments showed that MGIB clearly outperforms GIB when the marginal densities are not Gaussian, and even in the Gaussian case with a tiny amount of outliers MGIB has been shown to significantly benefit from the robustness properties of rank estimators. [sent-293, score-0.176]
97 In future work, it would be interesting to see if the copula formulation of IB admits analytical solutions for other copula families. [sent-294, score-1.317]
98 On the optimality of the Gaussian information bottleneck curve. [sent-326, score-0.097]
99 A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. [sent-362, score-0.149]
100 Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs. [sent-382, score-0.09]
wordName wordTfidf (topN-words)
[('copula', 0.62), ('ib', 0.422), ('gib', 0.268), ('mgib', 0.251), ('compression', 0.179), ('ux', 0.154), ('margins', 0.148), ('copulas', 0.122), ('cxt', 0.115), ('cxy', 0.109), ('gaussian', 0.101), ('uy', 0.1), ('zik', 0.082), ('bottleneck', 0.08), ('relevance', 0.079), ('zd', 0.079), ('analytical', 0.077), ('correlation', 0.074), ('parametrizing', 0.068), ('zil', 0.067), ('mutual', 0.066), ('cy', 0.065), ('cp', 0.063), ('multivariate', 0.061), ('fd', 0.061), ('cx', 0.06), ('density', 0.057), ('dkl', 0.056), ('ut', 0.055), ('outliers', 0.054), ('ud', 0.054), ('cz', 0.049), ('scores', 0.047), ('dependence', 0.045), ('univariate', 0.044), ('rank', 0.04), ('normal', 0.04), ('curves', 0.039), ('px', 0.039), ('ct', 0.039), ('reformulation', 0.038), ('eigenvectors', 0.037), ('projection', 0.037), ('jointly', 0.036), ('covariance', 0.036), ('pca', 0.034), ('marginal', 0.034), ('dux', 0.033), ('fxp', 0.033), ('pyx', 0.033), ('rv', 0.033), ('densities', 0.032), ('proposition', 0.031), ('xy', 0.031), ('py', 0.031), ('joint', 0.03), ('beta', 0.03), ('rey', 0.03), ('basel', 0.03), ('tishby', 0.03), ('continuous', 0.029), ('opens', 0.028), ('matrix', 0.028), ('xp', 0.028), ('semiparametric', 0.028), ('informations', 0.027), ('pxy', 0.027), ('student', 0.027), ('saturate', 0.026), ('entropy', 0.024), ('transformation', 0.023), ('formalizes', 0.022), ('globerson', 0.02), ('solution', 0.02), ('ac', 0.02), ('equation', 0.02), ('optimal', 0.019), ('unstable', 0.018), ('ag', 0.018), ('dimensions', 0.018), ('decomposes', 0.018), ('rescaled', 0.017), ('log', 0.017), ('distributions', 0.017), ('optimality', 0.017), ('ax', 0.016), ('dimension', 0.016), ('normally', 0.016), ('fk', 0.016), ('robustness', 0.016), ('fj', 0.016), ('estimation', 0.015), ('structure', 0.015), ('compressed', 0.015), ('fyi', 0.015), ('czos', 0.015), ('corre', 0.015), ('fxi', 0.015), ('genest', 0.015), ('sabato', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 211 nips-2012-Meta-Gaussian Information Bottleneck
Author: Melanie Rey, Volker Roth
Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possible applications of IB to continuous data and provides a solution more robust to outliers. 1
2 0.25893128 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
Author: Gal Elidan, Cobi Cario
Abstract: The empirical success of the belief propagation approximate inference algorithm has inspired numerous theoretical and algorithmic advances. Yet, for continuous non-Gaussian domains performing belief propagation remains a challenging task: recent innovations such as nonparametric or kernel belief propagation, while useful, come with a substantial computational cost and offer little theoretical guarantees, even for tree structured models. In this work we present Nonparanormal BP for performing efficient inference on distributions parameterized by a Gaussian copulas network and any univariate marginals. For tree structured networks, our approach is guaranteed to be exact for this powerful class of non-Gaussian models. Importantly, the method is as efficient as standard Gaussian BP, and its convergence properties do not depend on the complexity of the univariate marginals, even when a nonparametric representation is used. 1
3 0.10851222 351 nips-2012-Transelliptical Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose a high dimensional semiparametric scale-invariant principle component analysis, named TCA, by utilize the natural connection between the elliptical distribution family and the principal component analysis. Elliptical distribution family includes many well-known multivariate distributions like multivariate Gaussian, t and logistic and it is extended to the meta-elliptical by Fang et.al (2002) using the copula techniques. In this paper we extend the meta-elliptical distribution family to a even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and largescale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost. 1
4 0.066613935 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
Author: Tuo Zhao, Kathryn Roeder, Han Liu
Abstract: We introduce a new learning algorithm, named smooth-projected neighborhood pursuit, for estimating high dimensional undirected graphs. In particularly, we focus on the nonparanormal graphical model and provide theoretical guarantees for graph estimation consistency. In addition to new computational and theoretical analysis, we also provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Numerical results on both synthetic and real datasets are provided to support our theory. 1
5 0.057658967 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
Author: Po-ling Loh, Martin J. Wainwright
Abstract: We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. Based on our population-level results, we show how the graphical Lasso may be used to recover the edge structure of certain classes of discrete graphical models, and present simulations to verify our theoretical results. 1
6 0.05569075 310 nips-2012-Semiparametric Principal Component Analysis
7 0.054188676 37 nips-2012-Affine Independent Variational Inference
8 0.053899523 352 nips-2012-Transelliptical Graphical Models
9 0.045115273 343 nips-2012-Tight Bounds on Profile Redundancy and Distinguishability
10 0.044918243 216 nips-2012-Mirror Descent Meets Fixed Share (and feels no regret)
11 0.04409565 227 nips-2012-Multiclass Learning with Simplex Coding
12 0.043840401 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
13 0.041337036 16 nips-2012-A Polynomial-time Form of Robust Regression
14 0.040007647 187 nips-2012-Learning curves for multi-task Gaussian process regression
15 0.039516158 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection
16 0.038514733 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests
17 0.036223512 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation
18 0.035980649 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
19 0.035222162 254 nips-2012-On the Sample Complexity of Robust PCA
20 0.035059776 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
topicId topicWeight
[(0, 0.105), (1, 0.036), (2, 0.042), (3, -0.029), (4, -0.024), (5, 0.017), (6, 0.045), (7, -0.003), (8, -0.064), (9, -0.036), (10, -0.025), (11, -0.025), (12, -0.004), (13, -0.037), (14, -0.019), (15, 0.001), (16, 0.148), (17, -0.069), (18, -0.066), (19, 0.051), (20, 0.032), (21, -0.067), (22, -0.023), (23, 0.062), (24, -0.053), (25, -0.024), (26, -0.084), (27, -0.071), (28, 0.019), (29, 0.055), (30, 0.005), (31, 0.004), (32, 0.076), (33, -0.01), (34, -0.083), (35, 0.047), (36, -0.034), (37, 0.012), (38, -0.064), (39, 0.022), (40, 0.031), (41, 0.044), (42, 0.067), (43, -0.051), (44, -0.05), (45, 0.077), (46, -0.029), (47, 0.101), (48, -0.061), (49, -0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.90820211 211 nips-2012-Meta-Gaussian Information Bottleneck
Author: Melanie Rey, Volker Roth
Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possible applications of IB to continuous data and provides a solution more robust to outliers. 1
2 0.67883354 351 nips-2012-Transelliptical Component Analysis
Author: Fang Han, Han Liu
Abstract: We propose a high dimensional semiparametric scale-invariant principle component analysis, named TCA, by utilize the natural connection between the elliptical distribution family and the principal component analysis. Elliptical distribution family includes many well-known multivariate distributions like multivariate Gaussian, t and logistic and it is extended to the meta-elliptical by Fang et.al (2002) using the copula techniques. In this paper we extend the meta-elliptical distribution family to a even larger family, called transelliptical. We prove that TCA can obtain a near-optimal s log d/n estimation consistency rate in recovering the leading eigenvector of the latent generalized correlation matrix under the transelliptical distribution family, even if the distributions are very heavy-tailed, have infinite second moments, do not have densities and possess arbitrarily continuous marginal distributions. A feature selection result with explicit rate is also provided. TCA is further implemented in both numerical simulations and largescale stock data to illustrate its empirical usefulness. Both theories and experiments confirm that TCA can achieve model flexibility, estimation accuracy and robustness at almost no cost. 1
3 0.67370611 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)
Author: Gal Elidan, Cobi Cario
Abstract: The empirical success of the belief propagation approximate inference algorithm has inspired numerous theoretical and algorithmic advances. Yet, for continuous non-Gaussian domains performing belief propagation remains a challenging task: recent innovations such as nonparametric or kernel belief propagation, while useful, come with a substantial computational cost and offer little theoretical guarantees, even for tree structured models. In this work we present Nonparanormal BP for performing efficient inference on distributions parameterized by a Gaussian copulas network and any univariate marginals. For tree structured networks, our approach is guaranteed to be exact for this powerful class of non-Gaussian models. Importantly, the method is as efficient as standard Gaussian BP, and its convergence properties do not depend on the complexity of the univariate marginals, even when a nonparametric representation is used. 1
4 0.66915596 352 nips-2012-Transelliptical Graphical Models
Author: Han Liu, Fang Han, Cun-hui Zhang
Abstract: We advocate the use of a new distribution family—the transelliptical—for robust inference of high dimensional graphical models. The transelliptical family is an extension of the nonparanormal family proposed by Liu et al. (2009). Just as the nonparanormal extends the normal by transforming the variables using univariate functions, the transelliptical extends the elliptical family in the same way. We propose a nonparametric rank-based regularization estimator which achieves the parametric rates of convergence for both graph recovery and parameter estimation. Such a result suggests that the extra robustness and flexibility obtained by the semiparametric transelliptical modeling incurs almost no efficiency loss. We also discuss the relationship between this work with the transelliptical component analysis proposed by Han and Liu (2012). 1
5 0.57217854 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation
Author: Tuo Zhao, Kathryn Roeder, Han Liu
Abstract: We introduce a new learning algorithm, named smooth-projected neighborhood pursuit, for estimating high dimensional undirected graphs. In particularly, we focus on the nonparanormal graphical model and provide theoretical guarantees for graph estimation consistency. In addition to new computational and theoretical analysis, we also provide an alternative view to analyze the tradeoff between computational efficiency and statistical error under a smoothing optimization framework. Numerical results on both synthetic and real datasets are provided to support our theory. 1
6 0.43380773 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning
7 0.39935702 37 nips-2012-Affine Independent Variational Inference
8 0.38996005 145 nips-2012-Gradient Weights help Nonparametric Regressors
9 0.38223836 268 nips-2012-Perfect Dimensionality Recovery by Variational Bayesian PCA
10 0.38105759 254 nips-2012-On the Sample Complexity of Robust PCA
11 0.37606749 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo
12 0.37231147 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
13 0.37107927 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
14 0.37053779 221 nips-2012-Multi-Stage Multi-Task Feature Learning
15 0.35925069 312 nips-2012-Simultaneously Leveraging Output and Task Structures for Multiple-Output Regression
16 0.35185972 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy
17 0.34915736 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests
18 0.34286872 171 nips-2012-Latent Coincidence Analysis: A Hidden Variable Model for Distance Metric Learning
19 0.34155256 192 nips-2012-Learning the Dependency Structure of Latent Factors
20 0.34058419 144 nips-2012-Gradient-based kernel method for feature extraction and variable selection
topicId topicWeight
[(0, 0.014), (21, 0.026), (38, 0.099), (39, 0.039), (42, 0.022), (53, 0.01), (54, 0.013), (55, 0.407), (74, 0.026), (76, 0.113), (80, 0.084), (92, 0.04)]
simIndex simValue paperId paperTitle
1 0.84053057 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition
Author: Francesco Dinuzzo, Bernhard Schölkopf
Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1
2 0.83850813 158 nips-2012-ImageNet Classification with Deep Convolutional Neural Networks
Author: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called “dropout” that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry. 1
3 0.83709806 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
Author: Francisco Ruiz, Isabel Valera, Carlos Blanco, Fernando Pérez-Cruz
Abstract: The National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) database contains a large amount of information, regarding the way of life, medical conditions, etc., of a representative sample of the U.S. population. In this paper, we are interested in seeking the hidden causes behind the suicide attempts, for which we propose to model the subjects using a nonparametric latent model based on the Indian Buffet Process (IBP). Due to the nature of the data, we need to adapt the observation model for discrete random variables. We propose a generative model in which the observations are drawn from a multinomial-logit distribution given the IBP matrix. The implementation of an efficient Gibbs sampler is accomplished using the Laplace approximation, which allows integrating out the weighting factors of the multinomial-logit likelihood model. Finally, the experiments over the NESARC database show that our model properly captures some of the hidden causes that model suicide attempts. 1
4 0.82272738 155 nips-2012-Human memory search as a random walk in a semantic network
Author: Joseph L. Austerweil, Joshua T. Abbott, Thomas L. Griffiths
Abstract: The human mind has a remarkable ability to store a vast amount of information in memory, and an even more remarkable ability to retrieve these experiences when needed. Understanding the representations and algorithms that underlie human memory search could potentially be useful in other information retrieval settings, including internet search. Psychological studies have revealed clear regularities in how people search their memory, with clusters of semantically related items tending to be retrieved together. These findings have recently been taken as evidence that human memory search is similar to animals foraging for food in patchy environments, with people making a rational decision to switch away from a cluster of related information as it becomes depleted. We demonstrate that the results that were taken as evidence for this account also emerge from a random walk on a semantic network, much like the random web surfer model used in internet search engines. This offers a simpler and more unified account of how people search their memory, postulating a single process rather than one process for exploring a cluster and one process for switching between clusters. 1
same-paper 5 0.77098095 211 nips-2012-Meta-Gaussian Information Bottleneck
Author: Melanie Rey, Volker Roth
Abstract: We present a reformulation of the information bottleneck (IB) problem in terms of copula, using the equivalence between mutual information and negative copula entropy. Focusing on the Gaussian copula we extend the analytical IB solution available for the multivariate Gaussian case to distributions with a Gaussian dependence structure but arbitrary marginal densities, also called meta-Gaussian distributions. This opens new possible applications of IB to continuous data and provides a solution more robust to outliers. 1
6 0.68410921 215 nips-2012-Minimizing Uncertainty in Pipelines
7 0.67167771 95 nips-2012-Density-Difference Estimation
8 0.54886162 306 nips-2012-Semantic Kernel Forests from Multiple Taxonomies
9 0.54297101 91 nips-2012-Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images
10 0.54111058 231 nips-2012-Multiple Operator-valued Kernel Learning
11 0.53610498 238 nips-2012-Neurally Plausible Reinforcement Learning of Working Memory Tasks
12 0.53410298 193 nips-2012-Learning to Align from Scratch
13 0.52749914 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
14 0.52164352 298 nips-2012-Scalable Inference of Overlapping Communities
15 0.5120703 188 nips-2012-Learning from Distributions via Support Measure Machines
16 0.51003158 4 nips-2012-A Better Way to Pretrain Deep Boltzmann Machines
17 0.50745976 93 nips-2012-Deep Spatio-Temporal Architectures and Learning for Protein Structure Prediction
18 0.50212318 210 nips-2012-Memorability of Image Regions
19 0.49416128 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
20 0.49367091 345 nips-2012-Topic-Partitioned Multinetwork Embeddings