nips nips2011 nips2011-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, Karsten M. Borgwardt
Abstract: Inference in matrix-variate Gaussian models has major applications for multi-output prediction and joint learning of row and column covariances from matrix-variate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders.
Reference: text
sentIndex sentText sentNum sentScore
1 Efficient inference in matrix-variate Gaussian models with iid observation noise Oliver Stegle¹, Max Planck Institutes Tübingen, Germany, stegle@tuebingen. [sent-1, score-0.208]
2 Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. [sent-15, score-0.2]
3 Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. [sent-16, score-0.173]
4 Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. [sent-17, score-0.723]
5 We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders. [sent-19, score-0.079]
6 These models with Kronecker factored covariance have applications in geostatistics [4], statistical testing on matrix-variate data [5] and statistical genetics [6]. [sent-23, score-0.258]
7 In prior work, different covariance functions for rows and columns have been combined in a flexible manner. [sent-24, score-0.175]
8 In other applications for prediction [2] and dimension reduction [8], combinations of free-form covariances with squared exponential covariances have been used. [sent-27, score-0.148]
9 In the absence of iid observation noise, an efficient inference scheme, also known as the “flip-flop algorithm”, can be derived. [sent-29, score-0.179]
10 In this iterative approach, estimation of the respective covariances is decoupled by rotating the data with respect to one of the covariances to optimize parameters of the other [7, 1]. [sent-30, score-0.148]
11 While this simplifying assumption of noise-free matrix-variate data has been used with some success, there are clear motivations for including iid noise in the model. [sent-31, score-0.143]
12 This effect, also known from the geostatistics literature [4], eliminates any benefit from multivariate prediction compared to naïve approaches. [sent-34, score-0.094]
13 The covariance matrix no longer directly factorizes into a Kronecker product, thus rendering simple approaches such as the “flip-flop algorithm” inappropriate. [sent-36, score-0.206]
14 Here, we address these shortcomings and propose a general framework for efficient inference in matrix-variate normal models that include iid observation noise. [sent-37, score-0.222]
15 Although in this model the covariance matrix no longer factorizes into a Kronecker product, we show how efficient parameter inference can still be done. [sent-38, score-0.243]
16 This allows for parameter learning of covariance matrices of size 10⁵ × 10⁵, or even bigger, which would not be possible if done naïvely. [sent-40, score-0.15]
17 First, we show how for any combination of covariances, evaluation of the model likelihood and gradients with respect to individual covariance parameters is tractable. [sent-41, score-0.193]
18 Second, we apply this framework to structure learning in Gaussian graphical models, while accounting for a confounding non-iid sample structure. [sent-42, score-0.352]
19 This generalization of the Graphical Lasso [9, 10] (GLASSO) allows us to jointly learn and account for a sparse inverse covariance matrix between features and a structured (non-diagonal) sample covariance. [sent-43, score-0.31]
20 The low rank component of the sample covariance is used to account for confounding effects, as is done in other models for genomics [11, 12]. [sent-44, score-0.418]
21 We illustrate this generalization called “Kronecker GLASSO” on synthetic datasets and heterogeneous protein signaling and gene expression data, where the aim is to recover the hidden network structures. [sent-45, score-0.3]
22 We show that our approach is able to recover the confounding structure, when it is known, and reveals sparse biological networks that are in better agreement with known components of the latent network structure. [sent-46, score-0.478]
23 2 Efficient inference in Kronecker Gaussian processes: Assume we are given a data matrix Y ∈ ℝ^{N×D} with N rows and D columns, where N is the number of samples with D features each. [sent-47, score-0.136]
24 As an example, think of N as the number of micro-array experiments, where in each experiment the expression levels of the same D genes are measured; here, y_{rc} would be the expression level of gene c in experiment r. [sent-48, score-0.237]
25 For modeling Y as a matrix-variate normal distribution with iid observation noise, we first introduce N × D additional latent variables Z, which can be thought of as the noise-free observations. [sent-77, score-0.243]
26 The data Y is then given by Z plus iid Gaussian observation noise, p(Y | Z, σ²) = N(vec(Y) | vec(Z), σ² I_{N·D}), while the noise-free observations follow the matrix-variate normal prior p(Z | R, C) = N(vec(Z) | 0_{N·D}, C ⊗ R) (Equation 2). [sent-78, score-0.166]
27 Here, the matrix C is a D × D column covariance matrix and R is an N × N row covariance matrix, which may depend on hyperparameters Θ_C and Θ_R respectively. [sent-80, score-0.421]
28 Note that for σ² = 0, the likelihood model in Equation (3) reduces to the matrix-variate normal distribution in Equation (2). [sent-83, score-0.086]
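To make the generative model concrete, here is a small sampling sketch: it draws Z with cov(vec(Z)) = C ⊗ R and adds iid observation noise. The matrix sizes, the stand-in covariances R and C, and the noise level are illustrative assumptions, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, sigma2 = 50, 20, 0.1          # illustrative sizes and noise level

# Stand-in positive-definite row covariance R (N x N) and column covariance C (D x D).
A = rng.standard_normal((N, N))
R = A @ A.T / N + np.eye(N)
B = rng.standard_normal((D, D))
C = B @ B.T / D + np.eye(D)

# Draw noise-free Z with cov(vec(Z)) = C kron R by colouring iid Gaussian noise
# with the Cholesky factors of R (acting on rows) and C (acting on columns).
L_R = np.linalg.cholesky(R)
L_C = np.linalg.cholesky(C)
Z = L_R @ rng.standard_normal((N, D)) @ L_C.T

# Observed data: the noise-free Z plus iid Gaussian observation noise.
Y = Z + np.sqrt(sigma2) * rng.standard_normal((N, D))
```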
29 Likelihood evaluation: Using these identities, the log of the likelihood in Equation (3) follows as −(N·D/2) ln(2π) − (1/2) ln|S_C ⊗ S_R + σ²I| − (1/2) vec(U_Rᵀ Y U_C)ᵀ (S_C ⊗ S_R + σ²I)⁻¹ vec(U_Rᵀ Y U_C). (6) [sent-87, score-0.113]
30 This term can be interpreted as a multivariate normal distribution with diagonal covariance matrix (S_C ⊗ S_R + σ²I) on the rotated data vec(U_Rᵀ Y U_C), similar to an approach that is used to speed up mixed models in genetics [13]. [sent-88, score-0.291]
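A minimal numpy sketch of this rotation trick, assuming R and C (and hence their eigendecompositions U_R S_R U_Rᵀ and U_C S_C U_Cᵀ) are given; the function name and signature are mine, not the authors' code. Only the N × N and D × D eigendecompositions and a few matrix products are needed, avoiding the naïve O(N³D³) cost.

```python
import numpy as np

def kron_gaussian_loglik(Y, R, C, sigma2):
    """Log-likelihood of N(vec(Y) | 0, C kron R + sigma2*I), i.e. the
    likelihood of Equation (3) evaluated via Equation (6)."""
    SR, UR = np.linalg.eigh(R)          # R = UR diag(SR) UR^T
    SC, UC = np.linalg.eigh(C)          # C = UC diag(SC) UC^T
    Ytil = UR.T @ Y @ UC                # rotated data
    S = np.outer(SR, SC) + sigma2       # eigenvalues of C kron R + sigma2*I
    n = Y.size
    return -0.5 * (n * np.log(2.0 * np.pi)
                   + np.log(S).sum()
                   + (Ytil ** 2 / S).sum())
```

For tiny N and D the result can be cross-checked against a brute-force evaluation that builds the full N·D × N·D covariance explicitly.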
31 Runtime and memory complexity: A naïve implementation for optimizing the likelihood (3) with respect to the hyperparameters would have runtime complexity O(N³D³) and memory complexity O(N²D²). [sent-93, score-0.11]
32 3 Graphical Lasso in the presence of confounders: Estimation of sparse inverse covariance matrices is widely used to identify undirected network structures from observational data. [sent-96, score-0.502]
33 However, non-iid observations due to hidden confounding variables may hinder accurate recovery of the true network structure. [sent-97, score-0.305]
34 If not accounted for, confounders may lead to a large number of false positive edges. [sent-98, score-0.203]
35 As an application of the framework described in Section 2, we here propose an approach to learning sparse inverse covariance matrices between features, while accounting for covariation between samples due to confounders. [sent-100, score-0.353]
36 First, we briefly review the “orthogonal” approaches that account for the corresponding types of sample and feature covariance we set out to model. [sent-101, score-0.171]
37 It has been used in the context of biological studies to recover the hidden network structure of gene-gene interrelationships [14], for instance. [sent-104, score-0.113]
38 The GLASSO assumes a multivariate Gaussian distribution on features with a sparse precision (inverse covariance) matrix. [sent-105, score-0.083]
39 The sparsity is induced by an L1 penalty on the entries of C⁻¹, the inverse of the feature covariance matrix. [sent-106, score-0.231]
40 Under the simplifying assumption of iid samples, the posterior distribution of Y under this model is proportional to p(Y, C⁻¹) = p(C⁻¹) ∏_{r=1}^{N} N(Y_{r,:} | 0_D, C). [sent-107, score-0.114]
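Under this iid baseline, estimating C⁻¹ is the familiar graphical-lasso problem on the empirical feature covariance; a sketch using scikit-learn's GraphicalLasso estimator, which is an assumed off-the-shelf substitute rather than the solver used by the authors.

```python
import numpy as np
from sklearn.covariance import GraphicalLasso  # assumed dependency

def fit_glasso_iid(Y, alpha=0.1):
    """Sparse estimate of the feature precision C^{-1}, treating the N rows
    of Y as iid samples from N(0, C)."""
    model = GraphicalLasso(alpha=alpha, assume_centered=True)
    model.fit(Y)
    return model.precision_
```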
41 3.2 Modeling confounders using the Gaussian process latent variable model: Confounders are unobserved variables that can lead to spurious associations between observed variables and to covariation between samples. [sent-111, score-0.247]
42 A possible approach to identify such confounders is dimensionality reduction. [sent-112, score-0.16]
43 In the context of applications, these methods have previously been applied to identify regulatory processes [16], and to recover confounding factors with broad effects on many features [11, 12]. [sent-114, score-0.356]
44 In dual probabilistic PCA [15], the observed data Y is explained as a linear combination of K latent variables (“factors”), plus independent observation noise. [sent-115, score-0.11]
45 The model is as follows: Y = XW + E, where X ∈ ℝ^{N×K} contains the values of the K latent variables (“factors”) and W ∈ ℝ^{K×D} contains independent standard-normally distributed weights that specify the mapping between latent and observed variables. [sent-116, score-0.116]
46 Finally, E ∈ ℝ^{N×D} contains iid Gaussian noise with E_{rc} ∼ N(0, σ²). [sent-117, score-0.143]
47 Marginalizing over W yields p(Y | X) = ∏_{c=1}^{D} N(Y_{:,c} | 0_N, X Xᵀ + σ² I_N) (10). Learning the latent factors X and the observation noise variance σ² can be done by maximum likelihood. [sent-119, score-0.144]
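For this linear model, the maximum-likelihood solution has the usual probabilistic-PCA closed form; the sketch below follows that standard recipe rather than the paper's implementation, and the function name and the eigenvalue-based noise estimate are assumptions.

```python
import numpy as np

def dual_ppca_ml(Y, K):
    """Maximum-likelihood latent factors for dual probabilistic PCA:
    p(Y | X) = prod_c N(Y[:, c] | 0, X X^T + sigma2 * I_N)."""
    N, D = Y.shape
    S = Y @ Y.T / D                        # N x N covariance across the D columns
    evals, evecs = np.linalg.eigh(S)
    order = np.argsort(evals)[::-1]        # sort eigenvalues in decreasing order
    evals, evecs = evals[order], evecs[:, order]
    sigma2 = evals[K:].mean()              # noise variance: mean of discarded eigenvalues
    X = evecs[:, :K] * np.sqrt(np.maximum(evals[:K] - sigma2, 0.0))
    return X, sigma2
```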
48 The GPLVM generalizes this by defining the row covariance through a kernel on the latent coordinates, R_{rs} = κ(x_{r,:}, x_{s,:}), for some covariance function κ : ℝ^K × ℝ^K → ℝ. [sent-126, score-0.208]
49 Instead of treating either the samples or the features as being (conditionally) independent, we aim to learn a joint covariance for the observed data matrix Y. [sent-129, score-0.224]
50 We use the sparse L1 penalty (9) for the feature inverse covariance C⁻¹ and use a linear kernel for the covariance on rows, R = X Xᵀ + ρ² I_N. [sent-134, score-0.437]
51 Learning the model parameters proceeds via MAP inference, optimizing the log likelihood implied by Equation (11) with respect to X and C⁻¹, and the hyperparameters σ² and ρ². [sent-135, score-0.08]
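The quantity optimized by this MAP scheme can be written compactly by combining the Kronecker likelihood of Section 2 with the L1 penalty; a sketch of the objective, assuming the penalty is applied to all entries of C⁻¹ (the exact penalization details are not reproduced in this summary):

```python
import numpy as np

def kron_glasso_objective(Y, X, Cinv, sigma2, rho2, lam):
    """Negative log posterior (up to constants) of the Kronecker GLASSO model:
    -ln N(vec(Y) | 0, C kron R + sigma2*I) + lam * ||Cinv||_1,
    with R = X X^T + rho2*I_N and C = inv(Cinv)."""
    N, D = Y.shape
    R = X @ X.T + rho2 * np.eye(N)
    C = np.linalg.inv(Cinv)
    SR, UR = np.linalg.eigh(R)
    SC, UC = np.linalg.eigh(C)
    Ytil = UR.T @ Y @ UC
    S = np.outer(SR, SC) + sigma2
    neg_loglik = 0.5 * (N * D * np.log(2.0 * np.pi)
                        + np.log(S).sum()
                        + (Ytil ** 2 / S).sum())
    return neg_loglik + lam * np.abs(Cinv).sum()
```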
52 By combining the GLASSO and GPLVM in this way, we can recover network structure in the presence of confounders. [sent-136, score-0.092]
53 Even using the tricks discussed in Section 2, free-form sparse inverse covariance updates for C⁻¹ are intractable under the L1 penalty when relying on gradient updates. [sent-142, score-0.285]
54 As in Section 2, the first step towards efficient inference is to introduce N × D additional latent variables Z, which can be thought of as the noise-free observations: p(Y | Z, σ²) = N(vec(Y) | vec(Z), σ² I_{N·D}) (12) and p(Z | R, C) = N(vec(Z) | 0_{N·D}, C ⊗ R). [sent-143, score-0.095]
55 Compared are the standard GLASSO, our algorithm with Kronecker structure (Kronecker GLASSO) and, as a reference, an idealized setting in which standard GLASSO is applied to a similar dataset without confounding influences (Ideal GLASSO). [sent-155, score-0.281]
56 The model that accounts for confounders approaches the performance of an idealized model, while standard GLASSO finds a large fraction of false positive edges. [sent-156, score-0.266]
57 First consider: ln N(vec(Ẑ) | 0_{N·D}, Ĉ ⊗ R̂) = −(N·D/2) ln(2π) − (1/2) ln|Ĉ ⊗ R̂| − (1/2) vec(Ẑ)ᵀ (Ĉ ⊗ R̂)⁻¹ vec(Ẑ). [sent-162, score-0.14]
58 Now, using the Kronecker identity (4) and ln|A ⊗ B| = rank(B) ln|A| + rank(A) ln|B|, we can rewrite the log likelihood as: ln[ N(vec(Ẑ) | 0, Ĉ ⊗ R̂) p(Ĉ⁻¹) ] = −(N·D/2) ln(2π) − (D/2) ln|R̂| + (N/2) ln|Ĉ⁻¹| − (1/2) Tr(Ẑᵀ R̂⁻¹ Ẑ Ĉ⁻¹) + ln p(Ĉ⁻¹). [sent-163, score-0.463]
59 Thus we obtain a standard GLASSO problem with covariance matrix Ẑᵀ R̂⁻¹ Ẑ: argmax_{Ĉ⁻¹ ≻ 0} p(Ĉ⁻¹ | Ẑ, R̂) = argmax_{Ĉ⁻¹ ≻ 0} [ −(1/2) Tr(Ẑᵀ R̂⁻¹ Ẑ Ĉ⁻¹) + (N/2) ln|Ĉ⁻¹| − λ ‖Ĉ⁻¹‖₁ ]. (15) [sent-164, score-0.324]
60 The inverse sample covariance R̂⁻¹ in Equation (15) rotates the data covariance, similarly to the established flip-flop algorithm for inference in matrix-variate normal distributions [7, 1]. [sent-165, score-0.286]
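Given current estimates Ẑ and R̂, the update in Equation (15) is therefore an ordinary graphical-lasso problem on the rotated covariance Ẑᵀ R̂⁻¹ Ẑ / N; a sketch using scikit-learn's graphical_lasso solver as an assumed stand-in for the authors' implementation.

```python
import numpy as np
from sklearn.covariance import graphical_lasso  # assumed dependency

def update_Cinv(Z_hat, R_hat, lam):
    """Solve the subproblem of Equation (15) for fixed Z_hat and R_hat."""
    N = Z_hat.shape[0]
    S = Z_hat.T @ np.linalg.solve(R_hat, Z_hat) / N   # rotated sample covariance
    # Dividing the objective by N/2 maps the penalty lam to alpha = 2*lam/N;
    # note that sklearn penalizes only the off-diagonal entries of the precision.
    cov, prec = graphical_lasso(S, alpha=2.0 * lam / N)
    return prec
```

Alternating such an update with re-estimation of Ẑ, X and the variance parameters is one way to realize the overall MAP scheme described above.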
61 4.1 Simulation study: First, we considered an artificial dataset to illustrate the effect of confounding factors on the solution quality of sparse inverse covariance estimation. [sent-168, score-0.513]
62 We generated the sparse inverse column covariance C⁻¹ by choosing edges at random with a sparsity level of 1%. [sent-171, score-0.273]
63 Non-zero entries of the inverse covariance were drawn from a Gaussian with mean 1 and variance 2. [sent-172, score-0.206]
64 The row covariance matrix R was created from K = 3 random factors x_k, each drawn from unit-variance iid Gaussian variables. [sent-173, score-0.321]
65 The weighting between the confounders and the iid component ρ² was set such that the factors explained equal variance, which corresponds to a moderate extent of confounding influence. [sent-174, score-0.55]
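One way to realize this simulation in code is sketched below; the diagonal boost that enforces positive definiteness of C⁻¹ and the particular equal-variance choice of ρ² are my assumptions, since the exact recipe is not reproduced in this summary.

```python
import numpy as np

rng = np.random.default_rng(1)
N, D, K, density = 60, 100, 3, 0.01

# Sparse inverse column covariance: random edges with weights of mean 1 and
# variance 2; a diagonal boost keeps the matrix positive definite.
Cinv = np.zeros((D, D))
mask = np.triu(rng.random((D, D)) < density, k=1)
weights = rng.normal(1.0, np.sqrt(2.0), size=(D, D))
Cinv[mask] = weights[mask]
Cinv = Cinv + Cinv.T
Cinv += np.eye(D) * (np.abs(Cinv).sum(axis=1).max() + 1.0)

# Row covariance from K latent factors plus an iid component, with rho2 chosen
# so that factors and iid noise contribute equal variance on average.
X = rng.standard_normal((N, K))
XX = X @ X.T
rho2 = np.trace(XX) / N
R = XX + rho2 * np.eye(N)
# Data can then be drawn as in the earlier sampling sketch, with C = inv(Cinv).
```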
66 Standard GLASSO, not accounting for confounders, found more false positive edges for a wide range of recall rates. [sent-179, score-0.144]
67 We considered standard GLASSO and our Kronecker model that accounts for the confounding influence (Kronecker GLASSO). [sent-184, score-0.276]
68 For reference, we also considered an idealized setting, applying GLASSO to a similar dataset without the confounding effects (Ideal GLASSO), obtained by setting X = 0_{N×K} in the generative model. [sent-185, score-0.281]
69 To determine an appropriate latent dimensionality of Kronecker GLASSO, we used the BIC criterion on multiple restarts with K = 1 to K = 5 latent factors. [sent-186, score-0.116]
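The BIC used for this choice is the standard criterion; a small sketch with a hypothetical fit_model interface (the function and its fields are placeholders, not the authors' API).

```python
import numpy as np

def bic(log_lik, n_params, n_obs):
    """Bayesian information criterion; the model with the lowest value is selected."""
    return -2.0 * log_lik + n_params * np.log(n_obs)

# Hypothetical selection over K = 1..5 restarts:
# best = min((fit_model(Y, K) for K in range(1, 6)),
#            key=lambda m: bic(m.log_lik, m.n_params, Y.size))
```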
70 While Kronecker GLASSO reconstructed the same network as the ideal model, standard GLASSO found an excess of false positive edges. [sent-194, score-0.142]
71 4.2 Network reconstruction of protein-signaling networks: Important practical applications of the GLASSO include the reconstruction of gene and protein networks. [sent-196, score-0.203]
72 We combined measurements from the first 3 experiments, yielding a heterogeneous mix of 2,666 samples that are not expected to be an iid sample set. [sent-200, score-0.167]
73 We used the directed ground truth network and moralized the graph structure to obtain an undirected ground truth network. [sent-202, score-0.202]
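Moralization itself is a simple graph operation; a sketch using networkx (an assumed dependency) that marries the parents of each node and then drops the edge directions.

```python
import itertools
import networkx as nx  # assumed dependency

def moralize(g: "nx.DiGraph") -> "nx.Graph":
    """Moralize a directed ground-truth network: connect ('marry') all parents
    of every node, then drop edge directions."""
    moral = g.to_undirected()
    for node in g.nodes:
        moral.add_edges_from(itertools.combinations(list(g.predecessors(node)), 2))
    return moral
```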
74 Analogous to the simulation setting, the Kronecker GLASSO model found true network links with greater accuracy than standard graphical lasso. [sent-205, score-0.098]
75 These results suggest that our model is able to account for confounding variation as it occurs in real settings. [sent-206, score-0.268]
76 4.3 Large-scale application to yeast gene expression data: Next, we considered an application to large-scale gene expression profiling data from yeast. [sent-208, score-0.255]
77 We used the dataset of [19], consisting of 109 genetically diverse yeast strains, each of which has been expression profiled in two environmental conditions (glucose and ethanol). [sent-210, score-0.105]
78 [Figure 3 panels: (a) Confounder reconstruction, plotting the r² correlation with the true confounder against the number of features (genes) for GPLVM and Kronecker GLASSO; (b) GLASSO consistency (68%); (c) Kronecker GLASSO consistency (74%).]
80 Figure 3: (a) Correlation coefficient between the learned confounding factor and the true environmental condition for different subsets of all features (genes). [sent-219, score-0.33]
81 Compared are the standard GPLVM model with a linear covariance and our proposed model that accounts for low rank confounders and sparse gene-gene relationships (Kronecker GLASSO). [sent-220, score-0.37]
82 Kronecker GLASSO is able to better recover the hidden confounder by accounting for the covariance structure between genes. [sent-221, score-0.348]
83 (b,c) Consistency of edges on the largest network with 1,000 nodes learnt on the joint dataset, comparing the results when combining both conditions with those for a single condition (glucose). [sent-222, score-0.094]
84 Because the confounder in this dataset is known explicitly, we tested the ability of Kronecker GLASSO to recover it from observational data. [sent-223, score-0.18]
85 Because complete ground-truth information is missing, we could not evaluate the network reconstruction quality directly. [sent-224, score-0.162]
86 To simplify the comparison to the known confounding factor, we chose a fixed number of confounders that we set to K = 1. [sent-226, score-0.407]
87 Recovery of the known confounder: Figure 3a shows the r² correlation coefficient between the inferred factor and the true environmental condition for an increasing number of features (genes) used for learning. [sent-227, score-0.161]
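The r² score reported here is the squared Pearson correlation between the inferred factor and a numerical encoding of the known condition; a one-function sketch with illustrative names.

```python
import numpy as np

def r_squared(inferred_factor, true_condition):
    """Squared Pearson correlation between the learned confounder and the
    known environmental condition (encoded numerically)."""
    return float(np.corrcoef(inferred_factor, true_condition)[0, 1] ** 2)
```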
88 In particular for small numbers of genes, accounting for the network structure between genes improved the ability to recover the true confounding effect. [sent-228, score-0.463]
89 Consistency of obtained networks: Next, we tested the consistency of GLASSO and Kronecker GLASSO applied to data that combines both conditions, glucose and ethanol, comparing to the network recovered from a single condition alone (glucose). [sent-229, score-0.161]
90 5 Conclusions and Discussion: We have presented an efficient scheme for parameter learning in matrix-variate normal distributions with iid observation noise. [sent-232, score-0.185]
91 As an application of our framework, we have proposed a method that accounts for confounding influences while estimating a sparse inverse covariance structure. [sent-235, score-0.513]
92 Our approach extends the Graphical Lasso, relaxing the rigid assumption of iid samples to allow for more general sample covariances. [sent-236, score-0.136]
93 For this purpose, we employ a Kronecker product covariance structure and learn a low-rank covariance between samples, thereby accounting for potential confounding influences. [sent-237, score-0.635]
94 We provided synthetic and real world examples where our method is of practical use, reducing the number of false positive edges learned. [sent-238, score-0.079]
95 Invariant Gaussian process latent variable models and application in causal discovery. [sent-290, score-0.092]
96 Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. [sent-296, score-0.136]
97 Capturing heterogeneity in gene expression studies by surrogate variable analysis. [sent-309, score-0.111]
98 A Bayesian framework to account for complex non-genetic factors in gene expression levels greatly increases power in eQTL studies. [sent-316, score-0.161]
99 Gene regulatory networks from multifactorial perturbations using graphical lasso: Application to the DREAM4 challenge. [sent-339, score-0.091]
100 Probabilistic non-linear principal component analysis with Gaussian process latent variable models. [sent-343, score-0.092]
wordName wordTfidf (topN-words)
[('glasso', 0.697), ('vec', 0.305), ('kronecker', 0.304), ('confounding', 0.247), ('confounders', 0.16), ('covariance', 0.15), ('iid', 0.114), ('confounder', 0.099), ('sr', 0.08), ('gene', 0.077), ('covariances', 0.074), ('ln', 0.07), ('gplvm', 0.067), ('sc', 0.067), ('geostatistics', 0.066), ('yuc', 0.066), ('accounting', 0.065), ('ut', 0.061), ('genes', 0.059), ('latent', 0.058), ('network', 0.058), ('inverse', 0.056), ('glucose', 0.053), ('pjnk', 0.05), ('pka', 0.05), ('pkc', 0.05), ('plcg', 0.05), ('pmek', 0.05), ('praf', 0.05), ('observational', 0.047), ('xxt', 0.044), ('sachs', 0.044), ('false', 0.043), ('likelihood', 0.043), ('normal', 0.043), ('genetics', 0.042), ('ideal', 0.041), ('lasso', 0.041), ('graphical', 0.04), ('environmental', 0.038), ('institutes', 0.038), ('argmax', 0.038), ('hyperparameters', 0.037), ('inference', 0.037), ('truth', 0.037), ('edges', 0.036), ('ground', 0.035), ('expression', 0.034), ('idealized', 0.034), ('recover', 0.034), ('gaussian', 0.034), ('plos', 0.033), ('kron', 0.033), ('mvn', 0.033), ('rur', 0.033), ('stegle', 0.033), ('yrc', 0.033), ('signaling', 0.033), ('bingen', 0.033), ('yeast', 0.033), ('protein', 0.033), ('op', 0.033), ('reconstruction', 0.032), ('zt', 0.032), ('na', 0.031), ('sparse', 0.031), ('heterogeneous', 0.031), ('runtime', 0.03), ('noise', 0.029), ('accounts', 0.029), ('networks', 0.029), ('factors', 0.029), ('ethanol', 0.029), ('covariation', 0.029), ('alm', 0.029), ('shef', 0.029), ('stability', 0.028), ('multivariate', 0.028), ('observation', 0.028), ('factorizes', 0.028), ('matrix', 0.028), ('planck', 0.027), ('rk', 0.026), ('penalty', 0.025), ('rows', 0.025), ('zc', 0.025), ('plus', 0.024), ('curve', 0.024), ('bonilla', 0.024), ('features', 0.024), ('germany', 0.023), ('tricks', 0.023), ('product', 0.023), ('rn', 0.022), ('samples', 0.022), ('regulatory', 0.022), ('consistency', 0.021), ('account', 0.021), ('biological', 0.021), ('netherlands', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
Author: Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, Karsten M. Borgwardt
Abstract: Inference in matrix-variate Gaussian models has major applications for multioutput prediction and joint learning of row and column covariances from matrixvariate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders. 1
2 0.18113627 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Author: Cho-jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, Mátyás A. Sustik
Abstract: The 1 regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton’s method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
3 0.14428516 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
4 0.12408377 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
Author: Nicolas Boumal, Pierre-antoine Absil
Abstract: We consider large matrices of low rank. We address the problem of recovering such matrices when most of the entries are unknown. Matrix completion finds applications in recommender systems. In this setting, the rows of the matrix may correspond to items and the columns may correspond to users. The known entries are the ratings given by users to some items. The aim is to predict the unobserved ratings. This problem is commonly stated in a constrained optimization framework. We follow an approach that exploits the geometry of the low-rank constraint to recast the problem as an unconstrained optimization problem on the Grassmann manifold. We then apply first- and second-order Riemannian trust-region methods to solve it. The cost of each iteration is linear in the number of known entries. Our methods, RTRMC 1 and 2, outperform state-of-the-art algorithms on a wide range of problem instances. 1
5 0.07751511 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
6 0.076402783 102 nips-2011-Generalised Coupled Tensor Factorisation
7 0.071240738 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
8 0.07004384 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
9 0.067679822 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
10 0.062072635 259 nips-2011-Sparse Estimation with Structured Dictionaries
11 0.060097378 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
12 0.05870178 217 nips-2011-Practical Variational Inference for Neural Networks
13 0.056076206 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
14 0.051790636 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
15 0.051030692 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
16 0.050739072 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
17 0.049209692 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
18 0.048928052 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
19 0.046921559 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
20 0.046039682 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
topicId topicWeight
[(0, 0.148), (1, 0.024), (2, 0.016), (3, -0.104), (4, -0.07), (5, -0.023), (6, 0.051), (7, 0.022), (8, 0.105), (9, 0.096), (10, 0.024), (11, -0.112), (12, -0.014), (13, -0.009), (14, 0.001), (15, -0.009), (16, -0.054), (17, 0.023), (18, 0.08), (19, -0.072), (20, 0.003), (21, 0.031), (22, 0.025), (23, 0.141), (24, -0.028), (25, -0.019), (26, 0.003), (27, -0.032), (28, 0.03), (29, -0.057), (30, -0.019), (31, -0.054), (32, 0.063), (33, -0.004), (34, -0.005), (35, 0.006), (36, -0.003), (37, 0.025), (38, -0.042), (39, -0.088), (40, -0.076), (41, 0.053), (42, 0.029), (43, 0.116), (44, 0.038), (45, -0.081), (46, -0.022), (47, -0.032), (48, 0.147), (49, -0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.90976596 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
Author: Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, Karsten M. Borgwardt
Abstract: Inference in matrix-variate Gaussian models has major applications for multioutput prediction and joint learning of row and column covariances from matrixvariate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders. 1
2 0.684497 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
3 0.63863677 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
Author: Tzu-kuo Huang, Jeff G. Schneider
Abstract: Vector Auto-regressive models (VAR) are useful tools for analyzing time series data. In quite a few modern time series modelling tasks, the collection of reliable time series turns out to be a major challenge, either due to the slow progression of the dynamic process of interest, or inaccessibility of repetitive measurements of the same dynamic process over time. In those situations, however, we observe that it is often easier to collect a large amount of non-sequence samples, or snapshots of the dynamic process of interest. In this work, we assume a small amount of time series data are available, and propose methods to incorporate non-sequence data into penalized least-square estimation of VAR models. We consider non-sequence data as samples drawn from the stationary distribution of the underlying VAR model, and devise a novel penalization scheme based on the Lyapunov equation concerning the covariance of the stationary distribution. Experiments on synthetic and video data demonstrate the effectiveness of the proposed methods. 1
4 0.59187365 262 nips-2011-Sparse Inverse Covariance Matrix Estimation Using Quadratic Approximation
Author: Cho-jui Hsieh, Inderjit S. Dhillon, Pradeep K. Ravikumar, Mátyás A. Sustik
Abstract: The 1 regularized Gaussian maximum likelihood estimator has been shown to have strong statistical guarantees in recovering a sparse inverse covariance matrix, or alternatively the underlying graph structure of a Gaussian Markov Random Field, from very limited samples. We propose a novel algorithm for solving the resulting optimization problem which is a regularized log-determinant program. In contrast to other state-of-the-art methods that largely use first order gradient information, our algorithm is based on Newton’s method and employs a quadratic approximation, but with some modifications that leverage the structure of the sparse Gaussian MLE problem. We show that our method is superlinearly convergent, and also present experimental results using synthetic and real application data that demonstrate the considerable improvements in performance of our method when compared to other state-of-the-art methods.
5 0.57292956 301 nips-2011-Variational Gaussian Process Dynamical Systems
Author: Neil D. Lawrence, Michalis K. Titsias, Andreas Damianou
Abstract: High dimensional time series are endemic in applications of machine learning such as robotics (sensor data), computational biology (gene expression data), vision (video sequences) and graphics (motion capture data). Practical nonlinear probabilistic approaches to this data are required. In this paper we introduce the variational Gaussian process dynamical system. Our work builds on recent variational approximations for Gaussian process latent variable models to allow for nonlinear dimensionality reduction simultaneously with learning a dynamical prior in the latent space. The approach also allows for the appropriate dimensionality of the latent space to be automatically determined. We demonstrate the model on a human motion capture data set and a series of high resolution video sequences. 1
6 0.55042738 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
7 0.52728009 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
8 0.49654979 51 nips-2011-Clustered Multi-Task Learning Via Alternating Structure Optimization
9 0.4953014 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
10 0.4707526 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
11 0.46498108 269 nips-2011-Spike and Slab Variational Inference for Multi-Task and Multiple Kernel Learning
12 0.46102059 148 nips-2011-Learning Probabilistic Non-Linear Latent Variable Models for Tracking Complex Activities
13 0.4543936 102 nips-2011-Generalised Coupled Tensor Factorisation
14 0.44930726 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
15 0.44752184 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
16 0.4455193 104 nips-2011-Generalized Beta Mixtures of Gaussians
17 0.43901852 84 nips-2011-EigenNet: A Bayesian hybrid of generative and conditional models for sparse learning
18 0.43853763 4 nips-2011-A Convergence Analysis of Log-Linear Training
19 0.43642485 68 nips-2011-Demixed Principal Component Analysis
20 0.43172982 240 nips-2011-Robust Multi-Class Gaussian Process Classification
topicId topicWeight
[(0, 0.018), (4, 0.038), (20, 0.036), (26, 0.025), (31, 0.076), (33, 0.011), (43, 0.082), (45, 0.077), (46, 0.271), (57, 0.045), (65, 0.021), (66, 0.015), (74, 0.071), (83, 0.06), (84, 0.018), (99, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.75923175 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
Author: Oliver Stegle, Christoph Lippert, Joris M. Mooij, Neil D. Lawrence, Karsten M. Borgwardt
Abstract: Inference in matrix-variate Gaussian models has major applications for multioutput prediction and joint learning of row and column covariances from matrixvariate data. Here, we discuss an approach for efficient inference in such models that explicitly account for iid observation noise. Computational tractability can be retained by exploiting the Kronecker product between row and column covariance matrices. Using this framework, we show how to generalize the Graphical Lasso in order to learn a sparse inverse covariance between features while accounting for a low-rank confounding covariance between samples. We show practical utility on applications to biology, where we model covariances with more than 100,000 dimensions. We find greater accuracy in recovering biological network structures and are able to better reconstruct the confounders. 1
2 0.54031008 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations
Author: Kamiar R. Rad, Liam Paninski
Abstract: Many fundamental questions in theoretical neuroscience involve optimal decoding and the computation of Shannon information rates in populations of spiking neurons. In this paper, we apply methods from the asymptotic theory of statistical inference to obtain a clearer analytical understanding of these quantities. We find that for large neural populations carrying a finite total amount of information, the full spiking population response is asymptotically as informative as a single observation from a Gaussian process whose mean and covariance can be characterized explicitly in terms of network and single neuron properties. The Gaussian form of this asymptotic sufficient statistic allows us in certain cases to perform optimal Bayesian decoding by simple linear transformations, and to obtain closed-form expressions of the Shannon information carried by the network. One technical advantage of the theory is that it may be applied easily even to non-Poisson point process network models; for example, we find that under some conditions, neural populations with strong history-dependent (non-Poisson) effects carry exactly the same information as do simpler equivalent populations of non-interacting Poisson neurons with matched firing rates. We argue that our findings help to clarify some results from the recent literature on neural decoding and neuroprosthetic design.
3 0.53655177 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
Author: Jun-ichiro Hirayama, Aapo Hyvärinen
Abstract: Components estimated by independent component analysis and related methods are typically not independent in real data. A very common form of nonlinear dependency between the components is correlations in their variances or energies. Here, we propose a principled probabilistic model to model the energycorrelations between the latent variables. Our two-stage model includes a linear mixing of latent signals into the observed ones like in ICA. The main new feature is a model of the energy-correlations based on the structural equation model (SEM), in particular, a Linear Non-Gaussian SEM. The SEM is closely related to divisive normalization which effectively reduces energy correlation. Our new twostage model enables estimation of both the linear mixing and the interactions related to energy-correlations, without resorting to approximations of the likelihood function or other non-principled approaches. We demonstrate the applicability of our method with synthetic dataset, natural images and brain signals. 1
4 0.5355171 258 nips-2011-Sparse Bayesian Multi-Task Learning
Author: Shengbo Guo, Onno Zoeter, Cédric Archambeau
Abstract: We propose a new sparse Bayesian model for multi-task regression and classification. The model is able to capture correlations between tasks, or more specifically a low-rank approximation of the covariance matrix, while being sparse in the features. We introduce a general family of group sparsity inducing priors based on matrix-variate Gaussian scale mixtures. We show the amount of sparsity can be learnt from the data by combining an approximate inference approach with type II maximum likelihood estimation of the hyperparameters. Empirical evaluations on data sets from biology and vision demonstrate the applicability of the model, where on both regression and classification tasks it achieves competitive predictive performance compared to previously proposed methods. 1
5 0.53491908 183 nips-2011-Neural Reconstruction with Approximate Message Passing (NeuRAMP)
Author: Alyson K. Fletcher, Sundeep Rangan, Lav R. Varshney, Aniruddha Bhargava
Abstract: Many functional descriptions of spiking neurons assume a cascade structure where inputs are passed through an initial linear filtering stage that produces a lowdimensional signal that drives subsequent nonlinear stages. This paper presents a novel and systematic parameter estimation procedure for such models and applies the method to two neural estimation problems: (i) compressed-sensing based neural mapping from multi-neuron excitation, and (ii) estimation of neural receptive fields in sensory neurons. The proposed estimation algorithm models the neurons via a graphical model and then estimates the parameters in the model using a recently-developed generalized approximate message passing (GAMP) method. The GAMP method is based on Gaussian approximations of loopy belief propagation. In the neural connectivity problem, the GAMP-based method is shown to be computational efficient, provides a more exact modeling of the sparsity, can incorporate nonlinearities in the output and significantly outperforms previous compressed-sensing methods. For the receptive field estimation, the GAMP method can also exploit inherent structured sparsity in the linear weights. The method is validated on estimation of linear nonlinear Poisson (LNP) cascade models for receptive fields of salamander retinal ganglion cells. 1
6 0.53215456 276 nips-2011-Structured sparse coding via lateral inhibition
7 0.52852625 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
8 0.52655447 301 nips-2011-Variational Gaussian Process Dynamical Systems
9 0.52379698 133 nips-2011-Inferring spike-timing-dependent plasticity from spike train data
10 0.52280724 144 nips-2011-Learning Auto-regressive Models from Sequence and Non-sequence Data
11 0.52271092 219 nips-2011-Predicting response time and error rates in visual search
12 0.52190149 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
13 0.52153271 102 nips-2011-Generalised Coupled Tensor Factorisation
14 0.51991814 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
15 0.51987219 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons
16 0.5197162 68 nips-2011-Demixed Principal Component Analysis
17 0.51895744 75 nips-2011-Dynamical segmentation of single trials from population neural data
18 0.51847392 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
19 0.51833659 186 nips-2011-Noise Thresholds for Spectral Clustering
20 0.51816213 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data