nips nips2009 nips2009-224 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz
Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. [sent-10, score-0.67]
2 Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. [sent-12, score-0.329]
3 Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets. [sent-14, score-0.232]
4 Accuracy of representation is measured by the likelihood that the model explains the observed data, while complexity of a graphical model is measured by its number of parameters. [sent-16, score-0.229]
5 In this paper, we propose local constancy as a prior for learning Gaussian graphical models, which is natural for spatial datasets such as those encountered in computer vision [1, 2, 3]. [sent-19, score-0.851]
6 For Gaussian graphical models, the number of parameters, the number of edges in the structure and the number of non-zero elements in the inverse covariance or precision matrix are equivalent measures of complexity. [sent-20, score-0.639]
7 Therefore, several techniques focus on enforcing sparsity of the precision matrix. [sent-21, score-0.217]
8 Maximum likelihood estimation with an ℓ1-norm penalty for encouraging sparseness is proposed in [5, 6, 7]. [sent-23, score-0.312]
9 It has been shown theoretically and experimentally that only covariance selection [5] and graphical lasso [6] converge to the maximum likelihood estimator. [sent-25, score-0.547]
10 In datasets which are a collection of measurements for variables with some spatial arrangement, one can define a local neighborhood for each variable on the manifold. [sent-26, score-0.479]
11 Similarly, one can define a four-pixel neighborhood for 2D images as well as a six-pixel neighborhood for 3D images. [sent-29, score-0.22]
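To make these neighborhoods concrete, the following small Python sketch (an illustration added here, not taken from the paper; the function name and the row-major pixel indexing are assumptions) enumerates the four-pixel neighbor pairs of an H x W image:

def four_neighbor_pairs(H, W):
    # Pixels are indexed row-major: variable n = i * W + j for pixel (i, j).
    pairs = []
    for i in range(H):
        for j in range(W):
            n = i * W + j
            if j + 1 < W:
                pairs.append((n, n + 1))   # right neighbor
            if i + 1 < H:
                pairs.append((n, n + W))   # bottom neighbor
    return pairs

print(len(four_neighbor_pairs(3, 3)))      # 12 neighbor relationships for a 3 x 3 image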
12 However, there is little research on spatial regularization for structure learning. [sent-30, score-0.241]
13 Existing approaches assume a natural ordering of the variables (e.g. silhouettes) and that variables far apart are only weakly correlated [8], interaction between a priori known groups of variables as in [9], or block structures as in [10] in the context of Bayesian networks. [sent-33, score-0.227]
14 First, we propose local constancy, which encourages finding connectivities between two close or distant clusters of variables, instead of between isolated variables. [sent-35, score-0.293]
15 It does not heavily constrain the set of possible structures, since it only imposes restrictions of spatial closeness for each cluster independently, but not between clusters. [sent-36, score-0.265]
16 We impose an ℓ1-norm penalty on differences of spatially neighboring variables, which allows obtaining locally constant models that preserve sparseness, unlike ℓ2-norm penalties. [sent-37, score-0.444]
17 Positive definiteness of the estimated precision matrix is also guaranteed, since this is a necessary condition for the definition of a multivariate normal distribution. [sent-39, score-0.284]
18 Second, since optimization methods for structure learning on Gaussian graphical models [5, 6, 4, 7] are unable to handle local constancy constraints, we propose an efficient algorithm by maximizing with respect to one row and column of the precision matrix at a time. [sent-40, score-0.953]
19 We initially test the ability of our method to recover the ground truth structure from data generated by a complex synthetic model, which includes locally and not locally constant interactions as well as independent variables. [sent-42, score-0.779]
20 We demonstrate the ability of our method to discover useful structures from datasets with a diverse nature of probabilistic relationships and spatial neighborhoods: manually labeled silhouettes in a walking sequence, cardiac magnetic resonance images (MRI) and functional brain MRI. [sent-45, score-0.988]
21 Section 2 introduces Gaussian graphical models as well as techniques for learning such structures from data. [sent-46, score-0.252]
22 Section 3 presents our sparse and locally constant Gaussian graphical models. [sent-47, score-0.424]
23 For convenience, we define two new operators: the zero structure operator and the diagonal excluded product. [sent-52, score-0.228]
24 A Gaussian graphical model [11] is a graph in which all random variables are continuous and jointly Gaussian. [sent-53, score-0.247]
25 This model corresponds to the multivariate normal distribution for N variables x ∈ R^N with mean vector µ ∈ R^N and a covariance matrix Σ ∈ R^{N×N}, or equivalently x ∼ N(µ, Σ) where Σ ≻ 0. [sent-54, score-0.264]
26 Conditional independence in a Gaussian graphical model is simply reflected in the zero entries of the precision matrix Ω = Σ^{-1} [11]. [sent-55, score-0.568]
27 The precision matrix representation is preferred because it allows detecting cases in which two seemingly correlated variables actually depend on a third confounding variable. [sent-57, score-0.284]
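A small numerical illustration of this point (added for clarity; the matrix values are made up): a zero entry in Ω means conditional independence even when the corresponding entry of the covariance Σ = Ω^{-1} is far from zero.

import numpy as np

# x1-x3 have a zero precision entry, so they are conditionally independent given x2.
Omega = np.array([[ 2.0, -0.8,  0.0],
                  [-0.8,  2.0, -0.8],
                  [ 0.0, -0.8,  2.0]])
Sigma = np.linalg.inv(Omega)

print(Sigma[0, 2])                 # nonzero: x1 and x3 are marginally correlated through x2
partial = -Omega[0, 2] / np.sqrt(Omega[0, 0] * Omega[2, 2])
print(partial)                     # 0.0: the partial correlation vanishes where Omega is zero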
A ◦ B denotes the Hadamard or entrywise product of A, B ∈ R^{M×N}, i.e. (A ◦ B)_{mn} = a_{mn} b_{mn}. [sent-71, score-0.248]
J(A) denotes the zero structure operator of A ∈ R^{M×N}, defined using the Iverson bracket as j_{mn}(A) = [a_{mn} = 0]; A ⊡ B denotes the diagonal excluded product of A ∈ R^{M×N} and B ∈ R^{N×N}. [sent-73, score-0.379]
It has the property that no diagonal entry of B is used in A ⊡ B. A ≻ 0 means that A ∈ R^{N×N} is a symmetric and positive definite matrix; diag(A) is the matrix with the diagonal elements of A ∈ R^{N×N} only; vec(A) is the vector containing all elements of A ∈ R^{M×N}. Table 1: Notation used in this paper. [sent-76, score-0.204]
31 The concept of robust estimation by performing covariance selection was first introduced in [12] where the number of parameters to be estimated is reduced by setting some elements of the precision matrix Ω to zero. [sent-77, score-0.527]
32 Since finding the sparsest precision matrix that fits a dataset is an NP-hard problem [5], several ℓ1-regularization methods have been proposed to overcome it when learning Gaussian graphical models from data. [sent-78, score-0.519]
33 Covariance selection computes small perturbations on the sample covariance matrix such that it generates a sparse precision matrix, which results in a box-constrained quadratic program. [sent-80, score-0.545]
34 The Meinshausen-Bühlmann approximation [4] obtains the conditional dependencies by performing a sparse linear regression for each variable, using lasso regression [13]. [sent-82, score-0.304]
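As a rough sketch of this neighborhood-selection idea (an editorial illustration using scikit-learn's Lasso; the regularization value alpha and the exact-zero thresholding are arbitrary choices, not the authors'):

import numpy as np
from sklearn.linear_model import Lasso

def mb_neighborhoods(X, alpha=0.1):
    # X: (samples, variables). Regress each variable on all others with the lasso
    # and mark nonzero coefficients as estimated conditional dependencies.
    n_vars = X.shape[1]
    adj = np.zeros((n_vars, n_vars), dtype=bool)
    for j in range(n_vars):
        others = np.delete(np.arange(n_vars), j)
        coef = Lasso(alpha=alpha).fit(X[:, others], X[:, j]).coef_
        adj[j, others] = coef != 0
    return adj & adj.T, adj | adj.T   # MB-and and MB-or symmetrizations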
35 The graphical lasso technique [6] solves the dual form of eq. [sent-87, score-0.333]
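For reference, this ℓ1-penalized maximum likelihood baseline is readily available off the shelf; a minimal usage sketch (assuming scikit-learn's GraphicalLasso; the data and the value of alpha are placeholders):

import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 9))        # placeholder data: 200 samples, 9 variables
model = GraphicalLasso(alpha=0.2).fit(X)
Omega_hat = model.precision_             # sparse estimate of the precision matrix
print(np.count_nonzero(np.abs(Omega_hat) > 1e-8))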
36 There is little work on spatial regularization for structure learning. [sent-93, score-0.241]
37 Adaptive banding on the Cholesky factors of the precision matrix has been proposed in [8]. [sent-94, score-0.284]
38 Instead of using the traditional lasso penalty, a nested lasso penalty is enforced. [sent-95, score-0.306]
39 Grouping of entries in the precision matrix into disjoint subsets has been proposed in [9]. [sent-98, score-0.395]
40 Although such a formulation allows for more general settings, its main disadvantage is the need for an a priori segmentation of the entries in the precision matrix. [sent-100, score-0.328]
41 In [10] it is assumed that variables belong to unknown classes, and probabilities of having edges among different classes are enforced to account for structure regularity, thus producing block structures only. [sent-102, score-0.212]
42 3 Sparse and Locally Constant Gaussian Graphical Models First, we describe our local constancy assumption and its use to model the spatial coherence of dependence/independence relationships. [sent-103, score-0.568]
43 Local constancy is defined as follows: if variable xn1 is dependent (or independent) of variable xn2, then a spatial neighbor xn1′ of xn1 is more likely to be dependent (or independent) of xn2. [sent-104, score-0.514]
44 This encourages finding connectivities between two close or distant clusters of variables, instead of between isolated variables. [sent-105, score-0.239]
45 Note that local constancy imposes restrictions of spatial closeness for each cluster independently, but not between clusters. [sent-106, score-0.702]
46 In this paper, we impose constraints on the differences of entries in the precision matrix Ω ∈ R^{N×N} for N variables, where these entries correspond to spatially neighboring variables. [sent-107, score-0.529]
47 Let Σ ∈ R^{N×N} be the dense sample covariance matrix and D ∈ R^{M×N} be the discrete derivative operator on the manifold, where M ∈ O(N) is the number of spatial neighborhood relationships. [sent-108, score-0.532]
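One plausible construction of such an operator (an illustrative sketch; the exact encoding used by the authors may differ) places a +1 and a -1 in each row, one row per pair of spatially neighboring variables:

import numpy as np

def derivative_operator(pairs, n_vars):
    # pairs: list of (a, b) indices of spatially neighboring variables; returns D in R^{M x N}.
    D = np.zeros((len(pairs), n_vars))
    for m, (a, b) in enumerate(pairs):
        D[m, a], D[m, b] = 1.0, -1.0
    return D

# Chain of 5 variables (P = 1, e.g. points along a silhouette): M = 4 neighbor relationships.
D = derivative_operator([(i, i + 1) for i in range(4)], 5)
print(D.shape)   # (4, 5)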
48 The following penalized maximum likelihood estimation is proposed: max_{Ω ≻ 0} log det Ω − ⟨Σ, Ω⟩ − ρ ‖Ω‖1 − τ ‖D ⊡ Ω‖1 (2) for some ρ, τ > 0. [sent-111, score-0.211]
49 The third term ρ ‖Ω‖1 encourages sparseness, while the fourth term τ ‖D ⊡ Ω‖1 encourages local constancy in the precision matrix by penalizing the differences of spatially neighboring variables. [sent-113, score-1.27]
50 As discussed further in [19], ℓ1-norm penalties lead to locally constant models which preserve sparseness, whereas ℓ2-norm penalties of differences fail to do so. [sent-115, score-0.242]
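A toy one-dimensional illustration of this contrast (added here; the numbers and the separable proximal step are only for intuition, not the paper's algorithm): soft-thresholding sets small differences exactly to zero, while quadratic shrinkage merely scales them.

import numpy as np

d = np.array([0.05, -0.02, 0.8, 0.01])      # hypothetical differences between neighboring entries
tau = 0.1

d_l1 = np.sign(d) * np.maximum(np.abs(d) - tau, 0.0)   # l1 prox: exact zeros -> local constancy
d_l2 = d / (1.0 + 2.0 * tau)                            # l2 prox: shrinks but never zeros

print(d_l1)   # [ 0.   0.   0.7  0. ]
print(d_l2)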
51 The use of the diagonal excluded product, instead of the regular product of matrices, for penalizing differences is crucial. [sent-116, score-0.212]
52 The regular product of matrices would penalize the difference between the diagonal and off-diagonal entries of the precision matrix, and potentially destroy positive definiteness of the solution for strongly regularized models. [sent-117, score-0.418]
53 (2) does not affect the positive definiteness properties of the estimated precision matrix or the optimization algorithm. In the following Section 4, we discuss positive definiteness properties and develop an optimization algorithm for the specific case of the discrete derivative operator D. [sent-119, score-0.471]
54 4 Coordinate-Direction Descent Algorithm Positive definiteness of the precision matrix is a necessary condition for the definition of a multivariate normal distribution. [sent-120, score-0.284]
55 Maximization can be performed with respect to one row and column of the precision matrix Ω at a time. [sent-124, score-0.284]
56 In terms of the variables y, z and the constant matrix W, the penalized maximum likelihood estimation problem in eq. [sent-128, score-0.338]
57 It can be shown that the precision matrix Ω is positive definite since its Schur complement z − y^T W^{-1} y is positive. [sent-130, score-0.327]
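A small sanity check of this Schur-complement argument (an added numerical sketch with made-up values): the bordered matrix [[W, y], [y^T, z]] is positive definite exactly when W is positive definite and z − y^T W^{-1} y > 0.

import numpy as np

W = np.array([[2.0, 0.3],
              [0.3, 1.5]])
y = np.array([0.4, -0.2])
z = 1.0

schur = z - y @ np.linalg.solve(W, y)
Omega = np.block([[W, y[:, None]], [y[None, :], np.array([[z]])]])

print(schur > 0)                                  # True
print(np.all(np.linalg.eigvalsh(Omega) > 0))      # True, agreeing with the Schur test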
58 Although coordinate descent methods [5, 6] are suitable when only sparseness is enforced, they are not when local constancy is encouraged. [sent-139, score-0.63]
59 For a discrete derivative operator D used in the penalized maximum likelihood estimation problem in eq. [sent-141, score-0.252]
60 , 0)T or two spatially neighboring variables g = (0, . [sent-148, score-0.208]
61 We sort and remove duplicate values in s, and propagate changes to r by adding the entries corresponding to the duplicate values in s. [sent-161, score-0.199]
62 Given a dense sample covariance matrix Σ, sparseness parameter ρ, local constancy parameter τ and a discrete derivative operator D, find the precision matrix Ω ≻ 0 that maximizes: log det Ω − ⟨Σ, Ω⟩ − ρ ‖Ω‖1 − τ ‖D ⊡ Ω‖1 [sent-172, score-1.216]
63 (3) (b) Update W^{-1} by using the Sherman-Morrison-Woodbury formula (note that when iterating from one variable to the next, only one row and column of matrix W change) (c) Transform the local constancy regularization term from D into A and b as described in eq. [sent-181, score-0.555]
64 (5) (d) Compute W^{-1} y and Ay (e) For each direction g involving either one variable or two spatially neighboring variables i. [sent-182, score-0.208]
65 Update Ay ← Ay + t Ag (f) Update z ← 1/(v + ρ) + y^T W^{-1} y. Table 2: Coordinate-direction descent algorithm for learning sparse and locally constant Gaussian graphical models. [sent-187, score-0.473]
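The update in step (b) avoids recomputing W^{-1} from scratch when one row and column of W change; as a reminder of the underlying mechanism, the basic rank-one identity is sketched below (an added illustration, not the paper's exact row-and-column update):

import numpy as np

def sherman_morrison(A_inv, u, v):
    # Returns (A + u v^T)^{-1} given A^{-1}, assuming 1 + v^T A^{-1} u != 0.
    Au = A_inv @ u
    vA = v @ A_inv
    return A_inv - np.outer(Au, vA) / (1.0 + v @ Au)

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4)) + 4.0 * np.eye(4)
u, v = rng.standard_normal(4), rng.standard_normal(4)
print(np.allclose(sherman_morrison(np.linalg.inv(A), u, v),
                  np.linalg.inv(A + np.outer(u, v))))    # True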
66 Positive interactions are shown in blue, negative interactions in red. [sent-193, score-0.248]
67 The polynomial dependency of O(N^3) on the number of variables is expected, since we cannot produce an algorithm faster than computing the inverse of the sample covariance in the case of an infinite sample. [sent-197, score-0.197]
68 Given a P -dimensional spatial neighborhood or manifold (e. [sent-199, score-0.301]
69 P = 1 for silhouettes, P = 2 for a four-pixel neighborhood on 2D images, P = 3 for a six-pixel neighborhood on 3D images), the objective function in eq. [sent-201, score-0.22]
70 Since this condition does not depend on specific entries in the iterative estimation of the precision matrix, this property can be used to reduce the size of the problem in advance by removing such variables. [sent-203, score-0.372]
71 We begin with a small synthetic example to test the ability of the method to recover the ground truth structure from data, in a complex scenario in which our method has to deal with both locally and not locally constant interactions as well as independent variables. [sent-205, score-0.779]
72 The ground truth Gaussian graphical model is shown in Figure 1 and it contains 9 variables arranged in an open contour manifold. [sent-206, score-0.468]
[Figure 2 axis labels: Full, MB-or, MB-and, CovSel, GLasso, SLCGGM; panels show Kullback-Leibler divergence, precision and recall] [sent-212, score-0.63]
Figure 2: Kullback-Leibler divergence with respect to the best method, average precision, recall and Frobenius norm between the recovered model and the ground truth. [sent-216, score-0.212]
75 Our method (SLCGGM) outperforms the fully connected model (Full), the Meinshausen-Bühlmann approximation (MB-or, MB-and), covariance selection (CovSel) and graphical lasso (GLasso) for small datasets (blue solid line) and for large datasets (red dashed line). [sent-217, score-0.882]
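The Kullback-Leibler divergence reported in Figure 2 has a closed form for zero-mean Gaussians parameterized by precision matrices; a small sketch (an added illustration, assuming the ground truth and the estimate share a zero mean):

import numpy as np

def gaussian_kl(Omega_true, Omega_est):
    # KL( N(0, Omega_true^{-1}) || N(0, Omega_est^{-1}) )
    n = Omega_true.shape[0]
    Sigma_true = np.linalg.inv(Omega_true)
    _, logdet_true = np.linalg.slogdet(Omega_true)
    _, logdet_est = np.linalg.slogdet(Omega_est)
    return 0.5 * (np.trace(Omega_est @ Sigma_true) - n + logdet_true - logdet_est)

Omega = np.array([[2.0, -0.5], [-0.5, 1.0]])
print(gaussian_kl(Omega, Omega))   # 0.0 when the recovered model matches the ground truth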
76 We compare against state-of-the-art structure learning techniques: covariance selection [5] and graphical lasso [6], since it has been shown theoretically and experimentally that they both converge to the maximum likelihood estimator. [sent-221, score-0.606]
77 Two different scenarios are tested: small datasets of four samples, and large datasets of 400 samples. [sent-224, score-0.22]
78 Under each scenario, 50 datasets are randomly generated from the ground truth Gaussian graphical model. [sent-225, score-0.504]
79 This is due to the fact that the ground truth data contains locally constant interactions, and our method imposes a prior for local constancy. [sent-227, score-0.524]
80 Although this is a complex scenario which also contains not locally constant interactions as well as an independent variable, our method can recover a more plausible model when compared to other methods. [sent-228, score-0.313]
81 A visual comparison of the ground truth versus the best recovered model by our method from small and large datasets is shown in Figure 1. [sent-230, score-0.418]
82 The image shows the precision matrix in which red squares represent negative entries, while blue squares represent positive entries. [sent-231, score-0.375]
83 There is very little difference between the ground truth and the recovered model from large datasets. [sent-232, score-0.308]
84 Although the model is not fully recovered from small datasets, our technique performs better than the Meinshausen-Bühlmann approximation, covariance selection and graphical lasso in Figure 2. [sent-233, score-0.742]
85 Each dataset is also diverse in the type of spatial neighborhood: one-dimensional for silhouettes in a walking sequence, two-dimensional for cardiac MRI and three-dimensional for functional brain MRI. [sent-237, score-0.753]
86 This is strong evidence that datasets that are measured over a spatial manifold are locally constant, as well as that our method is a good regularization technique that avoids over-fitting and allows for better generalization. [sent-243, score-0.536]
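Generalization here is judged by how well a learnt model explains held-out samples; a minimal sketch of the average Gaussian log-likelihood on test data (added for illustration; the mean is assumed to have been estimated on the training set):

import numpy as np

def avg_test_loglik(X_test, mu, Omega):
    # Average log-likelihood of held-out rows of X_test under N(mu, Omega^{-1}).
    n = Omega.shape[0]
    _, logdet = np.linalg.slogdet(Omega)
    Xc = X_test - mu
    quad = np.einsum('ij,jk,ik->i', Xc, Omega, Xc)   # per-sample x^T Omega x
    return np.mean(0.5 * (logdet - n * np.log(2.0 * np.pi) - quad))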
87 Another interesting fact is that for the brain MRI dataset, which is high dimensional and contains a small number of samples, the model that assumes full independence performed better than the Meinshausen-Bühlmann approximation, covariance selection and graphical lasso. [sent-244, score-0.562]
88 Our method (SLCGGM) outperforms the Meinshausen-Bühlmann approximation (MB-and, MB-or), covariance selection (CovSel), graphical lasso (GLasso) and the fully independent model (Indep). [sent-281, score-0.614]
89 6 Conclusions and Future Work In this paper, we proposed local constancy for Gaussian graphical models, which encourages finding probabilistic connectivities between two close or distant clusters of variables, instead of between isolated variables. [sent-283, score-0.849]
90 We introduced an ℓ1-norm penalty for local constancy into a strictly convex maximum likelihood estimation. [sent-284, score-0.605]
91 Furthermore, we proposed an efficient optimization algorithm and proved that our method guarantees positive definiteness of the estimated precision matrix. [sent-285, score-0.26]
92 We tested the ability of our method to recover the ground truth structure from data, in a complex scenario with locally and not locally constant interactions as well as independent variables. [sent-286, score-0.736]
93 We also tested the generalization performance of our method in a wide range of complex real-world datasets with a diverse nature of probabilistic relationships as well as neighborhood type. [sent-287, score-0.263]
94 Methods for selecting regularization parameters for sparseness and local constancy need to be further investigated. [sent-289, score-0.632]
95 Although the positive definiteness properties of the precision matrix as well as the optimization algorithm still hold when including operators such as the Laplacian for encouraging smoothness, benefits of such a regularization approach need to be analyzed. [sent-290, score-0.378]
96 Convex optimization techniques for fitting sparse Gaussian graphical models. [sent-322, score-0.235]
97 Model selection and estimation in the Gaussian graphical model. [sent-333, score-0.293]
98 Sparse estimation of large covariance matrices via a nested lasso penalty. [sent-339, score-0.286]
99 Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. [sent-370, score-0.238]
100 High dimensional graphical model selection using regularized logistic regression. [sent-397, score-0.249]
wordName wordTfidf (topN-words)
[('constancy', 0.383), ('precision', 0.217), ('silhouettes', 0.191), ('graphical', 0.173), ('mri', 0.165), ('cardiac', 0.154), ('sparseness', 0.144), ('locally', 0.143), ('spatial', 0.131), ('ground', 0.125), ('interactions', 0.124), ('walking', 0.123), ('hlmann', 0.123), ('covariance', 0.123), ('ay', 0.121), ('lasso', 0.119), ('entries', 0.111), ('neighborhood', 0.11), ('datasets', 0.11), ('rm', 0.108), ('niteness', 0.106), ('rn', 0.106), ('truth', 0.096), ('ov', 0.096), ('amn', 0.096), ('recovered', 0.087), ('encourages', 0.084), ('sl', 0.082), ('slcggm', 0.082), ('sso', 0.082), ('yt', 0.079), ('structures', 0.079), ('selection', 0.076), ('variables', 0.074), ('closeness', 0.074), ('glasso', 0.072), ('indep', 0.072), ('penalty', 0.068), ('spatially', 0.068), ('brain', 0.067), ('matrix', 0.067), ('covsel', 0.066), ('connectivities', 0.066), ('displacements', 0.066), ('neighboring', 0.066), ('learnt', 0.064), ('excluded', 0.062), ('sm', 0.062), ('sparse', 0.062), ('imposes', 0.06), ('det', 0.06), ('manifold', 0.06), ('operator', 0.06), ('structure', 0.059), ('likelihood', 0.056), ('ag', 0.056), ('addicted', 0.055), ('bmn', 0.055), ('brook', 0.055), ('cocaine', 0.055), ('entrywise', 0.055), ('stony', 0.055), ('local', 0.054), ('la', 0.054), ('differences', 0.053), ('fu', 0.053), ('penalized', 0.051), ('diag', 0.051), ('regularization', 0.051), ('gaussian', 0.05), ('vec', 0.05), ('penalizing', 0.05), ('descent', 0.049), ('distant', 0.048), ('monetary', 0.048), ('goldstein', 0.048), ('blue', 0.048), ('diagonal', 0.047), ('discover', 0.046), ('constant', 0.046), ('closed', 0.045), ('estimation', 0.044), ('functional', 0.044), ('falsely', 0.044), ('schur', 0.044), ('tg', 0.044), ('duplicate', 0.044), ('strictly', 0.044), ('synthetic', 0.043), ('diverse', 0.043), ('positive', 0.043), ('se', 0.042), ('mn', 0.042), ('heart', 0.041), ('recovers', 0.041), ('isolated', 0.041), ('cn', 0.041), ('derivative', 0.041), ('shrinking', 0.041), ('technique', 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz
Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets. 1
2 0.12710157 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: High-dimensional statistical inference deals with models in which the the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1
3 0.12393834 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
Author: Mladen Kolar, Le Song, Eric P. Xing
Abstract: To estimate the changing structure of a varying-coefficient varying-structure (VCVS) model remains an important and open problem in dynamic system modelling, which includes learning trajectories of stock prices, or uncovering the topology of an evolving gene network. In this paper, we investigate sparsistent learning of a sub-family of this model — piecewise constant VCVS models. We analyze two main issues in this problem: inferring time points where structural changes occur and estimating model structure (i.e., model selection) on each of the constant segments. We propose a two-stage adaptive procedure, which first identifies jump points of structural changes and then identifies relevant covariates to a response on each of the segments. We provide an asymptotic analysis of the procedure, showing that with the increasing sample size, number of structural changes, and number of variables, the true model can be consistently selected. We demonstrate the performance of the method on synthetic data and apply it to the brain computer interface dataset. We also consider how this applies to structure estimation of time-varying probabilistic graphical models. 1
4 0.11723342 164 nips-2009-No evidence for active sparsification in the visual cortex
Author: Pietro Berkes, Ben White, Jozsef Fiser
Abstract: The proposal that cortical activity in the visual cortex is optimized for sparse neural activity is one of the most established ideas in computational neuroscience. However, direct experimental evidence for optimal sparse coding remains inconclusive, mostly due to the lack of reference values on which to judge the measured sparseness. Here we analyze neural responses to natural movies in the primary visual cortex of ferrets at different stages of development and of rats while awake and under different levels of anesthesia. In contrast with prediction from a sparse coding model, our data shows that population and lifetime sparseness decrease with visual experience, and increase from the awake to anesthetized state. These results suggest that the representation in the primary visual cortex is not actively optimized to maximize sparseness. 1
5 0.11189042 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
Author: Matthias Seeger
Abstract: We show how to sequentially optimize magnetic resonance imaging measurement designs over stacks of neighbouring image slices, by performing convex variational inference on a large scale non-Gaussian linear dynamical system, tracking dominating directions of posterior covariance without imposing any factorization constraints. Our approach can be scaled up to high-resolution images by reductions to numerical mathematics primitives and parallelization on several levels. In a first study, designs are found that improve significantly on others chosen independently for each slice or drawn at random. 1
6 0.11054885 147 nips-2009-Matrix Completion from Noisy Entries
8 0.096438527 47 nips-2009-Boosting with Spatial Regularization
9 0.095418237 137 nips-2009-Learning transport operators for image manifolds
10 0.090980068 246 nips-2009-Time-Varying Dynamic Bayesian Networks
11 0.090076581 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
12 0.088335186 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
13 0.086095616 157 nips-2009-Multi-Label Prediction via Compressed Sensing
14 0.085267954 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
15 0.081465811 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
16 0.08050701 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models
17 0.080116652 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction
18 0.0797306 43 nips-2009-Bayesian estimation of orientation preference maps
19 0.078122593 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
20 0.077967256 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
topicId topicWeight
[(0, -0.276), (1, 0.025), (2, -0.017), (3, 0.082), (4, 0.001), (5, 0.04), (6, 0.148), (7, -0.134), (8, 0.034), (9, -0.023), (10, 0.004), (11, 0.014), (12, 0.038), (13, 0.042), (14, -0.06), (15, -0.015), (16, -0.038), (17, -0.057), (18, -0.076), (19, -0.04), (20, -0.061), (21, 0.053), (22, 0.025), (23, 0.005), (24, 0.085), (25, 0.043), (26, -0.001), (27, -0.007), (28, 0.044), (29, -0.037), (30, 0.043), (31, 0.02), (32, -0.066), (33, 0.068), (34, 0.003), (35, -0.066), (36, -0.073), (37, -0.06), (38, -0.063), (39, 0.049), (40, -0.024), (41, -0.025), (42, -0.026), (43, -0.026), (44, -0.032), (45, -0.01), (46, 0.005), (47, -0.053), (48, 0.015), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.96103162 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz
Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets. 1
2 0.63740206 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: High-dimensional statistical inference deals with models in which the the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1
3 0.63438249 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
Author: David P. Wipf, Srikantan S. Nagarajan
Abstract: Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient and globally-convergent, reweighted 1 -norm minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that, (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in some empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted 1 -norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function. 1
4 0.63188523 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem
Author: Han Liu, Xi Chen
Abstract: This paper studies the forward greedy strategy in sparse nonparametric regression. For additive models, we propose an algorithm called additive forward regression; for general multivariate models, we propose an algorithm called generalized forward regression. Both algorithms simultaneously conduct estimation and variable selection in nonparametric settings for the high dimensional sparse learning problem. Our main emphasis is empirical: on both simulated and real data, these two simple greedy methods can clearly outperform several state-ofthe-art competitors, including LASSO, a nonparametric version of LASSO called the sparse additive model (SpAM) and a recently proposed adaptive parametric forward-backward algorithm called Foba. We also provide some theoretical justifications of specific versions of the additive forward regression. 1
Author: John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma
Abstract: Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized “robust principal component analysis” problem of recovering a low rank matrix A from corrupted observations D = A + E. Here, the corrupted entries E are unknown and the errors can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns by solving a simple convex program, for which we give a fast and provably convergent algorithm. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction of its entries. Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision. 1
6 0.61131078 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
7 0.59772879 138 nips-2009-Learning with Compressible Priors
8 0.59297973 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
9 0.59261787 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
10 0.58864433 23 nips-2009-Accelerating Bayesian Structural Inference for Non-Decomposable Gaussian Graphical Models
11 0.58137935 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
12 0.57364172 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
13 0.5725286 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
14 0.56745881 185 nips-2009-Orthogonal Matching Pursuit From Noisy Random Measurements: A New Analysis
15 0.56004298 147 nips-2009-Matrix Completion from Noisy Entries
16 0.55833435 43 nips-2009-Bayesian estimation of orientation preference maps
17 0.55341148 164 nips-2009-No evidence for active sparsification in the visual cortex
18 0.5490765 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints
19 0.54264927 137 nips-2009-Learning transport operators for image manifolds
20 0.5378139 70 nips-2009-Discriminative Network Models of Schizophrenia
topicId topicWeight
[(24, 0.037), (25, 0.068), (35, 0.061), (36, 0.138), (39, 0.055), (42, 0.256), (58, 0.125), (61, 0.023), (71, 0.059), (81, 0.022), (86, 0.062), (91, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.82172817 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz
Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets. 1
2 0.80221158 175 nips-2009-Occlusive Components Analysis
Author: Jörg Lücke, Richard Turner, Maneesh Sahani, Marc Henniges
Abstract: We study unsupervised learning in a probabilistic generative model for occlusion. The model uses two types of latent variables: one indicates which objects are present in the image, and the other how they are ordered in depth. This depth order then determines how the positions and appearances of the objects present, specified in the model parameters, combine to form the image. We show that the object parameters can be learnt from an unlabelled set of images in which objects occlude one another. Exact maximum-likelihood learning is intractable. However, we show that tractable approximations to Expectation Maximization (EM) can be found if the training images each contain only a small number of objects on average. In numerical experiments it is shown that these approximations recover the correct set of object parameters. Experiments on a novel version of the bars test using colored bars, and experiments on more realistic data, show that the algorithm performs well in extracting the generating causes. Experiments based on the standard bars benchmark test for object learning show that the algorithm performs well in comparison to other recent component extraction approaches. The model and the learning algorithm thus connect research on occlusion with the research field of multiple-causes component extraction methods. 1
3 0.69206357 151 nips-2009-Measuring Invariances in Deep Networks
Author: Ian Goodfellow, Honglak Lee, Quoc V. Le, Andrew Saxe, Andrew Y. Ng
Abstract: For many pattern recognition tasks, the ideal input feature would be invariant to multiple confounding properties (such as illumination and viewing angle, in computer vision applications). Recently, deep architectures trained in an unsupervised manner have been proposed as an automatic method for extracting useful features. However, it is difficult to evaluate the learned features by any means other than using them in a classifier. In this paper, we propose a number of empirical tests that directly measure the degree to which these learned features are invariant to different input transformations. We find that stacked autoencoders learn modestly increasingly invariant features with depth when trained on natural images. We find that convolutional deep belief networks learn substantially more invariant features in each layer. These results further justify the use of “deep” vs. “shallower” representations, but suggest that mechanisms beyond merely stacking one autoencoder on top of another may be important for achieving invariance. Our evaluation metrics can also be used to evaluate future work in deep learning, and thus help the development of future algorithms. 1
4 0.65866411 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
Author: Novi Quadrianto, John Lim, Dale Schuurmans, Tibério S. Caetano
Abstract: We develop a convex relaxation of maximum a posteriori estimation of a mixture of regression models. Although our relaxation involves a semidefinite matrix variable, we reformulate the problem to eliminate the need for general semidefinite programming. In particular, we provide two reformulations that admit fast algorithms. The first is a max-min spectral reformulation exploiting quasi-Newton descent. The second is a min-min reformulation consisting of fast alternating steps of closed-form updates. We evaluate the methods against Expectation-Maximization in a real problem of motion segmentation from video data. 1
5 0.65821284 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
Author: Jaakko Luttinen, Alexander T. Ihler
Abstract: We present a probabilistic factor analysis model which can be used for studying spatio-temporal datasets. The spatial and temporal structure is modeled by using Gaussian process priors both for the loading matrix and the factors. The posterior distributions are approximated using the variational Bayesian framework. High computational cost of Gaussian process modeling is reduced by using sparse approximations. The model is used to compute the reconstructions of the global sea surface temperatures from a historical dataset. The results suggest that the proposed model can outperform the state-of-the-art reconstruction systems.
6 0.65448254 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
7 0.65360284 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
8 0.65216947 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction
9 0.65063602 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
10 0.6496101 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
11 0.64809924 97 nips-2009-Free energy score space
12 0.64784265 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
13 0.64718306 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations
14 0.64703155 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling
15 0.64700609 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
16 0.64696819 76 nips-2009-Efficient Learning using Forward-Backward Splitting
17 0.64540136 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models
18 0.64528292 100 nips-2009-Gaussian process regression with Student-t likelihood
19 0.64526808 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem
20 0.64468366 70 nips-2009-Discriminative Network Models of Schizophrenia