nips nips2012 nips2012-26 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles
Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems. 1
Reference: text
sentIndex sentText sentNum sentScore
1 A nonparametric variable clustering model Konstantina Palla∗ University of Cambridge kp376@cam. [sent-1, score-0.169]
2 Abstract Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. [sent-9, score-0.046]
3 a simple clustering, of observed variables into highly correlated subsets. [sent-12, score-0.091]
4 Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. [sent-14, score-0.178]
5 We evaluate our method on both synthetic and gene expression analysis problems. [sent-15, score-0.216]
6 For all these applications sparse factor analysis models can have advantages in terms of both predictive performance and interpretability (Fokoue, 2004; Fevotte and Godsill, 2006; Carvalho et al. [sent-18, score-0.171]
7 For example, data exploration might involve investigating which variables have significant loadings on a shared factor, which is aided if the model itself is sparse. [sent-20, score-0.123]
8 However, even using sparse models, interpreting the results of a factor analysis can be non-trivial, since a variable will typically have significant loadings on multiple factors. [sent-21, score-0.162]
9 As a result of these problems, researchers will often simply cluster variables using a traditional agglomerative hierarchical clustering algorithm (Vigneau and Qannari, 2003; Duda et al. [sent-22, score-0.304]
10 Interest in variable clustering exists in many applied fields, e. [sent-24, score-0.132]
11 However, it is most commonly applied to gene expression analysis (Eisen et al. [sent-28, score-0.218]
12 Note that variable clustering represents the opposite regime to the usual clustering setting where we partition samples rather than dimensions (but of course a clustering algorithm can be made to work like this simply by transposing the data matrix). [sent-32, score-0.43]
13 Typical clustering algorithms, and their probabilistic mixture model analogues, consider how similar entities are (e.g. [sent-33, score-0.159]
14 in terms of Euclidean distance) rather than how correlated they are, which would be closer in spirit to the ability of factor analysis to model covariance structure. [sent-35, score-0.176]
15 While using correlation distance (one minus the Pearson correlation coefficient) between variables has been proposed for clustering genes with heuristic methods, the corresponding probabilistic model appears not to have been explored to the best of our knowledge. [sent-36, score-0.331]
16 ∗ These authors contributed equally to this work. To address the general problem of variable clustering, we develop a simple Bayesian nonparametric model which partitions observed variables into sets of highly correlated variables. [sent-37, score-0.26]
17 DPVC exhibits the usual advantages over heuristic methods of being both probabilistic and non-parametric: we can naturally handle missing data, learn the appropriate number of clusters from data, and avoid overfitting. [sent-39, score-0.069]
18 In Section 3 we note relationships to existing nonparametric sparse factor analysis models, Dirichlet process mixture models, structure learning with hidden variables, and the closely related “CrossCat” model (Shafto et al. [sent-42, score-0.177]
19 In Section 4 we describe efficient MCMC and variational Bayes algorithms for performing posterior inference in DPVC, and point out computational savings resulting from the simple nature of the model. [sent-44, score-0.049]
20 In Section 5 we present results on synthetic data where we test the method’s ability to recover a “true” partitioning, and then focus on clustering genes based on gene expression data, where we assess predictive performance on held out data. [sent-45, score-0.489]
21 The D observed dimensions correspond to measured variables for each sample, and our goal is to cluster these variables. [sent-50, score-0.176]
22 The CRP defines a distribution over partitionings (clustering) where the maximum possible number of clusters does not need to be specified a priori. [sent-55, score-0.103]
23 The CRP can be described using a sequential generative process: D customers enter a Chinese restaurant one at a time. [sent-56, score-0.079]
24 The first customer sits at some table; each subsequent customer sits at table k, currently occupied by mk customers, with probability proportional to mk, or at a new table with probability proportional to α, where α is a parameter of the CRP. [sent-57, score-0.13]
25 The seating arrangement of the customers at tables corresponds to a partitioning of the D customers. [sent-58, score-0.079]
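The sequential seating scheme described above can be sketched in a few lines of Python (an illustrative sketch, not code from the paper; the function name and the α value in the example are ours):

```python
import numpy as np

def sample_crp_partition(D, alpha, rng=None):
    """Sample cluster assignments for D variables from a CRP with concentration alpha."""
    rng = np.random.default_rng(rng)
    assignments = np.zeros(D, dtype=int)   # assignments[d] = table index of customer d
    counts = []                            # counts[k] = number of customers at table k
    for d in range(D):
        # Probability proportional to current table sizes, plus alpha for a new table.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):               # customer opens a new table
            counts.append(1)
        else:
            counts[k] += 1
        assignments[d] = k
    return assignments

# Example: partition D = 6 variables with concentration alpha = 1.0
print(sample_crp_partition(6, alpha=1.0, rng=0))
```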
26 c = (c1 , . . . , cD ) ∼ CRP(α), cd ∈ N (1), where cd = k denotes that variable d belongs to cluster k. [sent-62, score-0.444]
27 For each cluster k we have a single latent factor xkn ∼ N(0, σx²) (2), which models correlations between the variables in cluster k. [sent-64, score-0.397]
28 Given these latent factors, real-valued observed data can be modeled as ydn = gd xcd n + εdn (3), where gd is a factor loading for dimension d, and εdn ∼ N(0, σd²) is Gaussian noise. [sent-65, score-0.6]
29 We place a Gaussian prior N(0, σg²) on every element gd independently. [sent-66, score-0.163]
30 It is straightforward to generalise the model by substituting other noise models for Equation 3, for example using a logistic link for binary data ydn ∈ {0, 1}. [sent-67, score-0.078]
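A hedged sketch of the full generative process in Equations 1–3, reusing the hypothetical CRP sampler above (the variance settings are placeholders, not the paper's choices):

```python
import numpy as np

def sample_dpvc_data(D, N, alpha=1.0, sigma_x=1.0, sigma_g=1.0, sigma_noise=0.5, rng=None):
    """Draw a D x N data matrix Y from the DPVC generative model (Equations 1-3).

    The sigma_* arguments are standard deviations (the paper parameterises by variances).
    """
    rng = np.random.default_rng(rng)
    c = sample_crp_partition(D, alpha, rng)       # Equation 1: c ~ CRP(alpha)
    K = c.max() + 1
    X = rng.normal(0.0, sigma_x, size=(K, N))     # Equation 2: x_kn ~ N(0, sigma_x^2)
    g = rng.normal(0.0, sigma_g, size=D)          # loadings g_d ~ N(0, sigma_g^2)
    eps = rng.normal(0.0, sigma_noise, size=(D, N))
    Y = g[:, None] * X[c, :] + eps                # Equation 3: y_dn = g_d x_{c_d n} + eps_dn
    return Y, c, X, g
```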
31 Gray nodes represent the D = 6 observed variables yd and white nodes represent the K = 3 latent variables xk . [sent-73, score-0.184]
32 Where they used the Indian buffet process to allow dimensions to have non-zero loadings on multiple factors, we use the Chinese restaurant process to explicitly enforce that a dimension can be explained by only one factor. [sent-75, score-0.158]
33 Replacing our Chinese restaurant process prior on Z with an Indian buffet prior recovers an infinite factor analysis model. [sent-79, score-0.128]
34 Equation 4 has the form of a factor analysis model. [sent-80, score-0.083]
35 It is straightforward to show that the conditional covariance of y given the factor loading matrix W := G · Z is σx² WWᵀ + σ²I. [sent-81, score-0.168]
36 Analogously, for DPVC we find cov(ydn , yd′n | G, c) = σx² gd gd′ + σd² δdd′ if cd = cd′ , and 0 otherwise (5). Thus we see that the covariance structure implied by DPVC is block-diagonal: only dimensions belonging to the same cluster have non-zero covariance. [sent-82, score-0.911]
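As a quick illustrative check of Equation 5 (again a sketch, using the hypothetical samplers above), the empirical covariance of simulated data should be approximately block-diagonal once dimensions are reordered by cluster:

```python
import numpy as np

Y, c, X, g = sample_dpvc_data(D=12, N=20000, rng=0)
order = np.argsort(c)                 # group dimensions belonging to the same cluster
emp_cov = np.cov(Y[order, :])         # D x D empirical covariance
# Within-block entries approach sigma_x^2 * g_d * g_d' (plus sigma_noise^2 on the diagonal);
# entries between dimensions in different clusters approach zero.
print(np.round(emp_cov, 2))
```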
37 The obvious probabilistic approach to clustering genes would be to simply apply a Dirichlet process mixture (DPM) of Gaussians, but considering the genes (our dimensions) as samples, and our samples as “features” so that the partitioning would be over the genes. [sent-83, score-0.37]
38 However, this approach would not achieve the desired result of clustering correlated variables, and would rather cluster together variables close in terms of Euclidean distance. [sent-84, score-0.321]
39 For example, two variables which have the relationship yd′ = a·yd for a = −1 (or a = 2) are perfectly correlated but not close in Euclidean space; a DPM approach would likely fail to cluster these together. [sent-85, score-0.25]
40 Also, practitioners typically choose either restrictive diagonal Gaussians or full covariance Gaussians, the latter resulting in considerably greater computational cost than our method (see Section 4. [sent-86, score-0.076]
41 DPVC can also be seen as performing a simple form of structure learning, where the observed variables are partitioned into groups, each explained by a single latent variable. [sent-88, score-0.079]
42 CrossCat also uses a CRP to partition variables into clusters, but then uses a second level of independent CRPs to model the dependence of variables within a cluster. [sent-94, score-0.088]
43 In other words, whereas the latent variables x in Figure 1 are discrete variables (indicating cluster assignment) in CrossCat, in DPVC they are continuous variables corresponding to the latent factors. [sent-95, score-0.3]
44 For certain data the CrossCat model may be more appropriate but our simple factor analysis model is more computationally tractable and often has good predictive performance as well. [sent-96, score-0.141]
45 (2012) is related to CrossCat in the same way that NSFA is related to DPVC, by allowing an observed dimension to belong to multiple features using the IBP rather than only one cluster using the CRP. [sent-98, score-0.098]
46 4 Inference: We demonstrate both MCMC and variational inference for the model. [sent-99, score-0.049]
47 The Gibbs update equations for the factor loadings g, factors X, and the variances σd² and σg² are standard, and are therefore only sketched out below, with the details deferred to supplementary material. [sent-102, score-0.193]
48 We sample the cluster assignments c using Algorithm 8 of Neal (2000), with g integrated out but instantiating X. [sent-104, score-0.166]
49 We require P(cd = k | yd: , xk: , σg , c−d ) ∝ P(cd | c−d ) ∫ P(yd: | xk: , gd ) p(gd | σg ) dgd , the calculation of which is given in the supplementary material, along with expressions for µ∗ , λg , µX:n and ΛX:n . [sent-106, score-0.163]
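A sketch of how this collapsed conditional can be evaluated (our own illustration, not the paper's supplementary derivation): with gd integrated out, yd: given xk: is zero-mean Gaussian with covariance σg² xk: xk:ᵀ + σd² I, whose log-density has a cheap rank-one form. New-cluster proposals, which Neal's Algorithm 8 handles with auxiliary factors drawn from the prior, are omitted here.

```python
import numpy as np

def log_marginal_row(y_d, x_k, sigma_g2, sigma_d2):
    """log N(y_d | 0, sigma_g2 * x_k x_k^T + sigma_d2 * I), i.e. with g_d integrated out."""
    N = y_d.shape[0]
    xx = x_k @ x_k
    # Matrix determinant lemma and Sherman-Morrison for the rank-one covariance.
    log_det = N * np.log(sigma_d2) + np.log1p(sigma_g2 * xx / sigma_d2)
    quad = (y_d @ y_d) / sigma_d2 \
         - (sigma_g2 / sigma_d2**2) * (x_k @ y_d)**2 / (1.0 + sigma_g2 * xx / sigma_d2)
    return -0.5 * (N * np.log(2.0 * np.pi) + log_det + quad)

def resample_assignment(d, Y, X, c, sigma_g2, sigma_d2, rng):
    """Gibbs-resample c[d] over the existing clusters only (new-cluster moves omitted).

    Assumes cluster labels directly index rows of the factor matrix X.
    """
    others = np.delete(c, d)
    clusters = np.unique(others)
    log_p = np.array([np.log((others == k).sum())                     # CRP term, prop. to m_k
                      + log_marginal_row(Y[d], X[k], sigma_g2, sigma_d2)
                      for k in clusters])
    p = np.exp(log_p - log_p.max())
    c[d] = rng.choice(clusters, p=p / p.sum())
```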
50 Both steps here are straightforward: sampling from the prior followed by sampling from the likelihood model. [sent-110, score-0.056]
51 We look at various characteristics of the samples, including the number of clusters and the mean of X. [sent-116, score-0.069]
52 The distribution of the number of features under the successive-conditional sampler matches that under the marginal-conditional sampler almost perfectly. [sent-117, score-0.086]
53 Under the correct successive-conditional sampler the average number of clusters is 1. [sent-118, score-0.112]
54 w ∼ Dirichlet(α/T, . . . , α/T ) (9), cd ∼ Discrete(w) (10), where we have truncated to allow a maximum of T clusters. [sent-128, score-0.173]
55 Where not otherwise specified we choose T = D so that every dimension could use its own cluster if this is supported by the data. [sent-129, score-0.098]
56 We use a variational posterior of the form q(v) = qw(w) qσg²(σg²) ∏_{d=1}^{D} [ qcd(cd) qσd²(σd²) qgd|cd(gd|cd) ∏_{n=1}^{N} qxnd(xnd) ] (11), where qw is a Dirichlet distribution, each qcd is a discrete distribution on {1, . [sent-131, score-0.26]
57 , T }, qσg² and qσd² are Inverse Gamma distributions, and qxnd and qgd|cd are univariate Gaussian distributions. [sent-133, score-0.059]
58 We found that using the structured approximation qgd|cd(gd|cd), where the variational distribution on gd is conditional on the cluster assignment cd, gave considerably improved performance. [sent-134, score-0.572]
59 We experimented with initialising either the variational distribution over the factors qxnd (xnd ) with mean N (0, 0. [sent-139, score-0.153]
60 1) and variance 1, or each cluster-assignment distribution qcd(cd) to a sample from a uniform Dirichlet. [sent-140, score-0.19]
61 We found initialising the cluster assignments gave considerably better solutions on average. [sent-141, score-0.202]
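A sketch of the two initialisation strategies compared here (illustrative only; the D, N and T values are placeholders, and whether 0.1 denotes a variance or a standard deviation is not clear from the extracted text, so it is treated as a variance below):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, T = 20, 100, 20                      # truncation T = D, as in the default setting above

# Option 1: initialise the factor posteriors q(x_nd) with small random means and unit variance.
qx_mean = rng.normal(0.0, np.sqrt(0.1), size=(T, N))
qx_var = np.ones((T, N))

# Option 2 (reported to work better): initialise each assignment distribution q(c_d)
# to an independent draw from a uniform Dirichlet over the T truncated clusters.
qc = rng.dirichlet(np.ones(T), size=D)     # D x T matrix of assignment probabilities, rows sum to one
```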
62 For both models sampling the factor loadings matrix is O(DKN ), where K is the number of active features/clusters. [sent-147, score-0.19]
63 However, for DPVC sampling the factors X is considerably cheaper. [sent-148, score-0.089]
64 Note that mixture models with full covariance clusters would typically cost O(DKN³) in this setting, due to the need to perform Cholesky decompositions on N × N matrices. [sent-152, score-0.142]
65 5 Results: We present results on synthetic data and two gene expression data sets. [sent-153, score-0.216]
66 We also compare to our implementation of Bayesian factor analysis (see for example Kaufman and Press (1973) or Rowe and Press (1998)) and the non-parametric sparse factor analysis (NSFA) model of Knowles and Ghahramani (2011). [sent-155, score-0.166]
67 We experimented with three publicly available implementations of DPMs of Gaussians using full covariance matrices, but found that none of them were sufficiently numerically robust to cope with the high-dimensional and sometimes ill-conditioned gene expression data analysed in Section 5. [sent-156, score-0.234]
68 Table 1 column headers: Dataset, DPVC, NSFA, DPM, FA (K = 5), FA (K = 10), FA (K = 20); first row: Breast cancer (cell values truncated in the extraction). [sent-169, score-0.055]
69 Table 1: Predictive performance (mean log predictive likelihood over the test elements) on two gene expression datasets. [sent-193, score-0.246]
70 5.1 Synthetic data: In order to test the ability of the models to recover a true underlying partitioning of the variables into correlated groups, we use synthetic data. [sent-195, score-0.164]
71 We generate synthetic data with D = 20 dimensions partitioned into K = 5 equally sized clusters (of four variables). [sent-196, score-0.131]
72 Within each cluster we sample analogously to our model: sample xkn ∼ N (0, 1) for all k, n, then gd ∼ N (0, 1) for all d, and finally sample ydn ∼ N (gd xcd n , 0. [sent-197, score-0.417]
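This synthetic setup can be reproduced as follows (a sketch; N and the noise variance, which is truncated in the extracted sentence above, are assumed values):

```python
import numpy as np

rng = np.random.default_rng(0)
D, N, K = 20, 100, 5                              # N is an assumed value
c_true = np.repeat(np.arange(K), D // K)          # 5 equally sized clusters of 4 variables
X = rng.normal(0.0, 1.0, size=(K, N))             # x_kn ~ N(0, 1)
g = rng.normal(0.0, 1.0, size=D)                  # g_d ~ N(0, 1)
noise_var = 0.1                                   # assumed; truncated in the source text
Y = rng.normal(g[:, None] * X[c_true, :], np.sqrt(noise_var))   # y_dn ~ N(g_d x_{c_d n}, noise_var)
```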
73 We compare k-means (with the true number of clusters 5) using Euclidean distance and correlation distance, and DPVC with inference using MCMC or variational Bayes. [sent-200, score-0.154]
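A sketch of the baseline comparison (our illustration: correlation-distance k-means is approximated by z-scoring each variable before Euclidean k-means, and the adjusted Rand index is used as one standard agreement measure; the extracted text does not state the paper's exact choices). It continues from the synthetic-data sketch above:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def kmeans_labels(data, k, seed=0):
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(data)

# Euclidean k-means directly on the D x N matrix Y (rows = variables).
labels_euclidean = kmeans_labels(Y, 5)

# Correlation distance: z-score each variable so that squared Euclidean distance is
# proportional to one minus the Pearson correlation between variables.
Z = (Y - Y.mean(axis=1, keepdims=True)) / Y.std(axis=1, keepdims=True)
labels_correlation = kmeans_labels(Z, 5)

print("ARI, Euclidean k-means:  ", adjusted_rand_score(c_true, labels_euclidean))
print("ARI, correlation k-means:", adjusted_rand_score(c_true, labels_correlation))
```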
74 DPVC VB’s performance is somewhat disappointing, suggesting that even the structured variational posterior we use is a poor approximation of the true posterior. [sent-205, score-0.049]
75 5.2 Breast cancer dataset: We assess these algorithms in terms of predictive performance on the breast cancer dataset of West et al. [sent-209, score-0.236]
76 The predictive log likelihood was calculated using every 10th sample from the final 500 samples. [sent-212, score-0.058]
77 We ran 10 repeats, holding out a different random 10% of the elements of the matrix as test data each time. [sent-213, score-0.068]
78 However, DPVC does outperform both the DPM and the finite (non-sparse) factor analysis models. [sent-217, score-0.083]
79 Middle: Agglomerative hierarchical clustering using average linkage and correlation distance. [sent-220, score-0.168]
80 significantly below that of the MCMC method, with a predictive log likelihood of −1. [sent-222, score-0.058]
81 The genes have been reordered in each plot according to three different clusterings coming from k-means, hierarchical clustering and DPVC (MCMC; note that we show the clustering corresponding to the posterior sample with the highest joint probability). [sent-226, score-0.347]
82 For both k-means and hierarchical clustering it was necessary to “tweak” the number of clusters to give a sensible result. [sent-227, score-0.201]
83 Hierarchical clustering in particular appeared to have a strong bias towards putting the majority of the genes in one large cluster/clade. [sent-228, score-0.215]
84 Note that such a visualisation is straightforward only because we have used a clustering-based method rather than a factor analysis model, emphasising how partitionings can be more useful summaries of data for certain tasks than low-dimensional embeddings. [sent-229, score-0.249]
85 Again we ran 10 repeats, holding out a different random 10% of the elements of the matrix as test data each time. [sent-235, score-0.068]
86 The results shown in Table 1 are broadly consistent with our findings for the breast cancer dataset: DPVC sits between NSFA and the less performant DPM and FA models. [sent-236, score-0.103]
87 6 Discussion: We have introduced DPVC, a model for clustering variables into highly correlated subsets. [sent-241, score-0.223]
88 While, as expected, we found the predictive performance of DPVC is somewhat worse than that of state-of-the-art nonparametric sparse factor analysis models (e.g. [sent-242, score-0.178]
89 NSFA), DPVC outperforms both nonparametric mixture models and Bayesian factor analysis models when applied to high-dimensional data such as gene expression microarrays. [sent-244, score-0.335]
90 Regression coefficients would then correspond to the predictive ability of the clusters of variables. [sent-247, score-0.127]
91 Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. [sent-257, score-0.32]
92 Highdimensional sparse factor modeling: Applications in gene expression genomics. [sent-284, score-0.271]
93 Stochastic determination of the intrinsic structure in Bayesian factor analysis. [sent-322, score-0.083]
94 Genomic expression programs in the response of yeast cells to environmental changes. [sent-333, score-0.123]
95 Infinite sparse factor analysis and infinite independent components analysis. [sent-354, score-0.083]
96 Nonparametric Bayesian sparse factor models with application to gene expression modeling. [sent-361, score-0.271]
97 Markov chain sampling methods for Dirichlet process mixture models. [sent-383, score-0.055]
98 A nonparametric bayesian model for multiple clustering with overlapping feature views. [sent-394, score-0.204]
99 Gibbs sampling and hill climbing in Bayesian factor analysis. [sent-414, score-0.111]
100 High-dimensional sparse factor modelling: Applications in gene expression genomics. [sent-462, score-0.271]
wordName wordTfidf (topN-words)
[('dpvc', 0.742), ('nsfa', 0.215), ('cd', 0.173), ('gd', 0.163), ('clustering', 0.132), ('gene', 0.122), ('crosscat', 0.117), ('dkn', 0.098), ('cluster', 0.098), ('crp', 0.095), ('dpm', 0.092), ('knowles', 0.085), ('factor', 0.083), ('genes', 0.083), ('vb', 0.081), ('loadings', 0.079), ('mcmc', 0.078), ('ydn', 0.078), ('dirichlet', 0.075), ('clusters', 0.069), ('expression', 0.066), ('yd', 0.061), ('basak', 0.059), ('botstein', 0.059), ('qgd', 0.059), ('shafto', 0.059), ('predictive', 0.058), ('yeast', 0.057), ('cancer', 0.055), ('bishop', 0.052), ('eisen', 0.052), ('qcd', 0.052), ('variational', 0.049), ('sits', 0.048), ('fa', 0.048), ('minka', 0.048), ('correlated', 0.047), ('covariance', 0.046), ('ghahramani', 0.045), ('restaurant', 0.045), ('partitioning', 0.045), ('carvalho', 0.045), ('variables', 0.044), ('sampler', 0.043), ('chinese', 0.041), ('west', 0.041), ('assignments', 0.04), ('neal', 0.04), ('loading', 0.039), ('repeats', 0.039), ('fokoue', 0.039), ('gasch', 0.039), ('gute', 0.039), ('haeseleer', 0.039), ('lonergan', 0.039), ('qannari', 0.039), ('qxnd', 0.039), ('rowe', 0.039), ('sanche', 0.039), ('spellman', 0.039), ('vigneau', 0.039), ('xcd', 0.039), ('xkn', 0.039), ('breast', 0.038), ('nonparametric', 0.037), ('correlation', 0.036), ('latent', 0.035), ('bayesian', 0.035), ('duda', 0.034), ('fevotte', 0.034), ('geweke', 0.034), ('grunwald', 0.034), ('hotelling', 0.034), ('initialising', 0.034), ('partitionings', 0.034), ('qw', 0.034), ('rai', 0.034), ('xnd', 0.034), ('customers', 0.034), ('pearson', 0.034), ('dimensions', 0.034), ('winn', 0.032), ('nevins', 0.032), ('ig', 0.032), ('factors', 0.031), ('considerably', 0.03), ('chemical', 0.03), ('godsill', 0.03), ('kaufman', 0.03), ('lucas', 0.03), ('et', 0.03), ('ran', 0.029), ('instantiating', 0.028), ('niu', 0.028), ('pitman', 0.028), ('young', 0.028), ('gaussians', 0.028), ('sampling', 0.028), ('synthetic', 0.028), ('mixture', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 26 nips-2012-A nonparametric variable clustering model
Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles
Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems. 1
2 0.11889536 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Author: Ke Jiang, Brian Kulis, Michael I. Jordan
Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1
3 0.10561857 126 nips-2012-FastEx: Hash Clustering with Exponential Families
Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy
Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1
4 0.10547903 69 nips-2012-Clustering Sparse Graphs
Author: Yudong Chen, Sujay Sanghavi, Huan Xu
Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1
5 0.10303317 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set
Author: Nan Li, Longin J. Latecki
Abstract: We formulate clustering aggregation as a special instance of Maximum-Weight Independent Set (MWIS) problem. For a given dataset, an attributed graph is constructed from the union of the input clusterings generated by different underlying clustering algorithms with different parameters. The vertices, which represent the distinct clusters, are weighted by an internal index measuring both cohesion and separation. The edges connect the vertices whose corresponding clusters overlap. Intuitively, an optimal aggregated clustering can be obtained by selecting an optimal subset of non-overlapping clusters partitioning the dataset together. We formalize this intuition as the MWIS problem on the attributed graph, i.e., finding the heaviest subset of mutually non-adjacent vertices. This MWIS problem exhibits a special structure. Since the clusters of each input clustering form a partition of the dataset, the vertices corresponding to each clustering form a maximal independent set (MIS) in the attributed graph. We propose a variant of simulated annealing method that takes advantage of this special structure. Our algorithm starts from each MIS, which is close to a distinct local optimum of the MWIS problem, and utilizes a local search heuristic to explore its neighborhood in order to find the MWIS. Extensive experiments on many challenging datasets show that: 1. our approach to clustering aggregation automatically decides the optimal number of clusters; 2. it does not require any parameter tuning for the underlying clustering algorithms; 3. it can combine the advantages of different underlying clustering algorithms to achieve superior performance; 4. it is robust against moderate or even bad input clusterings. 1
6 0.078747958 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
7 0.078272626 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
8 0.073585011 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation
9 0.071529783 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk
10 0.071523093 327 nips-2012-Structured Learning of Gaussian Graphical Models
11 0.069765262 131 nips-2012-Feature Clustering for Accelerating Parallel Coordinate Descent
12 0.069719978 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
13 0.067068681 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters
14 0.064126246 315 nips-2012-Slice sampling normalized kernel-weighted completely random measure mixture models
15 0.063199364 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process
16 0.061521214 307 nips-2012-Semi-Crowdsourced Clustering: Generalizing Crowd Labeling by Robust Distance Metric Learning
17 0.060548503 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
18 0.059038218 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking
19 0.058473762 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction
20 0.05673198 138 nips-2012-Fully Bayesian inference for neural models with negative-binomial spiking
topicId topicWeight
[(0, 0.152), (1, 0.066), (2, 0.0), (3, -0.022), (4, -0.146), (5, -0.048), (6, -0.013), (7, -0.051), (8, 0.048), (9, -0.019), (10, 0.049), (11, -0.096), (12, -0.035), (13, -0.036), (14, 0.074), (15, -0.051), (16, -0.052), (17, 0.019), (18, -0.053), (19, -0.042), (20, -0.063), (21, -0.013), (22, -0.006), (23, 0.071), (24, -0.008), (25, -0.06), (26, 0.03), (27, -0.031), (28, -0.008), (29, -0.013), (30, -0.06), (31, -0.038), (32, 0.003), (33, 0.014), (34, -0.005), (35, 0.0), (36, -0.031), (37, -0.006), (38, -0.012), (39, -0.068), (40, 0.032), (41, 0.014), (42, 0.05), (43, -0.02), (44, -0.036), (45, -0.013), (46, 0.008), (47, 0.018), (48, 0.025), (49, 0.056)]
simIndex simValue paperId paperTitle
same-paper 1 0.9371919 26 nips-2012-A nonparametric variable clustering model
Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles
Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems. 1
2 0.7829057 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
Author: Ke Jiang, Brian Kulis, Michael I. Jordan
Abstract: Sampling and variational inference techniques are two standard methods for inference in probabilistic models, but for many problems, neither approach scales effectively to large-scale data. An alternative is to relax the probabilistic model into a non-probabilistic formulation which has a scalable associated algorithm. This can often be fulfilled by performing small-variance asymptotics, i.e., letting the variance of particular distributions in the model go to zero. For instance, in the context of clustering, such an approach yields connections between the kmeans and EM algorithms. In this paper, we explore small-variance asymptotics for exponential family Dirichlet process (DP) and hierarchical Dirichlet process (HDP) mixture models. Utilizing connections between exponential family distributions and Bregman divergences, we derive novel clustering algorithms from the asymptotic limit of the DP and HDP mixtures that features the scalability of existing hard clustering methods as well as the flexibility of Bayesian nonparametric models. We focus on special cases of our analysis for discrete-data problems, including topic modeling, and we demonstrate the utility of our results by applying variants of our algorithms to problems arising in vision and document analysis. 1
3 0.74487472 126 nips-2012-FastEx: Hash Clustering with Exponential Families
Author: Amr Ahmed, Sujith Ravi, Alex J. Smola, Shravan M. Narayanamurthy
Abstract: Clustering is a key component in any data analysis toolbox. Despite its importance, scalable algorithms often eschew rich statistical models in favor of simpler descriptions such as k-means clustering. In this paper we present a sampler, capable of estimating mixtures of exponential families. At its heart lies a novel proposal distribution using random projections to achieve high throughput in generating proposals, which is crucial for clustering models with large numbers of clusters. 1
4 0.7338627 99 nips-2012-Dip-means: an incremental clustering method for estimating the number of clusters
Author: Argyris Kalogeratos, Aristidis Likas
Abstract: Learning the number of clusters is a key problem in data clustering. We present dip-means, a novel robust incremental method to learn the number of data clusters that can be used as a wrapper around any iterative clustering algorithm of k-means family. In contrast to many popular methods which make assumptions about the underlying cluster distributions, dip-means only assumes a fundamental cluster property: each cluster to admit a unimodal distribution. The proposed algorithm considers each cluster member as an individual ‘viewer’ and applies a univariate statistic hypothesis test for unimodality (dip-test) on the distribution of distances between the viewer and the cluster members. Important advantages are: i) the unimodality test is applied on univariate distance vectors, ii) it can be directly applied with kernel-based methods, since only the pairwise distances are involved in the computations. Experimental results on artificial and real datasets indicate the effectiveness of our method and its superiority over analogous approaches.
5 0.6798197 69 nips-2012-Clustering Sparse Graphs
Author: Yudong Chen, Sujay Sanghavi, Huan Xu
Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1
6 0.67250657 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk
7 0.64923686 251 nips-2012-On Lifting the Gibbs Sampling Algorithm
8 0.6485067 299 nips-2012-Scalable imputation of genetic data with a discrete fragmentation-coagulation process
9 0.64773846 68 nips-2012-Clustering Aggregation as Maximum-Weight Independent Set
10 0.63546777 294 nips-2012-Repulsive Mixtures
11 0.60789442 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation
12 0.56469905 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
13 0.56012124 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
14 0.53129578 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery
15 0.53107911 52 nips-2012-Bayesian Nonparametric Modeling of Suicide Attempts
16 0.52578926 349 nips-2012-Training sparse natural image models with a fast Gibbs sampler of an extended state space
17 0.52198768 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
18 0.51988679 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
19 0.51791215 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
20 0.51231319 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering
topicId topicWeight
[(0, 0.079), (9, 0.01), (17, 0.015), (21, 0.026), (38, 0.084), (39, 0.025), (42, 0.024), (54, 0.024), (55, 0.014), (74, 0.047), (76, 0.149), (80, 0.088), (92, 0.046), (95, 0.285)]
simIndex simValue paperId paperTitle
1 0.74131143 365 nips-2012-Why MCA? Nonlinear sparse coding with spike-and-slab prior for neurally plausible image encoding
Author: Philip Sterne, Joerg Bornschein, Abdul-saboor Sheikh, Joerg Luecke, Jacquelyn A. Shelton
Abstract: Modelling natural images with sparse coding (SC) has faced two main challenges: flexibly representing varying pixel intensities and realistically representing lowlevel image components. This paper proposes a novel multiple-cause generative model of low-level image statistics that generalizes the standard SC model in two crucial points: (1) it uses a spike-and-slab prior distribution for a more realistic representation of component absence/intensity, and (2) the model uses the highly nonlinear combination rule of maximal causes analysis (MCA) instead of a linear combination. The major challenge is parameter optimization because a model with either (1) or (2) results in strongly multimodal posteriors. We show for the first time that a model combining both improvements can be trained efficiently while retaining the rich structure of the posteriors. We design an exact piecewise Gibbs sampling method and combine this with a variational method based on preselection of latent dimensions. This combined training scheme tackles both analytical and computational intractability and enables application of the model to a large number of observed and hidden dimensions. Applying the model to image patches we study the optimal encoding of images by simple cells in V1 and compare the model’s predictions with in vivo neural recordings. In contrast to standard SC, we find that the optimal prior favors asymmetric and bimodal activity of simple cells. Testing our model for consistency we find that the average posterior is approximately equal to the prior. Furthermore, we find that the model predicts a high percentage of globular receptive fields alongside Gabor-like fields. Similarly high percentages are observed in vivo. Our results thus argue in favor of improvements of the standard sparse coding model for simple cells by using flexible priors and nonlinear combinations. 1
same-paper 2 0.73840827 26 nips-2012-A nonparametric variable clustering model
Author: Konstantina Palla, Zoubin Ghahramani, David A. Knowles
Abstract: Factor analysis models effectively summarise the covariance structure of high dimensional data, but the solutions are typically hard to interpret. This motivates attempting to find a disjoint partition, i.e. a simple clustering, of observed variables into highly correlated subsets. We introduce a Bayesian non-parametric approach to this problem, and demonstrate advantages over heuristic methods proposed to date. Our Dirichlet process variable clustering (DPVC) model can discover block-diagonal covariance structures in data. We evaluate our method on both synthetic and gene expression analysis problems. 1
3 0.71120083 5 nips-2012-A Conditional Multinomial Mixture Model for Superset Label Learning
Author: Liping Liu, Thomas G. Dietterich
Abstract: In the superset label learning problem (SLL), each training instance provides a set of candidate labels of which one is the true label of the instance. As in ordinary regression, the candidate label set is a noisy version of the true label. In this work, we solve the problem by maximizing the likelihood of the candidate label sets of training instances. We propose a probabilistic model, the Logistic StickBreaking Conditional Multinomial Model (LSB-CMM), to do the job. The LSBCMM is derived from the logistic stick-breaking process. It first maps data points to mixture components and then assigns to each mixture component a label drawn from a component-specific multinomial distribution. The mixture components can capture underlying structure in the data, which is very useful when the model is weakly supervised. This advantage comes at little cost, since the model introduces few additional parameters. Experimental tests on several real-world problems with superset labels show results that are competitive or superior to the state of the art. The discovered underlying structures also provide improved explanations of the classification predictions. 1
4 0.68981951 260 nips-2012-Online Sum-Product Computation Over Trees
Author: Mark Herbster, Stephen Pasteris, Fabio Vitale
Abstract: We consider the problem of performing efficient sum-product computations in an online setting over a tree. A natural application of our methods is to compute the marginal distribution at a vertex in a tree-structured Markov random field. Belief propagation can be used to solve this problem, but requires time linear in the size of the tree, and is therefore too slow in an online setting where we are continuously receiving new data and computing individual marginals. With our method we aim to update the data and compute marginals in time that is no more than logarithmic in the size of the tree, and is often significantly less. We accomplish this via a hierarchical covering structure that caches previous local sum-product computations. Our contribution is three-fold: we i) give a linear time algorithm to find an optimal hierarchical cover of a tree; ii) give a sum-productlike algorithm to efficiently compute marginals with respect to this cover; and iii) apply “i” and “ii” to find an efficient algorithm with a regret bound for the online allocation problem in a multi-task setting. 1
5 0.59678566 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes
Author: Michael Bryant, Erik B. Sudderth
Abstract: Variational methods provide a computationally scalable alternative to Monte Carlo methods for large-scale, Bayesian nonparametric learning. In practice, however, conventional batch and online variational methods quickly become trapped in local optima. In this paper, we consider a nonparametric topic model based on the hierarchical Dirichlet process (HDP), and develop a novel online variational inference algorithm based on split-merge topic updates. We derive a simpler and faster variational approximation of the HDP, and show that by intelligently splitting and merging components of the variational posterior, we can achieve substantially better predictions of test data than conventional online and batch variational algorithms. For streaming analysis of large datasets where batch analysis is infeasible, we show that our split-merge updates better capture the nonparametric properties of the underlying model, allowing continual learning of new topics.
6 0.59172493 192 nips-2012-Learning the Dependency Structure of Latent Factors
7 0.58939576 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models
8 0.58824629 47 nips-2012-Augment-and-Conquer Negative Binomial Processes
9 0.58752692 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System
10 0.58724928 233 nips-2012-Multiresolution Gaussian Processes
11 0.58706939 197 nips-2012-Learning with Recursive Perceptual Representations
12 0.58694798 355 nips-2012-Truncation-free Online Variational Inference for Bayesian Nonparametric Models
13 0.58641827 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model
14 0.58518678 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models
16 0.58487684 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs
17 0.58417726 294 nips-2012-Repulsive Mixtures
18 0.58374476 188 nips-2012-Learning from Distributions via Support Measure Machines
19 0.58346659 58 nips-2012-Bayesian models for Large-scale Hierarchical Classification
20 0.58328444 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models