nips nips2002 nips2002-150 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David A. Ross, Richard S. Zemel
Abstract: We propose a model that can learn parts-based representations of high-dimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. Inference and learning are carried out efficiently via variational algorithms. We present applications of this model to problems in image decomposition, collaborative filtering, and text classification.
Reference: text
sentIndex sentText sentNum sentScore
1 We propose a model that can learn parts-based representations of high-dimensional data. [sent-5, score-0.046]
2 Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. [sent-6, score-0.061]
3 The selected states of each factor represent the multiple causes of the input. [sent-8, score-0.108]
4 Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. [sent-9, score-0.17]
5 Inference and learning are carried out efficiently via variational algorithms. [sent-10, score-0.056]
6 We present applications of this model to problems in image decomposition, collaborative filtering, and text classification. [sent-11, score-0.18]
7 1 Introduction Many collections of data exhibit a common underlying structure: they consist of a number of parts or factors, each of which has a small number of discrete states. [sent-12, score-0.057]
8 For example, in a collection of facial images, every image contains eyes, a nose, and a mouth (except under occlusion), each of which has a range of different appearances. [sent-13, score-0.061]
9 A specific image can be described as a composite sketch: a selection of the appearance of each part, depending on the individual depicted. [sent-14, score-0.061]
10 In this paper, we describe a stochastic generative model for data of this type. [sent-15, score-0.051]
11 This model is well-suited to decomposing images into parts (it can be thought of as a Mr. [sent-16, score-0.166]
12 Potato Head model), but also applies to domains such as text and collaborative filtering in which the parts correspond to latent features, each having several alternative instantiations. [sent-17, score-0.223]
13 This representational scheme is powerful due to its combinatorial nature: while a standard clustering/VQ method containing N states can represent at most N items, if we divide the N states into j-state VQs, we can represent j^(N/j) items. [sent-18, score-0.087]
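A quick numeric illustration of that capacity claim (the numbers here are hypothetical, not taken from the paper):

```python
# With N = 24 codebook states in total, a single VQ can represent at most 24 items,
# whereas splitting the same budget into N/j VQs of j = 6 states each gives j**(N/j)
# distinct combinations.
N, j = 24, 6
single_vq_items = N             # 24
factored_items = j ** (N // j)  # 6**4 = 1296
print(single_vq_items, factored_items)
```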
14 To generate an observed data example of D dimensions, x ∈ ℝ^D, we stochastically select one state for each VQ, and one VQ for each dimension. [sent-21, score-0.048]
15 Given these selections, a single state from a single VQ determines the value of each data dimension x d . [sent-22, score-0.057]
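Read literally, the generative process described in the two sentences above could look like the following sketch; the variable names (a for the state priors, b for the dimension-to-VQ priors) and the Gaussian observation noise are assumptions for illustration, not the authors' code.

```python
import numpy as np

def sample_mcvq(a, b, mu, sigma=0.1, rng=None):
    """One draw from an MCVQ-style generative process (sketch).

    a     : (K, J) probability of selecting state j for VQ k
    b     : (D, K) probability of assigning dimension d to VQ k
    mu    : (D, K, J) codebook mean for dimension d under state j of VQ k
    sigma : observation noise scale (assumed; the excerpt only says each data
            dimension ends up as a mixture of Gaussians)
    """
    rng = rng or np.random.default_rng()
    D, K, J = mu.shape
    s = np.array([rng.choice(J, p=a[k]) for k in range(K)])   # one state per VQ
    r = np.array([rng.choice(K, p=b[d]) for d in range(D)])   # one VQ per dimension
    x = np.array([rng.normal(mu[d, r[d], s[r[d]]], sigma) for d in range(D)])
    return x, s, r
```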
16 Figure 1: Graphical model representation of MCVQ. [sent-23, score-0.465]
17 The plates depict repetitions across the appropriate dimensions for each of the three variables: the K VQs, the J states (codebook vectors) per VQ, and the D input dimensions. [sent-26, score-0.233]
18 The selections are represented as binary latent variables, S = {s_kj}, R = {r_dk}, for d = 1, …, D, k = 1, …, K, and j = 1, …, J. [sent-27, score-0.143]
19 The variable skj = 1 if and only if state j has been selected from VQ k. [sent-37, score-0.16]
20 Similarly rdk = 1 when VQ k has been selected for data dimension d. [sent-38, score-0.2]
21 The selections of states for the different data dimensions are joined along the J dimension and occur independently along the K dimension. [sent-49, score-0.232]
22 We apply a variational EM algorithm to learn the parameters θ, and infer hidden variables given observations. [sent-51, score-0.085]
23 The negative of the free energy −F is a lower bound on the log likelihood of generating the observations. [sent-53, score-0.05]
24 The variational EM algorithm improves this bound by iteratively improving −F with respect to Q (E-step) and to θ (M-step). [sent-54, score-0.056]
25 Let C be the set of training cases, and Qc be the approximation to the posterior distribution over latent variables given the training case (observation) c ∈ C. [sent-55, score-0.159]
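In the generic variational EM form (the paper's exact free-energy expression is not reproduced in this excerpt), the bound being improved is

```latex
-\mathcal{F}(Q,\theta)
  \;=\; \sum_{c \in C} \Big( \mathbb{E}_{Q^c}\!\left[\log P(\mathbf{x}^c, S, R \mid \theta)\right]
        \;+\; \mathcal{H}\!\left[Q^c\right] \Big)
  \;\le\; \sum_{c \in C} \log P(\mathbf{x}^c \mid \theta),
```

so raising −F with respect to Q (E-step) or θ (M-step) can only tighten the bound on the log likelihood.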
26 We further constrain this variational approach, forcing the {g_dk} to be consistent across all observations x^c. [sent-56, score-0.139]
27 Hence these parameters, which relate to the gating variables that govern the selection of a factor for a given observation dimension, do not depend on the observation. [sent-57, score-0.079]
28 This approach encourages the model to learn representations that conform to this constraint. [sent-58, score-0.046]
29 That is, if there are several posterior distributions consistent with an observed data vector, it favours distributions over {rd } that are consistent with those of other observed data vectors. [sent-59, score-0.063]
30 Variational EM updates for this model are identical to those above, except that the 1/C terms in the updates for g_dk disappear. [sent-61, score-0.284]
31 This can be thought of as gradually moving from a generative model in which the rdk ’s can vary across examples, to one in which they are the same for each example. [sent-63, score-0.251]
32 The inferred values of the variational parameters specify a posterior distribution over the VQ states, which in turn implies a mixture of Gaussians for each input dimension. [sent-64, score-0.105]
33 Below we use the mean of this mixture, x̂_d^c = Σ_{k,j} m_{kj}^c g_{dk} µ_{dkj}, to measure the model's reconstruction error on case c. [sent-65, score-0.658]
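A minimal sketch of that reconstruction, assuming m_c holds the variational state responsibilities m^c_{kj}, g the gating posteriors g_dk, and mu the codebook means µ_dkj (the array shapes are assumptions):

```python
import numpy as np

def reconstruct(m_c, g, mu):
    """x_hat[d] = sum over k,j of m_c[k,j] * g[d,k] * mu[d,k,j].

    m_c : (K, J) responsibilities over states for one training case c
    g   : (D, K) gating posteriors over VQs for each data dimension
    mu  : (D, K, J) codebook means
    """
    return np.einsum('kj,dk,dkj->d', m_c, g, mu)

def reconstruction_error(x_c, m_c, g, mu):
    # One possible error measure; the excerpt does not specify the exact metric used.
    return float(np.mean((x_c - reconstruct(m_c, g, mu)) ** 2))
```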
34 4 Related models MCVQ falls into the expanding class of unsupervised algorithms known as factorial methods, in which the aim of the learning algorithm is to discover multiple independent causes, or factors, that can well characterize the observed data. [sent-66, score-0.047]
35 MCVQ entails another round of competition, amongst the VQ selections, rather than the linear combination used by CVQ and NMF, which leads to a division of input dimensions into separate causes. [sent-69, score-0.183]
36 MCVQ also resembles a wide range of generative models developed to address image segmentation [5, 6, 7]. [sent-71, score-0.09]
37 These are generally complex, hierarchical models designed to focus on a different aspect of this problem than that of MCVQ: to dynamically decide which pixels belong to which objects. [sent-72, score-0.086]
38 part locations, assuming that these are consistent across images, and instead focuses on the assembling of input dimensions into parts, and the variety of instantiations of each part. [sent-77, score-0.117]
39 Finally, MCVQ also closely relates to sparse matrix decomposition techniques, such as the aspect model [8], a latent variable model which associates an unobserved class variable, the aspect z, with each observation. [sent-79, score-0.315]
40 The latent Dirichlet allocation model [9] can be seen as a proper generative version of the aspect model: each document/input vector is not represented as a set of labels for a particular vector in the training set, and there is a natural way to examine the probability of some unseen vector. [sent-81, score-0.226]
41 MCVQ shares the ability of these models to associate multiple aspects with a given document, yet it achieves this by sampling from multiple aspects in parallel, rather than repeated sampling of an aspect within a document. [sent-82, score-0.136]
42 It also imposes the additional selection of an aspect for each input dimension, which leads to a soft decomposition of these dimensions based on their choice of aspect. [sent-83, score-0.205]
43 Below we present some initial experiments examining whether MCVQ can match the successful application of the aspect model to information retrieval and collaborative filtering problems, after evaluating it on image data. [sent-84, score-0.258]
44 5.1 Parts-based Image Decomposition: Shapes and Faces The first dataset used to test our model consisted of 11 × 11 gray-scale images, as pictured in Fig. 2a. [sent-86, score-0.125]
45 Each image in the set contains three shapes: a box, a triangle, and a cross. [sent-88, score-0.061]
46 A model containing 3 VQs, 5 states each, was trained on a set of 100 shape images. [sent-90, score-0.144]
47 The training set was selected so that none of the examples depict cases in which all three shapes are located near the top of the image. [sent-94, score-0.108]
48 Despite this handicap, MCVQ is able to learn the full range of shape positions, and can accurately reconstruct such an image (Fig. [sent-95, score-0.09]
49 3b) produce holistic representations of the data, in which each basis vector tries to account for variation observed across the entire image. [sent-99, score-0.073]
50 Unlike MCVQ, NMF does not group related parts, and its generative model does not limit the combination of parts to only produce valid images. [sent-102, score-0.128]
51 As an empirical comparison, we tested the reconstruction error of each of the aforementioned methods on an independent test set of 629 images. [sent-103, score-0.126]
52 … 9 × 10^5 bits, and pixel … Figure 2: a) A sample of 24 training images from the Shapes dataset. [sent-108, score-0.174]
53 b) A typical representation learned by MCVQ with 3 VQs and 5 states per VQ. [sent-109, score-0.088]
54 c) Reconstruction of a test image: original (left) and reconstruction (right). [sent-110, score-0.126]
55 a) b) c) d) Original VQ PCA NMF Figure 3: Other methods trained on shape images: a) VQ, b) PCA, and c) NMF. [sent-111, score-0.055]
56 d) Reconstruction of a test image by the three methods (cf. [sent-112, score-0.093]
57 As a more interesting visual application, we trained our model on a database of face images (www. [sent-127, score-0.112]
58 The dataset consists of 19 × 19 gray-scale images, each containing a single frontal or near-frontal face. [sent-131, score-0.038]
59 A model of 6 VQs with 12 states each was trained on 2000 images, requiring 15 iterations of EM to converge. [sent-132, score-0.115]
60 As with shape images, the model learned a parts-based representation of the faces. [sent-133, score-0.051]
61 The reconstruction of two test images, along with the specific parts used to generate each, is illustrated in Fig. 4. [sent-134, score-0.183]
62 We again compared the reconstruction error of MCVQ with VQ, PCA, and NMF. [sent-139, score-0.094]
63 The training and testing sets contained 1800 and 629 images respectively. [sent-140, score-0.084]
64 We define description length to be the number of bits required to represent the model, plus the number of bits to encode all the test examples using the model. [sent-145, score-0.165]
65 This metric balances the large model cost and small encoding cost of VQ/MCVQ with the small model cost and large encoding cost of PCA/NMF. [sent-146, score-0.044]
66 Figure 4: The reconstruction of two test images from the Faces dataset. [sent-147, score-0.19]
67 Beside each reconstruction are the parts—the most active state in each of six VQs—used to generate it. [sent-148, score-0.121]
68 Each part j ∈ k is represented by its gated prediction (gdk ∗ mkj ) for each image pixel i. [sent-149, score-0.187]
69 5.2 Collaborative Filtering The application of MCVQ to image data assumes that the images are normalized, i.e., that a given input element always corresponds to the same underlying feature. [sent-156, score-0.069]
70 Normalization can be difficult to achieve in some image contexts; however, in many other types of applications, the input representation is more stable. [sent-159, score-0.089]
71 For example, many information retrieval applications employ bag-of-words representations, in which a given word always occupies the same input element. [sent-160, score-0.073]
72 We test MCVQ on a collaborative filtering task, utilizing the EachMovie dataset, where the input vectors are ratings by users of movies, and a given element always corresponds to the same movie. [sent-161, score-0.339]
73 The original dataset contains ratings, on a scale from 1 to 6, of a set of 1649 movies, by 74,424 users. [sent-162, score-0.038]
74 In order to reduce the sparseness of the dataset, since many users rated only a few movies, we only included users who rated at least 75 movies and movies rated by at least 126 users, leaving a total of 1003 movies and 5831 users. [sent-163, score-0.863]
75 The remaining dataset was still very sparse: the most active user rated 928 movies, and the most frequently rated movie was rated by 5,401 users. [sent-164, score-0.282]
76 We split the data randomly into 4831 users for a training set, and 1000 users in a test set. [sent-165, score-0.258]
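A sketch of that preprocessing, assuming the raw data is a (users × movies) array with NaN marking unrated entries; the thresholds and counts come from the text, but the single-pass filtering order (movies first, then users) is a simplification of whatever procedure the authors actually used:

```python
import numpy as np

def filter_and_split(ratings, min_per_user=75, min_per_movie=126, n_train=4831, seed=0):
    """Keep densely rated movies and active users, then split users into train/test."""
    rated = ~np.isnan(ratings)
    keep_movies = rated.sum(axis=0) >= min_per_movie                 # movies rated often enough
    keep_users = rated[:, keep_movies].sum(axis=1) >= min_per_user   # sufficiently active users
    filtered = ratings[np.ix_(keep_users, keep_movies)]              # roughly 5831 users x 1003 movies
    order = np.random.default_rng(seed).permutation(filtered.shape[0])
    return filtered[order[:n_train]], filtered[order[n_train:]]      # train users, test users
```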
77 We ran MCVQ with 8 VQs and 6 states per VQ on this dataset. [sent-166, score-0.088]
78 In the graphical model (Fig. 1), all the observation dimensions are leaves, so an input variable whose value is not specified in a particular observation vector will not play a role in inference or learning. [sent-170, score-0.156]
79 We compare the performance of MCVQ on this dataset to the aspect model. [sent-172, score-0.124]
80 We implemented a version of the aspect model, with 50 aspects and truncated Gaussians for ratings, and used “tempered EM” (with smoothing) to fit the parameters [10]. [sent-173, score-0.111]
81 For both models, we train the model on the 4831 users in the training set, and then, for each test user, we let the model observe some fixed number of ratings and hold out the rest. [sent-174, score-0.31]
82 We evaluate the models by measuring the absolute difference between their predictions for a held-out rating and the user’s true rating, averaged over all held-out ratings for all test users (Fig. 6). [sent-175, score-0.301]
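That protocol reads as mean absolute error over held-out ratings; a minimal sketch, where predict() stands in for either model's rating predictor (an assumed interface):

```python
import numpy as np

def held_out_mae(test_users, n_observed, predict):
    """Observe the first n_observed ratings of each test user, predict the rest,
    and average |prediction - true rating| over all held-out ratings."""
    errors = []
    for user in test_users:                    # user: list of (movie_id, rating) pairs
        observed, held_out = user[:n_observed], user[n_observed:]
        for movie_id, true_rating in held_out:
            errors.append(abs(predict(observed, movie_id) - true_rating))
    return float(np.mean(errors))
```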
83 Figure 5: The MCVQ representation of two test users in the EachMovie dataset. [sent-233, score-0.135]
84 The 3 most conspicuously high-rated (bold) and low-rated movies by the most active states of 4 of the 8 VQs are shown, where conspicuousness is the deviation from the mean rating for a given movie. [sent-234, score-0.26]
85 Each state’s predictions, µdkj , can be compared to the test user’s true ratings (in parentheses); the model’s prediction is a convex combination of state predictions. [sent-235, score-0.19]
86 Note the intuitive decomposition of movies into separate VQs, and that different states within a VQ may predict very different rating patterns for the same movies. [sent-236, score-0.29]
87 Figure 6: The average absolute deviation of predicted and true values of held-out ratings is compared for MCVQ and the aspect model. [sent-239, score-0.197]
88 Note that the number of users per x-bin decreases with increasing x, as a user must rate at least x+1 movies to be included. [sent-240, score-0.3]
89 (Figure 6 x-axis: number of observed ratings, 200–600.) [sent-246, score-0.132]
90 5.3 Text Classification MCVQ can also be used for information retrieval from text documents, by employing the bag-of-words representation. [sent-247, score-0.056]
91 A model of 8 VQs, 8 states each, was trained on the data, converging after 15 iterations of EM. [sent-258, score-0.115]
92 When trained on text data, the values of {gdk } provide a segmentation of the vocabulary into subsets of words with correlated frequencies. [sent-261, score-0.099]
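One way to read that segmentation off the learned parameters, assuming g is a (D, K) array of gating posteriors over a vocabulary of D words (a sketch):

```python
import numpy as np

def vocabulary_segments(g, vocab):
    """Assign each word to the VQ with the largest gating posterior g[d, k]."""
    assignment = np.argmax(g, axis=1)
    return {k: [word for word, a in zip(vocab, assignment) if a == k]
            for k in range(g.shape[1])}
```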
93 6 Conclusion We have presented a novel method for learning factored representations of data, which can be learned efficiently and employed across a wide variety of problem domains. [sent-263, score-0.052]
94 (Figure: the most conspicuous words of the VQ states selected for sample NIPS documents, e.g. "The Relevance Vector Machine".) [sent-269, score-0.069]
95 For each document we show the states selected for it from 4 VQs. [sent-271, score-0.088]
96 The bold (plain) words for each state are those most conspicuous by their above (below) average predicted frequency. [sent-272, score-0.048]
97 MCVQ combines aspects of factorial methods, which use multiple causes to generate input, with competitive aspects of clustering methods. [sent-273, score-0.045]
98 In addition, it gains combinatorial power by splitting the input into subsets, and can readily handle sparse, high-dimensional data. [sent-274, score-0.048]
99 One direction of further research involves extending the applications described above, including applying MCVQ to other dimensions of the NIPS corpus such as authors to find groupings of authors based on word-use frequency. [sent-275, score-0.089]
100 Learning the parts of objects by non-negative matrix factorization. [sent-309, score-0.057]
wordName wordTfidf (topN-words)
[('mcvq', 0.579), ('vq', 0.311), ('vqs', 0.299), ('dkj', 0.28), ('gdk', 0.262), ('kj', 0.156), ('rdk', 0.149), ('movies', 0.138), ('nmf', 0.131), ('skj', 0.112), ('ratings', 0.111), ('mkj', 0.104), ('users', 0.103), ('reconstruction', 0.094), ('mc', 0.091), ('aspect', 0.086), ('rated', 0.081), ('selections', 0.074), ('latent', 0.069), ('states', 0.067), ('collaborative', 0.065), ('images', 0.064), ('dimensions', 0.061), ('image', 0.061), ('parts', 0.057), ('variational', 0.056), ('xc', 0.055), ('rating', 0.055), ('xd', 0.051), ('rd', 0.047), ('movie', 0.044), ('documents', 0.042), ('em', 0.04), ('shapes', 0.039), ('bits', 0.039), ('user', 0.038), ('dataset', 0.038), ('cvq', 0.037), ('eachmovie', 0.037), ('florette', 0.037), ('robocop', 0.037), ('cooperative', 0.037), ('pca', 0.037), ('sk', 0.033), ('pictured', 0.033), ('prioritized', 0.033), ('text', 0.032), ('test', 0.032), ('quantization', 0.031), ('dimension', 0.03), ('log', 0.03), ('jean', 0.03), ('decomposition', 0.03), ('ltering', 0.03), ('description', 0.029), ('factors', 0.029), ('toronto', 0.029), ('generative', 0.029), ('shape', 0.029), ('variables', 0.029), ('across', 0.028), ('input', 0.028), ('depict', 0.028), ('gating', 0.028), ('pomdps', 0.028), ('corpus', 0.028), ('state', 0.027), ('factorial', 0.026), ('eyes', 0.026), ('trained', 0.026), ('aspects', 0.025), ('dk', 0.025), ('eq', 0.025), ('representations', 0.024), ('retrieval', 0.024), ('editors', 0.023), ('thought', 0.023), ('leen', 0.023), ('inference', 0.023), ('tesauro', 0.023), ('model', 0.022), ('pixel', 0.022), ('observation', 0.022), ('word', 0.021), ('bold', 0.021), ('mdp', 0.021), ('vocabulary', 0.021), ('exp', 0.021), ('posterior', 0.021), ('selected', 0.021), ('per', 0.021), ('observed', 0.021), ('head', 0.021), ('faces', 0.021), ('combination', 0.02), ('training', 0.02), ('combinatorial', 0.02), ('causes', 0.02), ('free', 0.02), ('correlated', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 150 nips-2002-Multiple Cause Vector Quantization
Author: David A. Ross, Richard S. Zemel
Abstract: We propose a model that can learn parts-based representations of highdimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. Inference and learning are carried out efficiently via variational algorithms. We present applications of this model to problems in image decomposition, collaborative filtering, and text classification.
2 0.076933339 74 nips-2002-Dynamic Structure Super-Resolution
Author: Amos J. Storkey
Abstract: The problem of super-resolution involves generating feasible higher resolution images, which are pleasing to the eye and realistic, from a given low resolution image. This might be attempted by using simple filters for smoothing out the high resolution blocks or through applications where substantial prior information is used to imply the textures and shapes which will occur in the images. In this paper we describe an approach which lies between the two extremes. It is a generic unsupervised method which is usable in all domains, but goes beyond simple smoothing methods in what it achieves. We use a dynamic tree-like architecture to model the high resolution data. Approximate conditioning on the low resolution image is achieved through a mean field approach. 1
3 0.068074502 10 nips-2002-A Model for Learning Variance Components of Natural Images
Author: Yan Karklin, Michael S. Lewicki
Abstract: We present a hierarchical Bayesian model for learning efficient codes of higher-order structure in natural images. The model, a non-linear generalization of independent component analysis, replaces the standard assumption of independence for the joint distribution of coefficients with a distribution that is adapted to the variance structure of the coefficients of an efficient image basis. This offers a novel description of higherorder image structure and provides a way to learn coarse-coded, sparsedistributed representations of abstract image properties such as object location, scale, and texture.
4 0.063964315 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
5 0.061909124 114 nips-2002-Information Regularization with Partially Labeled Data
Author: Martin Szummer, Tommi S. Jaakkola
Abstract: Classification with partially labeled data requires using a large number of unlabeled examples (or an estimated marginal P (x)), to further constrain the conditional P (y|x) beyond a few available labeled examples. We formulate a regularization approach to linking the marginal and the conditional in a general way. The regularization penalty measures the information that is implied about the labels over covering regions. No parametric assumptions are required and the approach remains tractable even for continuous marginal densities P (x). We develop algorithms for solving the regularization problem for finite covers, establish a limiting differential equation, and exemplify the behavior of the new regularization approach in simple cases.
6 0.06094268 115 nips-2002-Informed Projections
7 0.059468832 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
8 0.059020698 87 nips-2002-Fast Transformation-Invariant Factor Analysis
9 0.057906583 39 nips-2002-Bayesian Image Super-Resolution
10 0.05727604 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
11 0.055666428 163 nips-2002-Prediction and Semantic Association
12 0.053540345 24 nips-2002-Adaptive Scaling for Feature Selection in SVMs
13 0.053070258 112 nips-2002-Inferring a Semantic Representation of Text via Cross-Language Correlation Analysis
14 0.051951267 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
15 0.05193799 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
16 0.050759252 86 nips-2002-Fast Sparse Gaussian Process Methods: The Informative Vector Machine
17 0.048739623 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
18 0.048067953 73 nips-2002-Dynamic Bayesian Networks with Deterministic Latent Tables
19 0.047980208 116 nips-2002-Interpreting Neural Response Variability as Monte Carlo Sampling of the Posterior
20 0.047422346 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
topicId topicWeight
[(0, -0.151), (1, -0.017), (2, -0.025), (3, 0.104), (4, -0.058), (5, 0.055), (6, -0.001), (7, -0.042), (8, 0.016), (9, -0.031), (10, -0.047), (11, -0.066), (12, 0.049), (13, -0.024), (14, 0.021), (15, -0.025), (16, -0.021), (17, 0.017), (18, 0.041), (19, 0.056), (20, 0.007), (21, -0.022), (22, -0.043), (23, 0.008), (24, -0.05), (25, 0.015), (26, 0.002), (27, -0.084), (28, -0.077), (29, 0.023), (30, -0.075), (31, 0.026), (32, 0.002), (33, 0.033), (34, 0.012), (35, 0.013), (36, -0.019), (37, 0.031), (38, 0.026), (39, -0.039), (40, 0.07), (41, 0.014), (42, 0.096), (43, -0.025), (44, -0.062), (45, -0.08), (46, 0.031), (47, 0.187), (48, 0.072), (49, -0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.88217652 150 nips-2002-Multiple Cause Vector Quantization
Author: David A. Ross, Richard S. Zemel
Abstract: We propose a model that can learn parts-based representations of highdimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. Inference and learning are carried out efficiently via variational algorithms. We present applications of this model to problems in image decomposition, collaborative filtering, and text classification.
2 0.50963676 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
Author: Dmitry Y. Pavlov, David M. Pennock
Abstract: We develop a maximum entropy (maxent) approach to generating recommendations in the context of a user’s current navigation stream, suitable for environments where data is sparse, high-dimensional, and dynamic— conditions typical of many recommendation applications. We address sparsity and dimensionality reduction by first clustering items based on user access patterns so as to attempt to minimize the apriori probability that recommendations will cross cluster boundaries and then recommending only within clusters. We address the inherent dynamic nature of the problem by explicitly modeling the data as a time series; we show how this representational expressivity fits naturally into a maxent framework. We conduct experiments on data from ResearchIndex, a popular online repository of over 470,000 computer science documents. We show that our maxent formulation outperforms several competing algorithms in offline tests simulating the recommendation of documents to ResearchIndex users.
3 0.50406581 204 nips-2002-VIBES: A Variational Inference Engine for Bayesian Networks
Author: Christopher M. Bishop, David Spiegelhalter, John Winn
Abstract: In recent years variational methods have become a popular tool for approximate inference and learning in a wide variety of probabilistic models. For each new application, however, it is currently necessary first to derive the variational update equations, and then to implement them in application-specific code. Each of these steps is both time consuming and error prone. In this paper we describe a general purpose inference engine called VIBES (‘Variational Inference for Bayesian Networks’) which allows a wide variety of probabilistic models to be implemented and solved variationally without recourse to coding. New models are specified either through a simple script or via a graphical interface analogous to a drawing package. VIBES then automatically generates and solves the variational equations. We illustrate the power and flexibility of VIBES using examples from Bayesian mixture modelling. 1
4 0.49689606 7 nips-2002-A Hierarchical Bayesian Markovian Model for Motifs in Biopolymer Sequences
Author: Eric P. Xing, Michael I. Jordan, Richard M. Karp, Stuart Russell
Abstract: We propose a dynamic Bayesian model for motifs in biopolymer sequences which captures rich biological prior knowledge and positional dependencies in motif structure in a principled way. Our model posits that the position-specific multinomial parameters for monomer distribution are distributed as a latent Dirichlet-mixture random variable, and the position-specific Dirichlet component is determined by a hidden Markov process. Model parameters can be fit on training motifs using a variational EM algorithm within an empirical Bayesian framework. Variational inference is also used for detecting hidden motifs. Our model improves over previous models that ignore biological priors and positional dependence. It has much higher sensitivity to motifs during detection and a notable ability to distinguish genuine motifs from false recurring patterns.
5 0.48536727 21 nips-2002-Adaptive Classification by Variational Kalman Filtering
Author: Peter Sykacek, Stephen J. Roberts
Abstract: We propose in this paper a probabilistic approach for adaptive inference of generalized nonlinear classification that combines the computational advantage of a parametric solution with the flexibility of sequential sampling techniques. We regard the parameters of the classifier as latent states in a first order Markov process and propose an algorithm which can be regarded as variational generalization of standard Kalman filtering. The variational Kalman filter is based on two novel lower bounds that enable us to use a non-degenerate distribution over the adaptation rate. An extensive empirical evaluation demonstrates that the proposed method is capable of infering competitive classifiers both in stationary and non-stationary environments. Although we focus on classification, the algorithm is easily extended to other generalized nonlinear models.
6 0.48366693 87 nips-2002-Fast Transformation-Invariant Factor Analysis
7 0.48000139 1 nips-2002-"Name That Song!" A Probabilistic Approach to Querying on Music and Text
8 0.47841933 74 nips-2002-Dynamic Structure Super-Resolution
9 0.46993282 143 nips-2002-Mean Field Approach to a Probabilistic Model in Information Retrieval
10 0.4542945 115 nips-2002-Informed Projections
11 0.41522482 190 nips-2002-Stochastic Neighbor Embedding
12 0.41383138 37 nips-2002-Automatic Derivation of Statistical Algorithms: The EM Family and Beyond
13 0.40831035 39 nips-2002-Bayesian Image Super-Resolution
14 0.40538049 122 nips-2002-Learning About Multiple Objects in Images: Factorial Learning without Factorial Search
15 0.39996389 81 nips-2002-Expected and Unexpected Uncertainty: ACh and NE in the Neocortex
16 0.39963105 18 nips-2002-Adaptation and Unsupervised Learning
17 0.39928243 182 nips-2002-Shape Recipes: Scene Representations that Refer to the Image
18 0.39390317 157 nips-2002-On the Dirichlet Prior and Bayesian Regularization
19 0.38069716 205 nips-2002-Value-Directed Compression of POMDPs
20 0.37762907 107 nips-2002-Identity Uncertainty and Citation Matching
topicId topicWeight
[(11, 0.013), (14, 0.345), (23, 0.031), (42, 0.081), (54, 0.1), (55, 0.025), (67, 0.019), (68, 0.019), (73, 0.01), (74, 0.094), (87, 0.01), (92, 0.04), (98, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.78955078 150 nips-2002-Multiple Cause Vector Quantization
Author: David A. Ross, Richard S. Zemel
Abstract: We propose a model that can learn parts-based representations of highdimensional data. Our key assumption is that the dimensions of the data can be separated into several disjoint subsets, or factors, which take on values independently of each other. We assume each factor has a small number of discrete states, and model it using a vector quantizer. The selected states of each factor represent the multiple causes of the input. Given a set of training examples, our model learns the association of data dimensions with factors, as well as the states of each VQ. Inference and learning are carried out efficiently via variational algorithms. We present applications of this model to problems in image decomposition, collaborative filtering, and text classification.
2 0.76333135 96 nips-2002-Generalized² Linear² Models
Author: Geoffrey J. Gordon
Abstract: We introduce the Generalized 2 Linear 2 Model, a statistical estimator which combines features of nonlinear regression and factor analysis. A (GL)2M approximately decomposes a rectangular matrix X into a simpler representation j(g(A)h(B)). Here A and Bare low-rank matrices, while j, g, and h are link functions. (GL)2Ms include many useful models as special cases, including principal components analysis, exponential-family peA, the infomax formulation of independent components analysis, linear regression, and generalized linear models. They also include new and interesting special cases, one of which we describe below. We also present an iterative procedure which optimizes the parameters of a (GL)2M. This procedure reduces to well-known algorithms for some of the special cases listed above; for other special cases, it is new. 1
3 0.75453413 83 nips-2002-Extracting Relevant Structures with Side Information
Author: Gal Chechik, Naftali Tishby
Abstract: The problem of extracting the relevant aspects of data, in face of multiple conflicting structures, is inherent to modeling of complex data. Extracting structure in one random variable that is relevant for another variable has been principally addressed recently via the information bottleneck method [15]. However, such auxiliary variables often contain more information than is actually required due to structures that are irrelevant for the task. In many other cases it is in fact easier to specify what is irrelevant than what is, for the task at hand. Identifying the relevant structures, however, can thus be considerably improved by also minimizing the information about another, irrelevant, variable. In this paper we give a general formulation of this problem and derive its formal, as well as algorithmic, solution. Its operation is demonstrated in a synthetic example and in two real world problems in the context of text categorization and face images. While the original information bottleneck problem is related to rate distortion theory, with the distortion measure replaced by the relevant information, extracting relevant features while removing irrelevant ones is related to rate distortion with side information.
4 0.54183888 100 nips-2002-Half-Lives of EigenFlows for Spectral Clustering
Author: Chakra Chennubhotla, Allan D. Jepson
Abstract: Using a Markov chain perspective of spectral clustering we present an algorithm to automatically find the number of stable clusters in a dataset. The Markov chain’s behaviour is characterized by the spectral properties of the matrix of transition probabilities, from which we derive eigenflows along with their halflives. An eigenflow describes the flow of probability mass due to the Markov chain, and it is characterized by its eigenvalue, or equivalently, by the halflife of its decay as the Markov chain is iterated. A ideal stable cluster is one with zero eigenflow and infinite half-life. The key insight in this paper is that bottlenecks between weakly coupled clusters can be identified by computing the sensitivity of the eigenflow’s halflife to variations in the edge weights. We propose a novel E IGEN C UTS algorithm to perform clustering that removes these identified bottlenecks in an iterative fashion.
5 0.52353758 82 nips-2002-Exponential Family PCA for Belief Compression in POMDPs
Author: Nicholas Roy, Geoffrey J. Gordon
Abstract: Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are intractable for large models. The intractability of these algorithms is due to a great extent to their generating an optimal policy over the entire belief space. However, in real POMDP problems most belief states are unlikely, and there is a structured, low-dimensional manifold of plausible beliefs embedded in the high-dimensional belief space. We introduce a new method for solving large-scale POMDPs by taking advantage of belief space sparsity. We reduce the dimensionality of the belief space by exponential family Principal Components Analysis [1], which allows us to turn the sparse, highdimensional belief space into a compact, low-dimensional representation in terms of learned features of the belief state. We then plan directly on the low-dimensional belief features. By planning in a low-dimensional space, we can find policies for POMDPs that are orders of magnitude larger than can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and also on a mobile robot navigation task.
6 0.51117349 132 nips-2002-Learning to Detect Natural Image Boundaries Using Brightness and Texture
7 0.49847922 27 nips-2002-An Impossibility Theorem for Clustering
8 0.49760941 28 nips-2002-An Information Theoretic Approach to the Functional Classification of Neurons
9 0.49331611 2 nips-2002-A Bilinear Model for Sparse Coding
10 0.49240017 169 nips-2002-Real-Time Particle Filters
11 0.49043536 8 nips-2002-A Maximum Entropy Approach to Collaborative Filtering in Dynamic, Sparse, High-Dimensional Domains
12 0.48889273 53 nips-2002-Clustering with the Fisher Score
13 0.48862758 40 nips-2002-Bayesian Models of Inductive Generalization
14 0.48469192 188 nips-2002-Stability-Based Model Selection
15 0.48397166 70 nips-2002-Distance Metric Learning with Application to Clustering with Side-Information
16 0.47838587 190 nips-2002-Stochastic Neighbor Embedding
17 0.47351342 3 nips-2002-A Convergent Form of Approximate Policy Iteration
18 0.4734593 52 nips-2002-Cluster Kernels for Semi-Supervised Learning
19 0.47182372 127 nips-2002-Learning Sparse Topographic Representations with Products of Student-t Distributions
20 0.47149175 74 nips-2002-Dynamic Structure Super-Resolution