nips nips2003 nips2003-83 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
Reference: text
sentIndex sentText sentNum sentScore
1 University of California, Berkeley, Berkeley, CA 94720; Massachusetts Institute of Technology, Cambridge, MA 02139. Abstract: We address the problem of learning topic hierarchies from data. [sent-11, score-0.368]
2 We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. [sent-13, score-0.757]
3 We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. [sent-15, score-0.472]
4 An important instance of such modeling challenges is provided by the problem of learning a topic hierarchy from data. [sent-20, score-0.409]
5 Given a collection of “documents,” each of which contains a set of “words,” we wish to discover common usage patterns or “topics” in the documents, and to organize these topics into a hierarchy. [sent-21, score-0.231]
6 (Note that while we use the terminology of document modeling throughout this paper, the methods that we describe are general.) [sent-22, score-0.249]
7 We approach this model selection problem by specifying a generative probabilistic model for hierarchical structures and taking a Bayesian perspective on the problem of learning these structures from data. [sent-24, score-0.224]
8 Thus our hierarchies are random variables; moreover, these random variables are specified procedurally, according to an algorithm that constructs the hierarchy as data are made available. [sent-25, score-0.28]
9 The probabilistic object that underlies this approach is a distribution on partitions of integers known as the Chinese restaurant process [1]. [sent-26, score-0.6]
10 We show how to extend the Chinese restaurant process to a hierarchy of partitions, and show how to use this new process as a representation of prior and posterior distributions for topic hierarchies. [sent-27, score-1.017]
11 There are several possible approaches to the modeling of topic hierarchies. [sent-28, score-0.267]
12 In our approach, each node in the hierarchy is associated with a topic, where a topic is a distribution across words. [sent-29, score-0.442]
13 A document is generated by choosing a path from the root to a leaf, repeatedly sampling topics along that path, and sampling the words from the selected topics. [sent-30, score-0.68]
14 Thus the organization of topics into a hierarchy aims to capture the breadth of usage of topics across the corpus, reflecting underlying syntactic and semantic notions of generality and specificity. [sent-31, score-0.524]
15 This approach differs from models of topic hierarchies which are built on the premise that the distributions associated with parents and children are similar [2]. [sent-32, score-0.41]
16 Our model more closely resembles the hierarchical topic model considered in [3], though that work does not address the model selection problem which is our primary focus. [sent-34, score-0.404]
17 2 Chinese restaurant processes We begin with a brief description of the Chinese restaurant process and subsequently show how this process can be extended to hierarchies. [sent-35, score-1.06]
18 1 The Chinese restaurant process The Chinese restaurant process (CRP) is a distribution on partitions of integers obtained by imagining a process by which M customers sit down in a Chinese restaurant with an infinite number of tables. [sent-37, score-1.789]
19 The mth subsequent customer sits at a table drawn from the following distribution: p(occupied table i | previous customers) = m_i / (γ + m − 1) and p(next unoccupied table | previous customers) = γ / (γ + m − 1), (1) where m_i is the number of previous customers at table i, and γ is a parameter. [sent-40, score-0.306]
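A minimal sketch of this seating rule in Python (the function names and structure are our own illustration, not the authors' code): each new customer joins occupied table i with probability proportional to m_i and opens a new table with probability proportional to γ.

```python
import random

def crp_seat(table_counts, gamma, rng=random):
    """Sample a table for the next customer under the CRP of Eq. (1).

    table_counts: list of counts m_i for the currently occupied tables.
    gamma: concentration parameter; larger values favor opening new tables.
    Returns an index into table_counts, or len(table_counts) for a new table.
    """
    weights = list(table_counts) + [gamma]
    return rng.choices(range(len(weights)), weights=weights)[0]

def simulate_crp(num_customers, gamma, seed=0):
    """Seat M customers one by one and return the table occupancy counts."""
    rng = random.Random(seed)
    counts = []
    for _ in range(num_customers):
        t = crp_seat(counts, gamma, rng)
        if t == len(counts):
            counts.append(1)      # customer opens a previously unoccupied table
        else:
            counts[t] += 1
    return counts
```

The number of occupied tables after M customers is random, which is what makes the CRP useful as a prior over the number of mixture components.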
20 In a species sampling mixture [6], each table in the Chinese restaurant is associated with a draw from p(β | η) where β is a mixture component parameter. [sent-47, score-0.868]
21 First, it is a distribution over seating plans; the number of mixture components is determined by the number of tables which the data occupy. [sent-51, score-0.24]
22 Applications to various kinds of mixture models have begun to appear in recent years; examples include Gaussian mixture models [8], hidden Markov models [9] and mixtures of experts [10]. [sent-54, score-0.184]
23 1 The terminology was inspired by the Chinese restaurants in San Francisco which seem to have an infinite seating capacity. [sent-55, score-0.257]
24 2 Extending the CRP to hierarchies The CRP is amenable to mixture modeling because we can establish a one-to-one relationship between tables and mixture components and a one-to-many relationship between mixture components and data. [sent-58, score-0.532]
25 In the models that we will consider, however, each data point is associated with multiple mixture components which lie along a path in a hierarchy. [sent-59, score-0.213]
26 A nested Chinese restaurant process can be defined by imagining the following scenario. [sent-61, score-0.733]
27 Suppose that there are an infinite number of infinite-table Chinese restaurants in a city. [sent-62, score-0.169]
28 One restaurant is determined to be the root restaurant and on each of its infinite tables is a card with the name of another restaurant. [sent-63, score-1.094]
29 On each of the tables in those restaurants are cards that refer to other restaurants, and this structure repeats infinitely. [sent-64, score-0.236]
30 Each restaurant is referred to exactly once; thus, the restaurants in the city are organized into an infinitely-branched tree. [sent-65, score-0.66]
31 Note that each restaurant is associated with a level in this tree (e.g., the root restaurant is at level 1 and the restaurants it refers to are at level 2). [sent-66, score-0.629] [sent-68, score-0.822]
33 On the first evening, he enters the root Chinese restaurant and selects a table using Eq. (1). [sent-70, score-0.594]
34 On the second evening, he goes to the restaurant identified on the first night’s table and chooses another table, again from Eq. (1). [sent-72, score-0.525]
35 At the end of the trip, the tourist has sat at L restaurants which constitute a path from the root to a restaurant at the Lth level in the infinite tree described above. [sent-75, score-0.941]
36 After M tourists take L-day vacations, the collection of paths describes a particular L-level subtree of the infinite tree (see Figure 1a for an example of such a tree). [sent-76, score-0.224]
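The nested process can be sketched as follows, applying the same CRP seating rule at every restaurant along the way; the Restaurant class and ncrp_path helper are hypothetical names used only for illustration.

```python
import random

class Restaurant:
    """A node in the infinite tree: each occupied table holds a card naming a child."""
    def __init__(self):
        self.table_counts = []   # tourists who have sat at each table so far
        self.children = []       # child restaurant named on each table's card

def ncrp_path(root, num_levels, gamma, rng=random):
    """Draw one L-level path of restaurants (a 'vacation') from the nested CRP."""
    path, node = [root], root
    for _ in range(num_levels - 1):
        weights = node.table_counts + [gamma]                 # Eq. (1) at this node
        t = rng.choices(range(len(weights)), weights=weights)[0]
        if t == len(node.table_counts):                       # new table: new child
            node.table_counts.append(0)
            node.children.append(Restaurant())
        node.table_counts[t] += 1
        node = node.children[t]
        path.append(node)
    return path

# After M tourists each take an L-day vacation, the union of their paths is a
# random L-level subtree of the infinite tree, as in Figure 1a.
```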
37 This prior can be used to model topic hierarchies. [sent-77, score-0.302]
38 Just as a standard CRP can be used to express uncertainty about a possible number of components, the nested CRP can be used to express uncertainty about possible L-level trees. [sent-78, score-0.21]
39 3 A hierarchical topic model Let us consider a data set composed of a corpus of documents. [sent-79, score-0.442]
40 Each document is a collection of words, where a word is an item in a vocabulary. [sent-80, score-0.262]
41 Our basic assumption is that the words in a document are generated according to a mixture model where the mixing proportions are random and document-specific. [sent-81, score-0.461]
42 These topics (one distribution for each possible value of z) are the basic mixture components in our model. [sent-83, score-0.317]
43 Temporarily assuming K possible topics in the corpus, an assumption that we will soon relax, z thus ranges over K possible values and θ is a K-dimensional vector. [sent-85, score-0.181]
44 LDA is thus a two-level generative process in which documents are associated with topic proportions, and the corpus is modeled as a Dirichlet distribution on these topic proportions. [sent-89, score-0.847]
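For concreteness, here is a minimal sketch of this two-level generative process with K fixed topics (a NumPy-based illustration; the function name and argument layout are ours, not from the paper).

```python
import numpy as np

def generate_lda_document(beta, alpha, num_words, rng=None):
    """Generate one document from a K-topic LDA model.

    beta:  K x V array, row k is topic k's distribution over the vocabulary.
    alpha: length-K Dirichlet parameter for the document's topic proportions.
    """
    rng = rng or np.random.default_rng()
    K, V = beta.shape
    theta = rng.dirichlet(alpha)                 # document-specific proportions θ
    words = []
    for _ in range(num_words):
        z = rng.choice(K, p=theta)               # first level: choose a topic
        words.append(rng.choice(V, p=beta[z]))   # second level: choose a word
    return words, theta
```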
45 We now describe an extension of this model in which the topics lie in a hierarchy. [sent-90, score-0.204]
46 A document is generated as follows: (1) choose a path from the root of the tree to a leaf; (2) draw a vector of topic proportions θ from an L-dimensional Dirichlet; (3) generate the words in the document from a mixture of the topics along the path from root to leaf, with mixing proportions θ. [sent-92, score-1.449]
47 Finally, we use the nested CRP to relax the assumption of a fixed tree structure. [sent-94, score-0.234]
48 As we have seen, the nested CRP can be used to place a prior on possible trees. [sent-95, score-0.216]
49 We also place a prior on the topics β_i, each of which is associated with a restaurant in the infinite tree (in particular, we assume a symmetric Dirichlet with hyperparameter η). [sent-96, score-0.852]
50 A document is drawn by first choosing an L-level path through the restaurants and then drawing the words from the L topics which are associated with the restaurants along that path. [sent-97, score-0.865]
51 Note that all documents share the topic associated with the root restaurant. [sent-98, score-0.488]
52 For each level ℓ ∈ {2, . . . , L}: (a) draw a table from restaurant c_{ℓ−1} using Eq. (1). [sent-105, score-0.525]
53 (b) Set c_ℓ to be the restaurant referred to by that table. [sent-107, score-0.491]
54 Draw an L-dimensional topic proportion vector θ from Dir(α). [sent-109, score-0.235]
55 For each word n: (a) draw a level z ∈ {1, . . . , L} from Mult(θ); (b) draw w_n from the topic associated with restaurant c_z. [sent-118, score-0.768]
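Putting the pieces together, a sketch of this per-document generative process, reusing the ncrp_path and Restaurant helpers from the earlier sketch and lazily attaching a symmetric-Dirichlet(η) topic to each node; all names here are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def node_topic(node, eta, vocab_size):
    """Lazily attach a topic (drawn from a symmetric Dirichlet(eta)) to a tree node."""
    if not hasattr(node, "topic"):
        node.topic = rng.dirichlet([eta] * vocab_size)
    return node.topic

def generate_hlda_document(root, L, gamma, alpha, eta, vocab_size, num_words):
    """Generate one document: draw an nCRP path, then mix the L topics along it."""
    path = ncrp_path(root, L, gamma)              # choose the path through restaurants
    theta = rng.dirichlet([alpha] * L)            # level proportions θ ~ Dir(α)
    topics = [node_topic(n, eta, vocab_size) for n in path]
    words = []
    for _ in range(num_words):                    # generate each word
        z = rng.choice(L, p=theta)                # pick a level along the path
        words.append(rng.choice(vocab_size, p=topics[z]))
    return words
```

Because every path starts at the root, the root topic is shared by all documents, matching the observation above.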
56 The node labeled T refers to a collection of an infinite number of L-level paths drawn from a nested CRP. [sent-120, score-0.314]
57 Given T, the c_{m,ℓ} variables are deterministic—simply look up the ℓth level of the mth path in the infinite collection of paths. [sent-121, score-0.189]
58 However, not having observed T, the distribution of c_{m,ℓ} will be defined by the nested Chinese restaurant process, conditioned on all the c_{q,ℓ} for q < m. [sent-122, score-0.683]
59 Its posterior path will depend, through the unobserved T , on the posterior paths of all the documents in the original corpus. [sent-129, score-0.353]
60 Subsequent new documents will also depend on the original corpus and any new documents which were observed before them. [sent-130, score-0.396]
61 By Eq. (1), any new document can choose a previously unvisited restaurant at any level of the tree. [sent-132, score-0.746]
62 That is, even if we have a peaked posterior on T which has essentially selected a particular tree, a new document can change that hierarchy if its words provide justification for such a change. [sent-135, score-0.418]
63 In another variation of this model, we can consider a process that flattens the nested CRP into a standard CRP, but retains the idea that a tourist eats L meals. [sent-136, score-0.303]
64 That is, the tourist eats L times in a single restaurant under the constraint that he does not choose the same table twice. [sent-137, score-0.617]
65 In particular, it can be used as a prior for a flat LDA model in which each document can use at most L topics from the potentially infinite total set of topics. [sent-139, score-0.438]
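One possible reading of this flattened variant, sketched below purely as an illustration: the tourist makes L successive CRP draws in a single restaurant, each time excluding the tables he has already used that night. The exact conditional probabilities are not spelled out in the text, so this is an assumption rather than the paper's definition.

```python
import random

def draw_L_distinct_tables(table_counts, L, gamma, rng=random):
    """One tourist eats L meals in a single CRP restaurant, never reusing a table.

    table_counts is updated in place; new tables may be opened along the way.
    Returns the L distinct table indices used by this tourist.
    """
    used = []
    for _ in range(L):
        candidates = [i for i in range(len(table_counts)) if i not in used]
        weights = [table_counts[i] for i in candidates] + [gamma]
        pick = rng.choices(range(len(weights)), weights=weights)[0]
        if pick == len(candidates):          # open a brand-new table
            table_counts.append(0)
            idx = len(table_counts) - 1
        else:
            idx = candidates[pick]
        table_counts[idx] += 1
        used.append(idx)
    return used
```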
66 4 Approximate inference by Gibbs sampling In this section, we describe a Gibbs sampling algorithm for sampling from the posterior nested CRP and corresponding topics in the hLDA model. [sent-141, score-0.555]
67 The Gibbs sampler provides a method for simultaneously exploring the parameter space (the particular topics of the corpus) and the model space (L-level trees). [sent-142, score-0.289]
68 The solid lines connect each restaurant to the restaurants referred to by its tables. [sent-144, score-0.66]
69 This illustrates a sample from the state space of the posterior nested CRP of Figure 1b for four documents. [sent-146, score-0.218]
70 (b) The graphical model representation of hierarchical LDA with a nested CRP prior. [sent-147, score-0.267]
71 We have separated the nested Chinese restaurant process from the topics. [sent-148, score-0.702]
72 The conditional distribution for c_m, the L topics associated with document m, is: p(c_m | w, c_{−m}, z) ∝ p(w_m | c, w_{−m}, z) p(c_m | c_{−m}), where w_{−m} and c_{−m} denote the w and c variables for all documents other than m. [sent-156, score-0.715]
73 This expression is an instance of Bayes’ rule with p(w_m | c, w_{−m}, z) as the likelihood of the data given a particular choice of c_m and p(c_m | c_{−m}) as the prior on c_m implied by the nested CRP. [sent-157, score-0.448]
74 The set of possible values for c_m corresponds to the union of the set of existing paths through the tree, equal in number to the leaves, with the set of possible novel paths, equal in number to the internal nodes. [sent-161, score-0.18]
75 The prior p(c_m | c_{−m}) follows from Eq. (1) and the definition of the nested CRP in Section 2.2. [sent-163, score-0.172]
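A sketch of how this Gibbs step could be implemented, scoring each candidate path (existing leaves plus one novel child per internal node) by nCRP log prior plus log likelihood and sampling proportionally. The data structures (node.word_counts, candidate_paths, doc_level_counts) and the collapsed Dirichlet-multinomial likelihood, obtained by integrating out each node's topic under its symmetric Dirichlet(η) prior, are our assumptions about a reasonable implementation, not code from the paper.

```python
import math
import numpy as np

def log_collapsed_likelihood(node_counts, doc_counts, eta):
    """log p(this document's words at a node | other documents' counts there),
    with the node's topic integrated out under a symmetric Dirichlet(eta) prior."""
    V = len(node_counts)
    n_other, n_doc = node_counts.sum(), doc_counts.sum()
    logp = math.lgamma(n_other + V * eta) - math.lgamma(n_other + n_doc + V * eta)
    for w in np.nonzero(doc_counts)[0]:
        logp += (math.lgamma(node_counts[w] + doc_counts[w] + eta)
                 - math.lgamma(node_counts[w] + eta))
    return logp

def sample_path(candidate_paths, log_priors, doc_level_counts, eta, rng):
    """Gibbs draw of c_m: score each candidate path by its nCRP log prior plus the
    likelihood of the document's words assigned to each level along that path."""
    scores = []
    for path, log_prior in zip(candidate_paths, log_priors):
        loglik = sum(log_collapsed_likelihood(node.word_counts,
                                              doc_level_counts[level], eta)
                     for level, node in enumerate(path))
        scores.append(log_prior + loglik)
    scores = np.asarray(scores)
    probs = np.exp(scores - scores.max())    # stabilize before normalizing
    probs /= probs.sum()
    return candidate_paths[rng.choice(len(candidate_paths), p=probs)]
```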
76 Each document has 1000 words from a 25 term vocabulary. [sent-166, score-0.249]
77 (b) The correct hierarchy found by the Gibbs sampler on this corpus. [sent-167, score-0.208]
78 We illustrate that the nested CRP is feasible for learning text hierarchies in hLDA by using a contrived corpus with a small vocabulary. [sent-176, score-0.474]
79 We generated a corpus of 100 1000-word documents from a three-level hierarchy with a vocabulary of 25 terms. [sent-177, score-0.474]
80 In this corpus, topics on the vocabulary can be viewed as bars on a 5 × 5 grid. [sent-178, score-0.256]
81 The root topic places its probability mass on the bottom bar. [sent-179, score-0.324]
82 On the second level, one topic is identified with the leftmost bar, while the rightmost bar represents a second topic. [sent-180, score-0.292]
83 The leftmost topic has two subtopics while the rightmost topic has one subtopic. [sent-181, score-0.539]
84 Figure 2b illustrates the recovered hierarchy using the Gibbs sampling algorithm described in Section 4. [sent-183, score-0.175]
85 In estimating hierarchy structures, hypothesis testing approaches to model selection are impractical since they do not provide a viable method of searching over the large space of trees. [sent-184, score-0.174]
86 We generated 210 corpora of 100 1000-word documents each from an LDA model with K ∈ {5, . . . , 25}, L = 5, a vocabulary size of 100, and randomly generated mixture components from a symmetric Dirichlet (η = 0. [sent-186, score-0.219] [sent-189, score-0.213]
88 Each corpus has 100 documents of 1000 words from a vocabulary of 100 terms. [sent-204, score-0.388]
89 Figure 3 reports the results of sampling from the resulting posterior on trees with the Gibbs sampler from Section 4. [sent-205, score-0.208]
90 Using 1717 NIPS abstracts from 1987–1999 [14] with 208,896 words and a vocabulary of 1600 terms, we estimated a three-level hierarchy as illustrated in Figure 5. [sent-209, score-0.344]
91 6 Summary We have presented the nested Chinese restaurant process, a distribution on hierarchical partitions. [sent-214, score-0.755]
92 We have shown that this process can be used as a nonparametric prior for a hierarchical extension to the latent Dirichlet allocation model. [sent-215, score-0.219]
93 The result is a flexible, general model for topic hierarchies that naturally accommodates growing data collections. [sent-216, score-0.422]
94 First, we have restricted ourselves to hierarchies of fixed depth L for simplicity, but it is straightforward to consider a model in which L can vary from document to document. [sent-219, score-0.346]
95 Each document is still a mixture of topics along a path in a hierarchy, but different documents can express paths of different lengths as they represent varying levels of specialization. [sent-220, score-0.743]
96 Second, although in our current model a document is associated with a single path, it is also natural to consider models in which documents are allowed to mix over paths. [sent-221, score-0.397]
97 Each node contains the top eight words from its corresponding topic distribution. [sent-225, score-0.316]
98 The cluster-abstraction model: Unsupervised learning of topic hierarchies from text data. [sent-239, score-0.386]
99 Generalized weighted Chinese restaurant processes for species sampling mixture models. [sent-254, score-0.659]
100 Markov chain sampling methods for Dirichlet process mixture models. [sent-258, score-0.183]
wordName wordTfidf (topN-words)
[('restaurant', 0.491), ('crp', 0.445), ('chinese', 0.292), ('topic', 0.235), ('document', 0.19), ('topics', 0.181), ('nested', 0.172), ('restaurants', 0.169), ('documents', 0.142), ('hierarchies', 0.133), ('hierarchy', 0.123), ('ncm', 0.123), ('cm', 0.116), ('corpus', 0.112), ('dirichlet', 0.107), ('hlda', 0.107), ('lda', 0.093), ('mixture', 0.092), ('sampler', 0.085), ('gibbs', 0.081), ('vocabulary', 0.075), ('hierarchical', 0.072), ('root', 0.069), ('customers', 0.067), ('paths', 0.064), ('tree', 0.062), ('seating', 0.061), ('tourist', 0.061), ('words', 0.059), ('path', 0.055), ('proportions', 0.054), ('wm', 0.053), ('abstracts', 0.053), ('sampling', 0.052), ('bayes', 0.05), ('leaf', 0.049), ('tourists', 0.046), ('posterior', 0.046), ('mth', 0.045), ('prior', 0.044), ('tables', 0.043), ('associated', 0.042), ('draw', 0.041), ('word', 0.041), ('process', 0.039), ('factors', 0.039), ('level', 0.034), ('blei', 0.034), ('table', 0.034), ('modeling', 0.032), ('corpora', 0.032), ('hyperparameter', 0.032), ('accommodates', 0.031), ('eats', 0.031), ('evening', 0.031), ('imagining', 0.031), ('sit', 0.031), ('sits', 0.031), ('subtopics', 0.031), ('unvisited', 0.031), ('collection', 0.031), ('nite', 0.03), ('partitions', 0.03), ('selection', 0.028), ('structures', 0.028), ('customer', 0.027), ('terminology', 0.027), ('ths', 0.027), ('latent', 0.026), ('refers', 0.025), ('trees', 0.025), ('components', 0.024), ('repeats', 0.024), ('species', 0.024), ('dimension', 0.024), ('variables', 0.024), ('grif', 0.023), ('model', 0.023), ('node', 0.022), ('generative', 0.022), ('generated', 0.022), ('simulated', 0.021), ('subtree', 0.021), ('nth', 0.021), ('nonparametric', 0.021), ('mixing', 0.021), ('mass', 0.02), ('syntactic', 0.02), ('integers', 0.02), ('distribution', 0.02), ('grow', 0.02), ('express', 0.019), ('leftmost', 0.019), ('challenges', 0.019), ('rightmost', 0.019), ('abstraction', 0.019), ('bar', 0.019), ('usage', 0.019), ('text', 0.018), ('allocation', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
2 0.10163052 108 nips-2003-Learning a Distance Metric from Relative Comparisons
Author: Matthew Schultz, Thorsten Joachims
Abstract: This paper presents a method for learning a distance metric from relative comparison such as “A is closer to B than A is to C”. Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents. 1
3 0.099638365 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures
Author: Alexander T. Ihler, Erik B. Sudderth, William T. Freeman, Alan S. Willsky
Abstract: The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods. 1
4 0.080086395 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
Author: Mark Girolami, Ata Kabán
Abstract: To provide a compact generative representation of the sequential activity of a number of individuals within a group there is a tradeoff between the definition of individual specific and global models. This paper proposes a linear-time distributed model for finite state symbolic sequences representing traces of individual user activity by making the assumption that heterogeneous user behavior may be ‘explained’ by a relatively small number of common structurally simple behavioral patterns which may interleave randomly in a user-specific proportion. The results of an empirical study on three different sources of user traces indicates that this modelling approach provides an efficient representation scheme, reflected by improved prediction performance as well as providing lowcomplexity and intuitively interpretable representations.
5 0.070017792 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
6 0.058155585 169 nips-2003-Sample Propagation
7 0.05741943 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
8 0.056931704 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
9 0.052270286 73 nips-2003-Feature Selection in Clustering Problems
10 0.0504703 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
11 0.047680713 130 nips-2003-Model Uncertainty in Classical Conditioning
12 0.047188804 172 nips-2003-Semi-Supervised Learning with Trees
13 0.045216471 12 nips-2003-A Model for Learning the Semantics of Pictures
14 0.044797845 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
15 0.042063151 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs
16 0.040969733 189 nips-2003-Tree-structured Approximations by Expectation Propagation
17 0.040525332 99 nips-2003-Kernels for Structured Natural Language Data
18 0.039311662 113 nips-2003-Learning with Local and Global Consistency
19 0.03874423 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
20 0.038572878 59 nips-2003-Efficient and Robust Feature Extraction by Maximum Margin Criterion
topicId topicWeight
[(0, -0.123), (1, -0.018), (2, -0.005), (3, 0.056), (4, -0.037), (5, -0.014), (6, 0.026), (7, 0.021), (8, -0.0), (9, 0.033), (10, 0.031), (11, -0.05), (12, -0.036), (13, -0.008), (14, 0.138), (15, -0.002), (16, -0.138), (17, 0.05), (18, 0.122), (19, -0.057), (20, -0.03), (21, 0.029), (22, -0.117), (23, 0.118), (24, 0.011), (25, -0.033), (26, 0.06), (27, -0.102), (28, 0.199), (29, 0.028), (30, 0.038), (31, -0.054), (32, -0.035), (33, 0.08), (34, -0.054), (35, 0.024), (36, -0.135), (37, -0.041), (38, -0.076), (39, 0.029), (40, 0.058), (41, -0.119), (42, -0.011), (43, -0.01), (44, 0.048), (45, -0.042), (46, 0.181), (47, 0.069), (48, -0.028), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.94294918 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
2 0.47846401 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering
Author: Benjamin M. Marlin
Abstract: In this paper we present a generative latent variable model for rating-based collaborative filtering called the User Rating Profile model (URP). The generative process which underlies URP is designed to produce complete user rating profiles, an assignment of one rating to each item for each user. Our model represents each user as a mixture of user attitudes, and the mixing proportions are distributed according to a Dirichlet random variable. The rating for each item is generated by selecting a user attitude for the item, and then selecting a rating according to the preference pattern associated with that attitude. URP is related to several models including a multinomial mixture model, the aspect model [7], and LDA [1], but has clear advantages over each. 1
3 0.46420601 161 nips-2003-Probabilistic Inference in Human Sensorimotor Processing
Author: Konrad P. Körding, Daniel M. Wolpert
Abstract: When we learn a new motor skill, we have to contend with both the variability inherent in our sensors and the task. The sensory uncertainty can be reduced by using information about the distribution of previously experienced tasks. Here we impose a distribution on a novel sensorimotor task and manipulate the variability of the sensory feedback. We show that subjects internally represent both the distribution of the task as well as their sensory uncertainty. Moreover, they combine these two sources of information in a way that is qualitatively predicted by optimal Bayesian processing. We further analyze if the subjects can represent multimodal distributions such as mixtures of Gaussians. The results show that the CNS employs probabilistic models during sensorimotor learning even when the priors are multimodal.
4 0.45151162 108 nips-2003-Learning a Distance Metric from Relative Comparisons
Author: Matthew Schultz, Thorsten Joachims
Abstract: This paper presents a method for learning a distance metric from relative comparison such as “A is closer to B than A is to C”. Taking a Support Vector Machine (SVM) approach, we develop an algorithm that provides a flexible way of describing qualitative training data as a set of constraints. We show that such constraints lead to a convex quadratic programming problem that can be solved by adapting standard methods for SVM training. We empirically evaluate the performance and the modelling flexibility of the algorithm on a collection of text documents. 1
5 0.44959012 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures
Author: Alexander T. Ihler, Erik B. Sudderth, William T. Freeman, Alan S. Willsky
Abstract: The problem of approximating the product of several Gaussian mixture distributions arises in a number of contexts, including the nonparametric belief propagation (NBP) inference algorithm and the training of product of experts models. This paper develops two multiscale algorithms for sampling from a product of Gaussian mixtures, and compares their performance to existing methods. The first is a multiscale variant of previously proposed Monte Carlo techniques, with comparable theoretical guarantees but improved empirical convergence rates. The second makes use of approximate kernel density evaluation methods to construct a fast approximate sampler, which is guaranteed to sample points to within a tunable parameter of their true probability. We compare both multiscale samplers on a set of computational examples motivated by NBP, demonstrating significant improvements over existing methods. 1
6 0.4366188 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
7 0.43396026 172 nips-2003-Semi-Supervised Learning with Trees
8 0.42948821 24 nips-2003-An Iterative Improvement Procedure for Hierarchical Clustering
9 0.38723657 196 nips-2003-Wormholes Improve Contrastive Divergence
10 0.38498977 130 nips-2003-Model Uncertainty in Classical Conditioning
11 0.35236916 118 nips-2003-Link Prediction in Relational Data
12 0.34760553 79 nips-2003-Gene Expression Clustering with Functional Mixture Models
13 0.32246819 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
14 0.31536907 169 nips-2003-Sample Propagation
15 0.31406185 138 nips-2003-Non-linear CCA and PCA by Alignment of Local Models
16 0.29356778 77 nips-2003-Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data
17 0.28444499 99 nips-2003-Kernels for Structured Natural Language Data
18 0.28382263 136 nips-2003-New Algorithms for Efficient High Dimensional Non-parametric Classification
19 0.27972838 71 nips-2003-Fast Embedding of Sparse Similarity Graphs
20 0.27540776 25 nips-2003-An MCMC-Based Method of Comparing Connectionist Models in Cognitive Science
topicId topicWeight
[(0, 0.041), (11, 0.025), (29, 0.024), (30, 0.025), (34, 0.01), (35, 0.037), (53, 0.076), (63, 0.01), (71, 0.099), (76, 0.046), (85, 0.065), (91, 0.092), (94, 0.314), (99, 0.036)]
simIndex simValue paperId paperTitle
1 0.82721639 19 nips-2003-Algorithms for Interdependent Security Games
Author: Michael Kearns, Luis E. Ortiz
Abstract: unkown-abstract
same-paper 2 0.78665411 83 nips-2003-Hierarchical Topic Models and the Nested Chinese Restaurant Process
Author: Thomas L. Griffiths, Michael I. Jordan, Joshua B. Tenenbaum, David M. Blei
Abstract: We address the problem of learning topic hierarchies from data. The model selection problem in this domain is daunting—which of the large collection of possible trees to use? We take a Bayesian approach, generating an appropriate prior via a distribution on partitions that we refer to as the nested Chinese restaurant process. This nonparametric prior allows arbitrarily large branching factors and readily accommodates growing data collections. We build a hierarchical topic model by combining this prior with a likelihood that is based on a hierarchical variant of latent Dirichlet allocation. We illustrate our approach on simulated data and with an application to the modeling of NIPS abstracts. 1
3 0.69418955 104 nips-2003-Learning Curves for Stochastic Gradient Descent in Linear Feedforward Networks
Author: Justin Werfel, Xiaohui Xie, H. S. Seung
Abstract: Gradient-following learning methods can encounter problems of implementation in many applications, and stochastic variants are frequently used to overcome these difficulties. We derive quantitative learning curves for three online training methods used with a linear perceptron: direct gradient descent, node perturbation, and weight perturbation. The maximum learning rate for the stochastic methods scales inversely with the first power of the dimensionality of the noise injected into the system; with sufficiently small learning rate, all three methods give identical learning curves. These results suggest guidelines for when these stochastic methods will be limited in their utility, and considerations for architectures in which they will be effective. 1
4 0.48912969 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
Author: Tong Zhang
Abstract: In this paper we obtain convergence bounds for the concentration of Bayesian posterior distributions (around the true distribution) using a novel method that simplifies and enhances previous results. Based on the analysis, we also introduce a generalized family of Bayesian posteriors, and show that the convergence behavior of these generalized posteriors is completely determined by the local prior structure around the true distribution. This important and surprising robustness property does not hold for the standard Bayesian posterior in that it may not concentrate when there exist “bad” prior structures even at places far away from the true distribution. 1
5 0.48852438 189 nips-2003-Tree-structured Approximations by Expectation Propagation
Author: Yuan Qi, Tom Minka
Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1
6 0.48493436 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints
7 0.48276764 109 nips-2003-Learning a Rare Event Detection Cascade by Direct Feature Selection
8 0.48260787 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images
9 0.48191071 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications
10 0.48137701 158 nips-2003-Policy Search by Dynamic Programming
11 0.48107955 41 nips-2003-Boosting versus Covering
12 0.48072863 117 nips-2003-Linear Response for Approximate Inference
13 0.48071983 30 nips-2003-Approximability of Probability Distributions
14 0.48034108 78 nips-2003-Gaussian Processes in Reinforcement Learning
15 0.48019093 126 nips-2003-Measure Based Regularization
16 0.47978693 147 nips-2003-Online Learning via Global Feedback for Phrase Recognition
17 0.47952235 121 nips-2003-Log-Linear Models for Label Ranking
18 0.47938669 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
19 0.47891331 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering
20 0.47827977 145 nips-2003-Online Classification on a Budget