nips nips2011 nips2011-115 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point” stick-breaking process (cpSBP).
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. [sent-8, score-0.369]
2 Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. [sent-9, score-0.341]
3 The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. [sent-10, score-0.686]
4 To share topics across the tree nodes, topic distributions are drawn from a Dirichlet process. [sent-12, score-0.691]
5 As a demonstration of this concept, we analyze real data on course selections of undergraduate students at Duke University, with the goal of uncovering and concisely representing structure in the curriculum and in the characteristics of the student body. [sent-13, score-0.583]
6 1 Introduction As time progresses, the choices humans make often change. [sent-14, score-0.101]
7 For example, the types of products one purchases, as well as the types of people one befriends, often change or evolve with time. [sent-15, score-0.119]
8 However, the choices one makes later in life are typically statistically related to choices made earlier. [sent-16, score-0.247]
9 In this paper we seek to develop a hierarchical tree structure for representation of this phenomenon, with each tree path characteristic of a type of person, and a tree node defined by a distribution over choices (characterizing a type of person at a “stage of life”). [sent-18, score-0.883]
10 As time proceeds, each person moves along layers of a tree branch, making choices based on the node at a given layer, thereby yielding a hierarchical representation of behavior with time. [sent-19, score-0.635]
11 Note that as one moves deeper in the tree, the number of nodes at a given tree layer increases as a result of sequential branching; this appears to be well matched to the modeling of choices made by individuals, who often become more distinctive and specialized with increasing time. [sent-20, score-0.617]
12 To demonstrate this concept using real data, we consider selections of classes made by undergraduate students at Duke University, with the goal of uncovering structure in the students and classes, as inferred by time-evolving student choices. [sent-22, score-0.959]
13 For each student, the data presented to the model are a set of indices of selected classes (but not class names or subject matter), as well as the academic year in which each class was selected (e. [sent-23, score-0.19]
14 While the student majors and class names are not used by the model, they are known, and this information provides “truth” with which model-inferred structure may be assessed. [sent-26, score-0.238]
15 We seek to impose that as time progresses it is more probable that an individual’s choices are based on nodes deeper in the tree, so that as one moves from the tree root to the leaves, we observe the evolution of choices as people age. [sent-28, score-0.686]
16 The basic form of the nonparametric tree developed here is based on the nested Chinese restaurant process (nCRP) topic model [4, 5]. [sent-36, score-0.574]
17 We develop a new “change-point” stick-breaking process (cpSBP), which is a stick-breaking process that induces probabilities that stochastically increase to an unknown changepoint and then decrease. [sent-38, score-0.124]
18 In the proposed model each individual has a unique cpSBP that evolves with time, such that choices at later times are encouraged to be associated with deeper nodes in the tree. [sent-40, score-0.316]
19 Another novel aspect of the proposed model concerns drawing the node-dependent topics from a Dirichlet process, sharing topics across the tree. [sent-42, score-0.634]
20 This is motivated by the idea that different types of people (paths) may be characterized by similar choices at different nodes in the respective paths (e. [sent-43, score-0.316]
21 , person Type A may make certain types of choices early in life, while person Type B may make similar choices later in life). [sent-45, score-0.304]
22 Such sharing of topics allows the inference of relationships between choices different people make over time. [sent-46, score-0.519]
23 1 Model Formulation Nested Chinese Restaurant Process The nested Chinese restaurant process (nCRP) [4, 5] is a generative probabilistic model that defines a prior distribution over a tree-structured hierarchy with infinitely many paths. [sent-48, score-0.236]
24 In an nCRP model of personal choice each individual picks a tree path by walking from the root node down the tree, from node to node. [sent-49, score-0.574]
25 This process is defined recursively such that each individual is allocated to one specific path of the tree hierarchy, through a sequence of probabilistic parent-child steps. [sent-51, score-0.367]
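To make the recursion concrete, the following sketch (in Python with numpy; the helper names and the tuple-based node ids are hypothetical conveniences, not from the paper) walks an individual down the tree, applying a Chinese restaurant process step at each node:

import numpy as np

def crp_choose_child(child_counts, gamma, rng):
    # CRP step at one node: an existing child is chosen with probability
    # proportional to its customer count, a new child with probability
    # proportional to the concentration parameter gamma.
    probs = np.append(np.asarray(child_counts, dtype=float), gamma)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)  # index == len(child_counts) means "new child"

def ncrp_sample_path(tree, depth, gamma, rng):
    # Walk from the root down `depth` layers; `tree` maps a node id to the
    # list of per-child customer counts observed so far.
    path, node = [], ()
    for _ in range(depth):
        k = crp_choose_child(tree.get(node, []), gamma, rng)
        path.append(k)
        node = node + (k,)  # a node is identified by the branch choices leading to it
    return path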
26 The tree hierarchy implied by the nCRP provides a natural framework to capture the structure of personal choices, where each node is characterized by a distribution on the items that may be selected (e. [sent-52, score-0.361]
27 However, there are unique aspects of the time-evolving personal-choice problem, particularly the goal motivated above that one should select topics deeper in the tree as time evolves, to capture the specialized characteristics of people as they age. [sent-56, score-0.613]
28 2 Change Point Stick Breaking Process In the proposed model, instead of constraining the SBP construction to start at the root node, we model the starting-point depth of the SBP as a random variable and infer it from data, while still maintaining a valid distribution over each layer of any path. [sent-59, score-0.261]
29 To do this we replace the single SBP over path layers with two statistically dependent SBPs: one begins from layer p + 1 and moves down [sent-60, score-0.350]
30 Figure 1: Illustrative comparison of the stick lengths between the change point stick breaking process (cpSBP) and the stick breaking process (SBP) with different values of α; typical draws from cpSBP and SBP are depicted. [sent-65, score-1.16]
31 aω and bω are both set to 1, the change point is set to p = 30, and the truncation of both stick-breaking constructions is set to 60. [sent-66, score-0.463]
32 the tree away from the root, and the other begins from layer p and moves upward to the root; the latter SBP is truncated when it hits the root, while the former is in principle of infinite length. [sent-67, score-0.383]
33 The tree depth p which relates these two SBPs is modeled as a random variable, drawn from a Poisson distribution, and is denoted the change point. [sent-68, score-0.262]
34 In this way we encourage the principal stick weight to be placed heavily around the change point p, instead of restricting it to the top layers as in [4, 7]. [sent-69, score-0.402]
35 To model the time dependence, and encourage use of greater tree depths with increasing time, we seek a formulation that imposes that the Poisson parameter grows (statistically) with increasing time. [sent-70, score-0.17]
36 The temporal information is represented as covariate t(i, n), denoting the time at which the nth selection/choice is made by individual i; in many applications t(i, n) ∈ {1, . [sent-71, score-0.116]
37 , T }, and for the student class-selection problem T = 4, corresponding to the freshman through senior years; below we drop the indices (i, n) on the time, for notational simplicity. [sent-74, score-0.324]
38 When individual i makes selections at time t, she employs a corresponding change point pi,t . [sent-75, score-0.154]
39 To integrate the temporal covariate into the model, we develop a product-of-gammas and Poisson conjugate pair to model pi,t , which encourages pi,t associated with larger t to locate deeper in the tree. [sent-76, score-0.133]
40 Specifically, consider $\gamma_{i,l} \sim \mathrm{Ga}(\gamma_{i,l} \mid a_{i,l}, b_{i,l})$, $\lambda_{i,t} = \prod_{l=1}^{t} \gamma_{i,l}$, $p_{i,t} \sim \mathrm{Poi}(p_{i,t} \mid \lambda_{i,t})$. (1) The product-of-gammas construction in (1) is inspired by the multiplicative-gamma process (MGP) developed in [3] for sparse factor analysis. [sent-77, score-0.110]
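A minimal numerical sketch of (1) in Python/numpy (the gamma shape and rate values below are illustrative assumptions, not settings from the paper):

import numpy as np

rng = np.random.default_rng(0)
T = 4                    # e.g., freshman through senior years
a, b = 2.0, 1.0          # illustrative shape/rate; a/b > 1 makes lambda grow with t
gammas = rng.gamma(shape=a, scale=1.0 / b, size=T)  # gamma_{i,l} ~ Ga(a, b)
lam = np.cumprod(gammas)  # lambda_{i,t} = prod_{l=1}^{t} gamma_{i,l}
p = rng.poisson(lam)      # change points p_{i,t} ~ Poi(lambda_{i,t}), one per year

Because each gamma factor has mean a/b, choosing a/b > 1 makes the Poisson rate, and hence the expected change-point depth, grow stochastically with t.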
41 Given the change point $p_{i,t} = p$, the cpSBP constructs the stick-weight vector $\theta^{p}_{i,t}$ over the layers of path $b_i$ by dividing it into two parts, $\hat{\theta}^{p}_{i,t}$ and $\tilde{\theta}^{p}_{i,t}$, modeling them separately as two SBPs, where $\hat{\theta}^{p}_{i,t} = \{\theta^{p}_{i,t}(p), \theta^{p}_{i,t}(p-1), \ldots\}$. [sent-80, score-0.233]
42 On the other hand, when $u \le p$ the stick weight $\hat{\theta}^{p}_{i,t}(u)$ is constructed “backwards” from p to the root node, via a truncated stick-breaking process with truncation level set to p. [sent-88, score-0.749]
43 A thorough discussion of the truncated stick breaking process is found in [10]. [sent-89, score-0.48]
44 We further use a beta-distributed latent variable $\omega_{b_i}$ to combine the two stick-breaking processes while ensuring that the elements of $\theta^{p}_{i,t} = \{\hat{\theta}^{p}_{i,t}, \tilde{\theta}^{p}_{i,t}\}$ sum to one. [sent-90, score-0.479]
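Putting these pieces together, a sketch of the cpSBP stick weights for a given change point p (Python/numpy; the parameter names and the truncation of the downward SBP at max_depth are assumptions made for illustration, assuming 1 ≤ p < max_depth):

import numpy as np

def stick_breaking(draws):
    # Standard SBP: convert a sequence of beta draws into stick weights.
    w = np.empty_like(draws)
    remaining = 1.0
    for i, v in enumerate(draws):
        w[i] = v * remaining
        remaining *= 1.0 - v
    return w

def cpsbp_weights(p, alpha, a_omega, b_omega, max_depth, rng):
    # Upward truncated SBP over layers p, p-1, ..., 1; the last stick
    # absorbs the remainder so the weights sum to one.
    up = stick_breaking(rng.beta(1.0, alpha, size=p))
    up[-1] += 1.0 - up.sum()
    # Downward SBP over layers p+1, ..., max_depth (infinite in principle,
    # truncated here for computation).
    down = stick_breaking(rng.beta(1.0, alpha, size=max_depth - p))
    down[-1] += 1.0 - down.sum()
    # A beta-distributed weight omega splices the two SBPs into one
    # distribution over layers 1..max_depth that sums to one.
    omega = rng.beta(a_omega, b_omega)
    return np.concatenate([omega * up[::-1], (1.0 - omega) * down])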
45 From the simulation studied in Figure 1, we observe that the change point, which is modeled through the temporal covariate t as in (1), corresponds to the layer with the largest stick weight, and thus the layer at which topic draws are most probable. [sent-92, score-0.661]
46 Also note that one may alternatively suggest simply using pi,t directly as the layer from which a topic is drawn, without the subsequent use of a cpSBP. [sent-93, score-0.341]
47 3 Sharing topics among different nodes One problem with the nCRP-based topic model, implied by the tree structure, is that all descendant sub-topics of parent node pa1 are distinct from the descendants of parent pa2 if pa1 ≠ pa2 . [sent-97, score-0.932]
48 Since we encourage the later choice selections to be drawn from topics deeper in the tree, redundant topics at multiple layers may be manifested if two types of people tend to make similar choices at different time points (e. [sent-100, score-1.003]
49 Thus it is likely that similar (redundant) topics may be learned on different layers of the tree, and the inability of the original nCRP construction to share these topics misses another opportunity to share statistical strength. [sent-103, score-0.798]
50 In [7, 13, 6] the authors addressed related challenges by replacing the tree structure with a directed acyclic graph (DAG), demonstrating success for document modeling. [sent-104, score-0.17]
51 However, those solutions do not have the flexibility to share topics among nodes at different layers. [sent-105, score-0.431]
52 Here we propose a new and simpler approach, so that the nCRP-based tree hierarchy is retained, while different nodes in the whole tree may share the same topic, resolving the two problems discussed above. [sent-106, score-0.49]
53 To achieve this we draw a set of “global” topics $\{\hat{\phi}_k\}$, and a stick-breaking process is employed to allocate one of these global topics as $\phi_j$, representing the jth node in the tree (this corresponds to drawing the $\{\phi_j\}$ from a Dirichlet process [16], with a Dirichlet distribution base). [sent-107, score-0.966]
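A sketch of this node-topic sharing (Python/numpy, reusing the stick_breaking helper from the cpSBP sketch above; the truncation level K_max and the Dirichlet parameter eta are illustrative assumptions):

import numpy as np

def assign_shared_topics(num_nodes, vocab_size, alpha0, eta, K_max, rng):
    # Truncated set of "global" topics drawn from a symmetric Dirichlet base.
    global_topics = rng.dirichlet(np.full(vocab_size, eta), size=K_max)
    # Stick-breaking weights over the global topics (renormalized truncation).
    weights = stick_breaking(rng.beta(1.0, alpha0, size=K_max))
    weights /= weights.sum()
    # Each tree node is allocated one global topic, so distinct nodes, even
    # at different layers, can share the same topic.
    z = rng.choice(K_max, size=num_nodes, p=weights)
    return global_topics, z  # node j uses topic global_topics[z[j]]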
54 Similar to [4], the per-node topic parameters $\phi_{n_p}$ are marginalized out. [sent-110, score-0.205]
55 , the retrospective sampler [14] proposed for Dirichlet process hierarchical models. [sent-117, score-0.154]
56 $p(l_{i,t} \mid \theta^{p}_{i,t}, \omega_{i,t})$. Sampling choice layer allocation $l_{i,n}$: Given all the other variables, we now sample the layer allocation $l_{i,n}$ for the nth choice made by individual i. [sent-124, score-0.436]
57 Denote $c_{i,n}$ as the nth choice made by individual i, and $M_{z_{i,n}, c_{i,n}} = \#[z_{-(i,n)} = z_{i,n}, c_{-(i,n)} = c_{i,n}] + \beta$ as the smoothed count of seeing choice $c_{i,n}$ allocated to global topic $z_{i,n}$, excluding the current choice. [sent-125, score-0.319]
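Assuming the collapsed-Gibbs form suggested by these counts, a hypothetical sketch of the layer-allocation update (V is the number of distinct choices and beta the smoothing constant; none of the names are from the paper):

import numpy as np

def sample_layer_allocation(theta, choice_counts, topic_totals, beta, V, rng):
    # theta: cpSBP stick weights over layers for this individual and time;
    # choice_counts[u]: count of the current choice under the topic at layer u,
    # excluding the current token; topic_totals[u]: total count for that topic.
    probs = theta * (choice_counts + beta) / (topic_totals + V * beta)
    probs /= probs.sum()
    return rng.choice(len(theta), p=probs)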
58 One may also consider other alternatives for learning the hyperparameters within topic models [19]. [sent-128, score-0.205]
59 For computational reasons the cpSBP and SBP employed over the tree-path depth are truncated to a maximum of 10 layers (beyond this depth the number of topics employed by the data was minimal), while the number of children of each parent node is allowed to be unbounded. [sent-134, score-0.671]
60 We analyze the quality of the learned models using the remaining data (classes of 2011-2013), characterized by 5171 students and 2972 unique classes. [sent-136, score-0.349]
61 Each topic is a probability vector defined over the 3015 classes offered across all years. [sent-137, score-0.274]
62 Table 1: Predictive log-likelihood comparison on two datasets, together with the mean number of topics and nodes learned, with rounded standard deviations. [sent-148, score-0.359]
63 Compared to nCRP, cpSBP-nCRP replaces the SBP with the proposed cpSBP, while DP-nCRP draws the topic for each node from a Dirichlet process (DP) instead of a Dirichlet distribution and retains the SBP construction of nCRP. [sent-150, score-0.363]
64 Figure 2: Histograms of class layer allocations according to their time covariates. [sent-153, score-0.205]
65 Left: stick-breaking process; right: change-point stick-breaking process. In this section we examine the model’s ability to explain unseen data. [sent-154, score-0.584]
66 For comparison consistency we computed the predictive log-likelihood based on samples collected in the same way as [4] (alternative means of evaluating topic models are discussed in [20]). [sent-155, score-0.239]
67 We test the model using two different compositions of the data; the first is based on the class-selection history of students from the class of 2011 (1696 students), where all 4 years of records are available. [sent-156, score-0.382]
68 The second is based on the class-selection history of students from the classes of 2011 to 2013 (3475 students), where for the latter two years only partial course-selection information is available, e. [sent-157, score-0.382]
69 , for students from the class of 2013 only class-selection choices made in the freshman year are available. [sent-159, score-0.630]
70 Additionally, we compare the different models with respect to the learned number of topics and the learned number of tree nodes. [sent-160, score-0.451]
71 This comparison is an indicator of the level of “parsimony” of the proposed model, introduced by replacing independent draws of topics from a Dirichlet distribution by draws from a Dirichlet process (with Dirichlet distribution base), as explained in Section 2. [sent-161, score-0.343]
72 Since the number of tree nodes grows exponentially with the number of tree layers, from a practical viewpoint sharing topics among the nodes saves memory used to store the topic vectors, whose dimension is typically large (here the number of classes: 3015). [sent-163, score-1.054]
73 In addition to the above insight, as the experimental results indicate, sharing topics among different nodes can enhance the sharing of statistical strength, which leads to better predictive performance. [sent-164, score-0.537]
74 Figure 3: Change of the proportion of important majors along the layers of two paths, which share their nodes up to the second layer. [sent-187, score-0.458]
75 These two paths correspond to the full versions (all 7 layers) of the top two paths in Figure 4. [sent-188, score-0.144]
76 the latent temporal dynamics within the data by the change point stick breaking process (cpSBP). [sent-190, score-0.492]
77 To demonstrate this point, in Figure 2 we compare how cpSBP and SBP guided the class layer allocations which have associated time covariates (e. [sent-191, score-0.242]
78 From Figure 2 we observe that under cpSBP, as the students’ academic careers advance, they are more likely to choose classes from topics deeper in the tree, while such a pattern is less obvious in the SBP case. [sent-194, score-0.508]
79 Further, cpSBP encouraged the data to utilize more layers in the tree than SBP. [sent-195, score-0.288]
80 3 Analyzing the learned tree With incorporation of time covariates, we examine if the uncovered hierarchical structure is consistent with the actual curriculum of students from their freshman to senior year. [sent-197, score-0.852]
81 The tree-structured hierarchy captured the general trend of class selection within and across different majors. [sent-203, score-0.207]
82 In Figure 4 we also highlight a topic in red, shared by two nodes. [sent-204, score-0.205]
83 Note that the sharing of topics between nodes of different layers is a unique aspect of this model, not possible in [7, 13, 6]. [sent-207, score-0.549]
84 In the second analysis we examine how the majors of students are distributed in the learned tree; the ideal case would be that each tree path corresponds to an academic major, and the nodes shared by paths manifest sharing of topics between different but related majors. [sent-208, score-1.299]
85 In Figure 3 we show the change of proportions of different majors among different layers of the top two paths in Figure 4 (this is a zoom-in of a much larger tree). [sent-209, score-0.399]
86 For a clear illustration, we show the seven most popular majors for these paths as a function of time (out of a total of 80 majors), with the remaining 73 majors grouped together. [sent-210, score-0.382]
87 We observe that the students with mechanical engineering (ME) majors share the node on the second layer with students with a computer science (CS) major, and the layers deeper in the tree begin to be exclusive to students with CS and ME majors, respectively. [sent-211, score-1.903]
88 This phenomenon corresponds to the process of a student determining her major by choosing courses as she walks down a tree path. [sent-212, score-0.367]
89 This also matches the fact that in this university, students declare their major during the sophomore year. [sent-213, score-0.418]
90 Each node shows two aggregated statistics: the eight most common classes of the topic at that node and a histogram of the academic years in which the topic was selected by students (1-4, for freshman through senior). [sent-217, score-1.169]
91 The columns in each histogram correspond to freshman through senior year, from left to right. [sent-218, score-0.301]
92 The two highlighted red nodes share the same topic. [sent-219, score-0.113]
93 5 Discussion We have extended hierarchical topic models to an important problem that has received limited attention to date: the evolution of personal choices over time. [sent-221, score-0.4]
94 Specifically, we develop a change-point stick-breaking process, coupled with a product of gammas and Poisson construction, that encourages individuals to be represented by nodes deeper in the tree as time evolves. [sent-223, score-0.345]
95 The Dirichlet process has also been used to design the node-dependent topics, sharing strength and inferring relationships between choices of different people over time. [sent-224, score-0.3]
96 The framework has been successfully demonstrated with a real-world data set: selection of courses over many years, for students at Duke University. [sent-225, score-0.401]
97 Although we worked only on one specific real-world data set, there are many other examples for which such a model may be of interest, especially when the data correspond to a sparse set of choices over time. [sent-226, score-0.101]
98 , the clothing choices of people as they advance from teen years to adulthood). [sent-229, score-0.166]
99 The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. [sent-261, score-0.404]
100 Hierarchical topic models and the nested Chinese restaurant process. [sent-272, score-0.342]
wordName wordTfidf (topN-words)
[('students', 0.349), ('cpsbp', 0.292), ('topics', 0.281), ('sbp', 0.272), ('stick', 0.23), ('topic', 0.205), ('tree', 0.17), ('intro', 0.155), ('majors', 0.155), ('ncrp', 0.151), ('breaking', 0.146), ('theater', 0.137), ('layer', 0.136), ('senior', 0.121), ('freshman', 0.12), ('layers', 0.118), ('node', 0.11), ('dirichlet', 0.104), ('choices', 0.101), ('deeper', 0.097), ('student', 0.083), ('pol', 0.083), ('duke', 0.083), ('nodes', 0.078), ('vh', 0.075), ('politics', 0.075), ('restaurant', 0.072), ('sharing', 0.072), ('paths', 0.072), ('chinese', 0.07), ('classes', 0.069), ('allocations', 0.069), ('sophomore', 0.069), ('people', 0.065), ('nested', 0.065), ('process', 0.062), ('path', 0.061), ('academic', 0.061), ('year', 0.06), ('selections', 0.06), ('poisson', 0.057), ('change', 0.054), ('dance', 0.052), ('courses', 0.052), ('sbps', 0.052), ('person', 0.051), ('hierarchical', 0.05), ('undergraduate', 0.049), ('construction', 0.048), ('chemistry', 0.045), ('life', 0.045), ('personal', 0.044), ('parent', 0.044), ('allocation', 0.042), ('truncated', 0.042), ('struct', 0.042), ('curriculum', 0.042), ('calculus', 0.042), ('retrospective', 0.042), ('beta', 0.041), ('nth', 0.04), ('dunson', 0.04), ('individual', 0.04), ('root', 0.039), ('comp', 0.039), ('psychology', 0.039), ('depth', 0.038), ('covariates', 0.037), ('hierarchy', 0.037), ('covariate', 0.036), ('earth', 0.035), ('moves', 0.035), ('mechanical', 0.035), ('cs', 0.035), ('share', 0.035), ('analy', 0.034), ('bio', 0.034), ('campaigns', 0.034), ('econ', 0.034), ('elections', 0.034), ('electromagnet', 0.034), ('era', 0.034), ('individ', 0.034), ('magnet', 0.034), ('magnetism', 0.034), ('poli', 0.034), ('pps', 0.034), ('prog', 0.034), ('pub', 0.034), ('repertory', 0.034), ('spsbp', 0.034), ('systm', 0.034), ('tennessee', 0.034), ('umbrella', 0.034), ('wars', 0.034), ('allocated', 0.034), ('predictive', 0.034), ('introductory', 0.033), ('constructions', 0.033), ('history', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
2 0.27396172 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.21570884 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
4 0.20014231 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
5 0.14528532 156 nips-2011-Learning to Learn with Compound HD Models
Author: Antonio Torralba, Joshua B. Tenenbaum, Ruslan Salakhutdinov
Abstract: We introduce HD (or “Hierarchical-Deep”) models, a new compositional learning architecture that integrates deep learning models with structured hierarchical Bayesian models. Specifically we show how we can learn a hierarchical Dirichlet process (HDP) prior over the activities of the top-level features in a Deep Boltzmann Machine (DBM). This compound HDP-DBM model learns to learn novel concepts from very few training examples, by learning low-level generic features, high-level features that capture correlations among low-level features, and a category hierarchy for sharing priors over the high-level features that are typical of different kinds of concepts. We present efficient learning and inference algorithms for the HDP-DBM model and show that it is able to learn new concepts from very few examples on CIFAR-100 object recognition, handwritten character recognition, and human motion capture datasets. 1
6 0.13132787 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
7 0.12924941 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
8 0.12528235 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
9 0.092049271 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
10 0.091631807 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
11 0.087810077 226 nips-2011-Projection onto A Nonnegative Max-Heap
12 0.083060928 285 nips-2011-The Kernel Beta Process
13 0.07766813 244 nips-2011-Selecting Receptive Fields in Deep Networks
14 0.075121328 134 nips-2011-Infinite Latent SVM for Classification and Multi-task Learning
15 0.071154773 221 nips-2011-Priors over Recurrent Continuous Time Processes
16 0.068971999 60 nips-2011-Confidence Sets for Network Structure
17 0.065044045 131 nips-2011-Inference in continuous-time change-point models
18 0.063946187 250 nips-2011-Shallow vs. Deep Sum-Product Networks
19 0.063621283 104 nips-2011-Generalized Beta Mixtures of Gaussians
20 0.063409366 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
topicId topicWeight
[(0, 0.173), (1, 0.076), (2, -0.024), (3, 0.054), (4, -0.037), (5, -0.404), (6, 0.117), (7, 0.093), (8, -0.131), (9, -0.074), (10, 0.068), (11, 0.112), (12, -0.101), (13, 0.049), (14, 0.002), (15, 0.02), (16, 0.046), (17, -0.033), (18, -0.013), (19, -0.005), (20, 0.024), (21, -0.079), (22, -0.066), (23, 0.004), (24, -0.081), (25, -0.025), (26, -0.083), (27, 0.005), (28, -0.044), (29, -0.008), (30, -0.008), (31, 0.058), (32, 0.024), (33, 0.009), (34, 0.03), (35, -0.0), (36, 0.01), (37, 0.041), (38, 0.009), (39, 0.031), (40, 0.001), (41, 0.016), (42, 0.025), (43, -0.046), (44, 0.022), (45, 0.001), (46, -0.05), (47, 0.061), (48, 0.002), (49, -0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.96041721 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
2 0.79773366 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
3 0.77200186 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
4 0.77127302 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
5 0.69397807 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
6 0.65846324 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.57338768 156 nips-2011-Learning to Learn with Compound HD Models
8 0.54225129 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations
9 0.46685779 14 nips-2011-A concave regularization technique for sparse mixture models
10 0.44006655 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
11 0.43541628 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm
12 0.40249407 226 nips-2011-Projection onto A Nonnegative Max-Heap
13 0.39953336 177 nips-2011-Multi-armed bandits on implicit metric spaces
14 0.38199782 274 nips-2011-Structure Learning for Optimization
15 0.38050538 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
16 0.3724429 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure
17 0.37030104 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness
18 0.36254859 221 nips-2011-Priors over Recurrent Continuous Time Processes
19 0.36250368 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition
20 0.34840167 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression
topicId topicWeight
[(0, 0.041), (4, 0.041), (5, 0.306), (20, 0.035), (26, 0.015), (31, 0.088), (33, 0.035), (43, 0.046), (45, 0.086), (57, 0.053), (65, 0.019), (74, 0.066), (83, 0.033), (84, 0.032), (99, 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.77014869 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
2 0.76385248 52 nips-2011-Clustering via Dirichlet Process Mixture Models for Portable Skill Discovery
Author: Scott Niekum, Andrew G. Barto
Abstract: Skill discovery algorithms in reinforcement learning typically identify single states or regions in state space that correspond to task-specific subgoals. However, such methods do not directly address the question of how many distinct skills are appropriate for solving the tasks that the agent faces. This can be highly inefficient when many identified subgoals correspond to the same underlying skill, but are all used individually as skill goals. Furthermore, skills created in this manner are often only transferable to tasks that share identical state spaces, since corresponding subgoals across tasks are not merged into a single skill goal. We show that these problems can be overcome by clustering subgoal data defined in an agent-space and using the resulting clusters as templates for skill termination conditions. Clustering via a Dirichlet process mixture model is used to discover a minimal, sufficient collection of portable skills. 1
3 0.48216179 112 nips-2011-Heavy-tailed Distances for Gradient Based Image Descriptors
Author: Yangqing Jia, Trevor Darrell
Abstract: Many applications in computer vision measure the similarity between images or image patches based on some statistics such as oriented gradients. These are often modeled implicitly or explicitly with a Gaussian noise assumption, leading to the use of the Euclidean distance when comparing image descriptors. In this paper, we show that the statistics of gradient based image descriptors often follow a heavy-tailed distribution, which undermines any principled motivation for the use of Euclidean distances. We advocate for the use of a distance measure based on the likelihood ratio test with appropriate probabilistic models that fit the empirical data distribution. We instantiate this similarity measure with the Gammacompound-Laplace distribution, and show significant improvement over existing distance measures in the application of SIFT feature matching, at relatively low computational cost. 1
4 0.47665623 43 nips-2011-Bayesian Partitioning of Large-Scale Distance Data
Author: David Adametz, Volker Roth
Abstract: A Bayesian approach to partitioning distance matrices is presented. It is inspired by the Translation-invariant Wishart-Dirichlet process (TIWD) in [1] and shares a number of advantageous properties like the fully probabilistic nature of the inference model, automatic selection of the number of clusters and applicability in semi-supervised settings. In addition, our method (which we call fastTIWD) overcomes the main shortcoming of the original TIWD, namely its high computational costs. The fastTIWD reduces the workload in each iteration of a Gibbs sampler from O(n3 ) in the TIWD to O(n2 ). Our experiments show that the cost reduction does not compromise the quality of the inferred partitions. With this new method it is now possible to ‘mine’ large relational datasets with a probabilistic model, thereby automatically detecting new and potentially interesting clusters. 1
5 0.47579101 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs
Author: Armen Allahverdyan, Aram Galstyan
Abstract: We present an asymptotic analysis of Viterbi Training (VT) and contrast it with a more conventional Maximum Likelihood (ML) approach to parameter estimation in Hidden Markov Models. While ML estimator works by (locally) maximizing the likelihood of the observed data, VT seeks to maximize the probability of the most likely hidden state sequence. We develop an analytical framework based on a generating function formalism and illustrate it on an exactly solvable model of HMM with one unambiguous symbol. For this particular model the ML objective function is continuously degenerate. VT objective, in contrast, is shown to have only finite degeneracy. Furthermore, VT converges faster and results in sparser (simpler) models, thus realizing an automatic Occam’s razor for HMM learning. For more general scenario VT can be worse compared to ML but still capable of correctly recovering most of the parameters. 1
6 0.47547624 156 nips-2011-Learning to Learn with Compound HD Models
7 0.47377929 219 nips-2011-Predicting response time and error rates in visual search
8 0.47236499 266 nips-2011-Spatial distance dependent Chinese restaurant processes for image segmentation
9 0.47209224 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
10 0.47087032 235 nips-2011-Recovering Intrinsic Images with a Global Sparsity Prior on Reflectance
11 0.47031543 276 nips-2011-Structured sparse coding via lateral inhibition
12 0.46982646 227 nips-2011-Pylon Model for Semantic Segmentation
13 0.46775472 258 nips-2011-Sparse Bayesian Multi-Task Learning
14 0.46775061 66 nips-2011-Crowdclustering
15 0.46712959 273 nips-2011-Structural equations and divisive normalization for energy-dependent component analysis
16 0.46649691 180 nips-2011-Multiple Instance Filtering
17 0.46631601 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
18 0.46557063 197 nips-2011-On Tracking The Partition Function
19 0.46546054 68 nips-2011-Demixed Principal Component Analysis
20 0.46347895 301 nips-2011-Variational Gaussian Process Dynamical Systems