nips nips2010 nips2010-150 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: America Chambers, Padhraic Smyth, Mark Steyvers
Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. [sent-8, score-0.392]
2 Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. [sent-9, score-0.229]
3 The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. [sent-10, score-0.714]
4 Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. [sent-12, score-0.236]
5 1 Introduction: We present a generative probabilistic model for learning concept graphs from text. [sent-15, score-0.273]
6 We define a concept graph as a rooted, directed graph where the nodes represent thematic units (called concepts) and the edges represent relationships between concepts. [sent-16, score-0.774]
7 Concept graphs are useful for summarizing document collections and providing a visualization of the thematic content and structure of large document sets - a task that is difficult to accomplish using only keyword search. [sent-17, score-0.592]
8 An example of a concept graph is Wikipedia’s category graph. [sent-18, score-0.439]
9 Figure 1 shows a small portion of the Wikipedia category graph rooted at the category MACHINE LEARNING. [sent-19, score-0.447]
10 From the graph we can quickly infer that the collection of machine learning articles in Wikipedia focuses primarily on evolutionary algorithms and Markov models with less emphasis on other aspects of machine learning such as Bayesian networks and kernel methods. [sent-20, score-0.413]
11 The problem we address in this paper is that of learning a concept graph given a collection of documents where (optionally) we may have concept labels for the documents and an initial graph structure. [sent-21, score-1.124]
12 This is particularly suited for document collections like Wikipedia where the set of articles is changing at such a fast rate that an automatic method for updating the concept graph may be preferable to manual editing or re-learning the hierarchy from scratch. [sent-25, score-0.708]
13 LDA is a probabilistic model for automatically identifying topics within a document collection where a topic is a probability distribution over words. [sent-27, score-0.436]
14 In contrast, methods such as the hierarchical topic model (hLDA) [2] learn a set of topics in the form of a tree structure. [sent-29, score-0.197]
15 The restriction to tree structures, however, is not well suited for large document collections like Wikipedia. [sent-30, score-0.265]
16 The hierarchical Pachinko allocation model (hPAM) [3] is able to learn a set of topics arranged in a fixed-size graph, with a nonparametric version introduced in [4]. [sent-32, score-0.271]
17 In addition, our model provides a formal mechanism for utilizing labeled data and existing concept graph structures. [sent-36, score-0.507]
18 Other methods for creating concept graphs include the use of techniques such as hierarchical clustering, pattern mining and formal concept analysis to construct ontologies from document collections [5, 6, 7]. [sent-37, score-0.693]
19 Our primary novel contribution is the introduction of a flexible probabilistic framework for learning general graph structures from text that is capable of utilizing unlabeled documents as well as labeled documents and prior knowledge in the form of existing graph structures. [sent-39, score-1.002]
20 We then introduce our generative model and explain how it can be adapted for the case where we have an initial graph structure. [sent-41, score-0.233]
21 The probability of sampling a particular cluster from P(·) given the sequences {xj } and {vj } is not equal to the probability of sampling the same cluster given a permutation of the sequences {xσ(j) } and {vσ(j) }. [sent-54, score-0.211]
22 We construct a prior on graph structures by specifying a distribution at each node (denoted as Pt ) that governs the probability of transitioning from node t to another node in the graph. [sent-64, score-1.073]
23 However, we may discover evidence for passing directly to a leaf node such as STATISTICAL NATURAL LANGUAGE PROCESSING. [sent-68, score-0.243]
24 Second, making a transition to a new node must have non-zero probability. [sent-71, score-0.243]
25 For example, we may observe new articles related to the topic of Bioinformatics. [sent-72, score-0.272]
26 In this case, we want to add a new node to the graph (BIOINFORMATICS) and assign some probability of transitioning to it from other nodes. [sent-73, score-0.552]
27 For each node t we define a feasible set Ft as the collection of nodes to which t can transition. [sent-79, score-0.494]
28 The feasible set may contain the children of node t or possible child nodes of node t (as discussed above). [sent-80, score-0.699]
29 We add a special node called the "exit node" to Ft. [sent-82, score-0.243]
30 If we sample the exit node then we exit from the graph instead of transitioning forward. [sent-83, score-0.721]
31 We define Pt as a stick-breaking distribution over the finite set of nodes Ft where the remaining probability mass is assigned to an infinite set of new nodes (nodes that exist but have not yet been observed). [sent-84, score-0.344]
32 $P_t(\cdot) = \sum_{j=1}^{|F_t|} \pi_{tj}\,\delta_{f_{tj}}(\cdot) + \sum_{j=|F_t|+1}^{\infty} \pi_{tj}\,\delta_{x_{tj}}(\cdot)$. The first |Ft| atoms of the stick-breaking distribution are the feasible nodes ftj ∈ Ft. [sent-86, score-0.274]
33 The remaining atoms are unidentifiable nodes that have yet to be observed (denoted as xtj for simplicity). [sent-87, score-0.226]
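As a concrete illustration, the distribution Pt can be represented by truncating the infinite tail of unobserved atoms at some finite number. The sketch below is our own minimal rendering, not the authors' code: the function name, the truncation level n_new, and the i.i.d. Beta(α, β) stick proportions (read off the collapsed sampling equations in Section 4.1) are all assumptions.

```python
import numpy as np

def stick_breaking_Pt(feasible_nodes, alpha, beta, n_new=20, rng=None):
    """Sketch of the per-node distribution P_t: the first |F_t| atoms are the
    feasible nodes of t (including the special exit node) and the remaining
    mass is spread over a truncated set of not-yet-observed new nodes."""
    if rng is None:
        rng = np.random.default_rng()
    n_atoms = len(feasible_nodes) + n_new                        # truncate the infinite tail
    v = rng.beta(alpha, beta, size=n_atoms)                      # stick proportions v_j ~ Beta(alpha, beta)
    pi = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))   # pi_j = v_j * prod_{r<j}(1 - v_r)
    pi[-1] += 1.0 - pi.sum()                                     # fold the truncation remainder into the last atom
    atoms = list(feasible_nodes) + [("new", j) for j in range(n_new)]
    return atoms, pi

# Example: the distribution at a node whose feasible set has two children plus the exit node.
atoms, pi = stick_breaking_Pt(["NEURAL NETWORKS", "MARKOV MODELS", "EXIT"], alpha=1.0, beta=1.0)
```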
34 In our experiments, we first assign each node to a unique depth and then define Ft as any node at the next lower depth. [sent-91, score-0.486]
35 The choice of Ft determines the type of graph structures that can be learned. [sent-92, score-0.222]
36 More generally, one could extend the definition of Ft to include any node at a lower depth. [sent-95, score-0.243]
37 We use a Metropolis-Hastings sampler proposed by [10] to learn the permutation of feasible nodes with the highest likelihood given the data. [sent-114, score-0.363]
38 As discussed earlier, each node t is associated with a stick-breaking prior Pt . [sent-120, score-0.243]
39 In addition, we associate with each node a multinomial distribution φt over words in the fashion of topic models. [sent-121, score-0.419]
40 First, a path through the graph is sampled from the stick-breaking distributions. [sent-123, score-0.306]
41 The (i+1)st node in the path is sampled from the stick-breaking distribution Pt associated with the ith node in the path. [sent-125, score-0.605]
42 This process continues until an exit node is sampled. [sent-126, score-0.329]
43 Then for each word xdi, a level in the path, ldi, is sampled from a truncated discrete distribution. [sent-127, score-0.51]
44 The word xdi is generated by the topic at level ldi of the path pd, which we denote as pd[ldi]. [sent-128, score-1.215]
45 In the case where we observe labeled documents and an initial graph structure, the path for document d is restricted to end at the concept label of document d. [sent-129, score-1.091]
46 The motivation is to constrain the length distribution to have the same general functional form across documents (in contrast to the relatively unconstrained multinomial), but to allow the parameters of the distribution to be document-specific. [sent-132, score-0.197]
47 If word xdi has level ldi = 0 then the word is generated by the topic at the last node on the path and successive levels correspond to earlier nodes in the path. [sent-135, score-1.379]
48 In the case of labeled documents, this matches our belief that a majority of words in the document should be assigned to the concept label itself. [sent-136, score-0.414]
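Putting the generative steps together, a minimal sketch of how a single document could be generated is given below. The dictionary-based graph representation, the 'EXIT' sentinel, and the convention that level 0 indexes the last node on the path are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def generate_document(root, Pt, phi, tau_d, n_words, rng):
    """Pt[node]  -> (successor_atoms, probs), where one atom is the sentinel 'EXIT'
       phi[node] -> multinomial over the vocabulary for that node's topic
       tau_d     -> document-specific parameter of the truncated geometric over levels"""
    # 1. Sample a path by walking the stick-breaking distributions until the exit node is drawn.
    path, node = [root], root
    while True:
        succ, probs = Pt[node]
        node = rng.choice(succ, p=probs)
        if node == "EXIT":
            break
        path.append(node)
    # 2. Sample a level for each word from a truncated geometric, then the word itself.
    lam = len(path) - 1                                        # deepest level index
    lvl_p = np.array([(1.0 - tau_d) ** l * tau_d for l in range(lam + 1)])
    lvl_p /= lvl_p.sum()                                       # truncation normaliser
    levels, words = [], []
    for _ in range(n_words):
        l = int(rng.choice(lam + 1, p=lvl_p))
        topic_node = path[len(path) - 1 - l]                   # level 0 = last node on the path
        words.append(int(rng.choice(len(phi[topic_node]), p=phi[topic_node])))
        levels.append(l)
    return path, levels, words
```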
49 We use a collapsed Gibbs sampler [9] to infer the path assignment pd for each document, the level distribution parameter τd for each document, and the level assignment ldi for each word. [sent-138, score-0.908]
50 4.1 Sampling Paths: For each document, we must sample a path pd conditioned on all other paths p−d, the level variables, and the word tokens. [sent-141, score-0.68]
51 $p(p_d \mid x, l, p_{-d}, \tau) \propto p(x_d \mid x_{-d}, l, p) \cdot p(p_d \mid p_{-d})$ (1). The first term in Equation 1 is the probability of all words in the document given the path pd. [sent-143, score-0.543]
52 We compute this probability by marginalizing over the topic distributions φt: $p(x_d \mid x_{-d}, l, p) = \prod_{l=1}^{\lambda_d} \left[\prod_{v=1}^{V} \frac{\Gamma(\eta + N_{p_d[l],v})}{\Gamma(\eta + N^{-d}_{p_d[l],v})}\right] \frac{\Gamma(V\eta + \sum_v N^{-d}_{p_d[l],v})}{\Gamma(V\eta + \sum_v N^{*}_{p_d[l],v})}$. We use λd to denote the length of path pd. [sent-144, score-0.512]
53 The notation Npd[l],v stands for the number of times word type v has been assigned to node pd[l]. [sent-145, score-0.593]
54 The superscript −d means we first decrement the count Npd[l],v for every word in document d. [sent-146, score-0.298]
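In log space this first term can be evaluated with gammaln. The sketch below assumes counts are stored as length-V vectors per node and that the document's own words have already been bucketed by path position; the names doc_counts_by_position and N_minus_d are ours, with the starred counts of the equation interpreted as "counts with document d added back".

```python
import numpy as np
from scipy.special import gammaln

def log_word_likelihood(path, doc_counts_by_position, N_minus_d, eta, V):
    """Sketch of log p(x_d | x_{-d}, l, p) for a candidate path.
    path                   : nodes p_d[1..lambda_d] visited by the path
    doc_counts_by_position : dict position -> length-V vector of document d's word counts there
    N_minus_d[t]           : length-V count vector for node t with document d's words removed"""
    logp = 0.0
    for pos, node in enumerate(path):
        n_md = np.asarray(N_minus_d[node], dtype=float)               # N^{-d}_{p_d[l],v}
        n_full = n_md + doc_counts_by_position.get(pos, np.zeros(V))  # counts with document d added back
        logp += gammaln(V * eta + n_md.sum()) - gammaln(V * eta + n_full.sum())
        logp += np.sum(gammaln(eta + n_full) - gammaln(eta + n_md))
    return logp
```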
55 The second term is the conditional probability of the path pd given all other paths p−d . [sent-147, score-0.494]
56 We present the sampling equation under the assumption that there is a maximum number of nodes M allowed at each level. [sent-148, score-0.218]
57 We first consider the probability of sampling a single edge in the path from a node x to one of its feasible nodes {y1, y2, ..., yM}, [sent-149, score-0.695]
58 where the node y1 has the first position in the stick-breaking permutation, y2 has the second position, y3 the third, and so on. [sent-152, score-0.271]
59 We denote the number of paths that have gone from x to yi as N(x,yi). [sent-153, score-0.223]
60 We denote the number of paths that have gone from x to a node with a strictly higher position in the stick-breaking distribution than yi as N(x,>yi). [sent-154, score-0.494]
61 The probability of selecting node yi is given by: $p(x \to y_i \mid p_{-d}) = \frac{\alpha + N_{(x,y_i)}}{\alpha + \beta + N_{(x,\geq y_i)}} \prod_{r=1}^{i-1} \frac{\beta + N_{(x,>y_r)}}{\alpha + \beta + N_{(x,\geq y_r)}}$ for i = 1, ..., M. [sent-157, score-0.419]
62 If ym is the last node with a nonzero count N(x,ym) and m << M, it is convenient to compute the probability of transitioning to yi, for i ≤ m, and the probability of transitioning to any node higher than ym. [sent-160, score-0.923]
63 The probability of transitioning to a node higher than ym is given by $\sum_{k=m+1}^{M} p(x \to y_k \mid p_{-d}) = \Delta\left(1 - \left(\frac{\beta}{\alpha+\beta}\right)^{M-m}\right)$, where $\Delta = \prod_{r=1}^{m} \frac{\beta + N_{(x,>y_r)}}{\alpha + \beta + N_{(x,\geq y_r)}}$. [sent-161, score-0.425]
64 A similar derivation can be used to compute the probability of sampling a node higher than ym when M is equal to infinity. [sent-162, score-0.394]
65 Now that we have computed the probability of a single edge, we can compute the probability of an entire path pd: $p(p_d \mid p_{-d}) = \prod_{j=1}^{\lambda_d} p(p_{d,j} \to p_{d,j+1} \mid p_{-d})$. [sent-163, score-0.402]
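The edge and path probabilities can be computed directly from the counts. The sketch below is our own: it assumes the counts N(x,yi) for a node x are kept as a vector ordered by the current stick-breaking permutation, with N(x,>yi) and N(x,≥yi) obtained as suffix sums.

```python
import numpy as np

def edge_log_probs(N_at, alpha, beta):
    """log p(x -> y_i | p_{-d}) for every feasible node y_i of x.
    N_at[i] = N_(x,yi), ordered by the current stick-breaking permutation."""
    N_at = np.asarray(N_at, dtype=float)
    N_gt = np.concatenate((np.cumsum(N_at[::-1])[::-1][1:], [0.0]))   # N_(x,>yi)
    N_ge = N_at + N_gt                                                # N_(x,>=yi)
    log_stay = np.log(alpha + N_at) - np.log(alpha + beta + N_ge)
    log_pass = np.log(beta + N_gt) - np.log(alpha + beta + N_ge)
    return log_stay + np.concatenate(([0.0], np.cumsum(log_pass[:-1])))

def path_log_prior(path_edges, counts, alpha, beta):
    """Sum edge log-probabilities along a path; path_edges lists (node, chosen-feasible-index)
    pairs and counts[x] holds the N_at vector for node x."""
    return sum(edge_log_probs(counts[x], alpha, beta)[i] for x, i in path_edges)
```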
66 4.2 Sampling Levels: For the ith word in the dth document we must sample a level ldi conditioned on all other levels l−di, the document paths, the level parameters τ, and the word tokens. [sent-164, score-1.108]
67 $p(l_{di} \mid x, l_{-di}, p, \tau) = \frac{\eta + N^{-di}_{p_d[l_{di}],x_{di}}}{W\eta + N^{-di}_{p_d[l_{di}],\cdot}} \cdot \frac{(1-\tau_d)^{l_{di}}\,\tau_d}{1 - (1-\tau_d)^{\lambda_d+1}}$. The first term is the probability of word type xdi given the topic at node pd[ldi]. [sent-165, score-0.806]
68 The second term is the probability of the level ldi given the level parameter τd . [sent-166, score-0.475]
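A sketch of drawing ldi from this conditional is given below; normalising the product over levels directly absorbs the truncated-geometric normaliser. The count layout (a length-W vector per node with document d's ith word removed) and the convention that level 0 is the last node of the path are assumptions.

```python
import numpy as np

def sample_level(path, word_type, N_minus_di, tau_d, eta, W, rng):
    """Sample l_di for one word: topic term (eta + N^{-di}_{p_d[l],x_di}) / (W*eta + N^{-di}_{p_d[l],.})
    times the truncated geometric prior on levels."""
    lam = len(path) - 1
    probs = np.empty(lam + 1)
    for l in range(lam + 1):
        node = path[len(path) - 1 - l]                        # level 0 -> last node on the path
        topic_term = (eta + N_minus_di[node][word_type]) / (W * eta + N_minus_di[node].sum())
        level_term = (1.0 - tau_d) ** l * tau_d               # unnormalised truncated geometric
        probs[l] = topic_term * level_term
    probs /= probs.sum()                                      # normalisation absorbs 1 - (1 - tau_d)^(lam+1)
    return int(rng.choice(lam + 1, p=probs))
```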
69 4.3 Sampling τ Variables: Finally, we must sample the level distribution τd conditioned on the rest of the level parameters τ−d, the level variables, and the word tokens. [sent-168, score-0.34]
70 Consider a node x with feasible nodes {y1, y2, ..., yM}. [sent-174, score-0.456]
71 We sample two feasible nodes yi and yj from a uniform distribution. [sent-178, score-0.352]
72 Then the probability of swapping the positions of nodes yi and yj is given by $\min\left\{1,\; \prod_{k=0}^{N_{(x,y_i)}-1} \frac{\alpha+\beta+N^{*}_{(x,>y_i)}+k}{\alpha+\beta+N_{(x,>y_j)}+k} \cdot \prod_{k=0}^{N_{(x,y_j)}-1} \frac{\alpha+\beta+N_{(x,>y_j)}+k}{\alpha+\beta+N^{*}_{(x,>y_i)}+k}\right\}$, where $N^{*}_{(x,>y_i)} = N_{(x,>y_i)} - N_{(x,y_j)}$. [sent-180, score-0.327]
73 After every new path assignment, we propose one swap for each node in the graph. [sent-182, score-0.362]
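A sketch of that acceptance probability, evaluated in log space for stability; the argument names mirror the counts defined above, and treating them as plain integers is our own simplification.

```python
import numpy as np

def swap_acceptance(N_yi, N_gt_yi, N_yj, N_gt_yj, alpha, beta):
    """Acceptance probability for swapping the permutation positions of feasible nodes
    y_i and y_j of a node x (y_i currently earlier), with N*_(x,>yi) = N_(x,>yi) - N_(x,yj)."""
    N_star = N_gt_yi - N_yj
    log_r = 0.0
    for k in range(int(N_yi)):
        log_r += np.log(alpha + beta + N_star + k) - np.log(alpha + beta + N_gt_yj + k)
    for k in range(int(N_yj)):
        log_r += np.log(alpha + beta + N_gt_yj + k) - np.log(alpha + beta + N_star + k)
    return min(1.0, float(np.exp(log_r)))

# A proposed swap is applied when rng.uniform() < swap_acceptance(...).
```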
74 Figure 4(a) shows a simulated concept graph with 10 nodes drawn according to the generative model. (In [10], feasible nodes are sampled from the prior probability distribution.) [sent-187, score-0.795]
75 The vocabulary size is 1,000 words and we generate 4,000 documents with 250 words each. [sent-191, score-0.197]
76 Each edge in the graph is labeled with the number of paths that traverse it. [sent-192, score-0.454]
77 Figures 4(b)-(d) show the learned graph structures as the fraction of labeled data increases from 0 labeled and 4,000 unlabeled documents to all 4,000 documents being labeled. [sent-193, score-0.784]
78 In addition to labeling the edges, we label each node based upon the similarity of the learned topic at the node to the topics of the original graph structure. [sent-194, score-0.87]
79 The Gibbs sampler is initialized to a root node when there is no labeled data. [sent-195, score-0.48]
80 With labeled data, the Gibbs sampler is initialized with the correct placement of nodes to levels. [sent-196, score-0.44]
81 The sampler observes neither the edge structure of the graph nor the correct number of nodes at each level. [sent-197, score-0.556]
82 With no labeled data, the sampler is unable to recover the relationship between concepts 8 and 10 (due to the relatively small number of documents that contain words from both concepts). [sent-200, score-0.488]
83 With 250 labeled documents, the sampler is able to learn the correct placement of both nodes 8 and 10 (although the topics contain some noise). [sent-201, score-0.465]
84 For both GraphLDA and hPAM, the number of nodes at each level was set to 25. [sent-206, score-0.219]
85 The edge widths correspond to the probability of the edge in the graph. [sent-292, score-0.275]
86 We use the aforementioned 518 machine-learning Wikipedia articles, along with their category labels, to learn topic distributions for each node in Figure 1. [sent-294, score-0.476]
87 The sampler is initialized with the correct placement of nodes and each document is initialized to a random path from the root to its category label. [sent-295, score-0.771]
88 After 2,000 iterations, we fix the path assignments for the Wikipedia articles and introduce a new set of documents. [sent-296, score-0.251]
89 We sample paths for the new collection of documents keeping the paths from the Wikipedia articles fixed. [sent-298, score-0.638]
90 The sampler was allowed to add new nodes to each level to explain any new concepts that occurred in the ICML text set. [sent-299, score-0.479]
91 Figure 5 illustrates a portion of the final graph structure. [sent-300, score-0.227]
92 The nodes in bold are the original nodes from the Wikipedia category graph. [sent-301, score-0.407]
93 The results show that the model is capable of augmenting an existing concept graph with new concepts (e.g., [sent-302, score-0.459]
94 boosting/ensembles are on the same path as the concepts for SVMs and neural networks). [sent-307, score-0.205]
95 6 Discussion and Conclusion: Motivated by the increasing availability of large-scale structured collections of documents such as Wikipedia, we have presented a flexible non-parametric Bayesian framework for learning concept graphs from text. [sent-308, score-0.483]
96 The proposed approach can combine unlabeled data with prior knowledge in the form of labeled documents and existing graph structures. [sent-309, score-0.468]
97 Extensions such as allowing the model to handle multiple paths per document are likely to be worth pursuing. [sent-310, score-0.293]
98 Computing the probability of every path during sampling, where the number of possible paths is a product over the number of nodes at each level, is a computational bottleneck in the current inference algorithm and will not scale. [sent-312, score-0.374]
99 The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. [sent-323, score-0.242]
100 Learning concept hierarchies from text using formal concept analysis. [sent-355, score-0.413]
wordName wordTfidf (topN-words)
[('ldi', 0.321), ('wikipedia', 0.259), ('hpam', 0.257), ('node', 0.243), ('graphlda', 0.236), ('pd', 0.223), ('documents', 0.197), ('graph', 0.187), ('hlda', 0.171), ('npd', 0.171), ('document', 0.171), ('ft', 0.165), ('concept', 0.159), ('nodes', 0.157), ('topic', 0.14), ('articles', 0.132), ('word', 0.127), ('paths', 0.122), ('sampler', 0.121), ('path', 0.119), ('category', 0.093), ('irvine', 0.092), ('transitioning', 0.092), ('exit', 0.086), ('concepts', 0.086), ('achine', 0.086), ('labeled', 0.084), ('beta', 0.08), ('yi', 0.073), ('pt', 0.073), ('graphs', 0.068), ('yr', 0.065), ('pachinko', 0.064), ('level', 0.062), ('sampling', 0.061), ('ym', 0.06), ('collections', 0.059), ('topics', 0.057), ('mallet', 0.056), ('thematic', 0.056), ('feasible', 0.056), ('evolutionary', 0.056), ('gibbs', 0.054), ('text', 0.053), ('simulated', 0.049), ('placement', 0.046), ('hastings', 0.046), ('generative', 0.046), ('cimiano', 0.043), ('classifier', 0.043), ('padhraic', 0.043), ('philosophy', 0.043), ('vtj', 0.043), ('xdi', 0.043), ('xtj', 0.043), ('metropolis', 0.042), ('formal', 0.042), ('dirichlet', 0.041), ('restaurant', 0.041), ('portion', 0.04), ('levels', 0.04), ('smyth', 0.039), ('yj', 0.039), ('collection', 0.038), ('keyword', 0.038), ('ihler', 0.038), ('xj', 0.037), ('genetic', 0.037), ('multinomial', 0.036), ('vj', 0.036), ('structures', 0.035), ('utilizing', 0.035), ('ftj', 0.035), ('ontologies', 0.035), ('porteous', 0.035), ('atom', 0.035), ('chinese', 0.034), ('rooted', 0.034), ('blei', 0.034), ('classification', 0.034), ('geometric', 0.032), ('traverse', 0.032), ('lda', 0.032), ('initialized', 0.032), ('di', 0.031), ('probability', 0.03), ('david', 0.03), ('edge', 0.029), ('ian', 0.029), ('accomplish', 0.029), ('permutation', 0.029), ('language', 0.028), ('gone', 0.028), ('position', 0.028), ('relationships', 0.028), ('andrew', 0.027), ('nonparametric', 0.027), ('sample', 0.027), ('capable', 0.027), ('atoms', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 150 nips-2010-Learning concept graphs from text with stick-breaking priors
Author: America Chambers, Padhraic Smyth, Mark Steyvers
Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs. 1
2 0.25583079 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
Author: Zoubin Ghahramani, Michael I. Jordan, Ryan P. Adams
Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data. 1
3 0.19844104 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
4 0.19535185 286 nips-2010-Word Features for Latent Dirichlet Allocation
Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola
Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1
5 0.15995789 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
Author: Eric Wang, Dehong Liu, Jorge Silva, Lawrence Carin, David B. Dunson
Abstract: We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives. 1
6 0.1399457 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
7 0.13588555 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
8 0.11702031 162 nips-2010-Link Discovery using Graph Feature Tracking
9 0.10252017 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
10 0.09839011 63 nips-2010-Distributed Dual Averaging In Networks
11 0.096902058 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
12 0.093790442 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks
13 0.090895914 117 nips-2010-Identifying graph-structured activation patterns in networks
14 0.088630497 182 nips-2010-New Adaptive Algorithms for Online Classification
15 0.084272891 177 nips-2010-Multitask Learning without Label Correspondences
16 0.083299972 264 nips-2010-Synergies in learning words and their referents
17 0.082273699 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
18 0.082118131 169 nips-2010-More data means less inference: A pseudo-max approach to structured learning
19 0.077786632 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
20 0.07553979 285 nips-2010-Why are some word orders more common than others? A uniform information density account
topicId topicWeight
[(0, 0.212), (1, 0.062), (2, 0.036), (3, 0.028), (4, -0.371), (5, 0.055), (6, 0.156), (7, -0.043), (8, 0.004), (9, 0.031), (10, 0.027), (11, 0.039), (12, -0.079), (13, 0.128), (14, -0.0), (15, -0.007), (16, -0.021), (17, -0.151), (18, -0.18), (19, 0.012), (20, 0.05), (21, -0.002), (22, -0.044), (23, 0.057), (24, 0.006), (25, 0.048), (26, -0.002), (27, 0.053), (28, 0.103), (29, 0.064), (30, 0.059), (31, 0.073), (32, 0.008), (33, 0.044), (34, 0.024), (35, 0.018), (36, 0.038), (37, 0.047), (38, 0.003), (39, -0.036), (40, -0.016), (41, 0.041), (42, 0.007), (43, 0.068), (44, -0.024), (45, 0.104), (46, -0.014), (47, -0.036), (48, -0.005), (49, -0.044)]
simIndex simValue paperId paperTitle
same-paper 1 0.95795625 150 nips-2010-Learning concept graphs from text with stick-breaking priors
Author: America Chambers, Padhraic Smyth, Mark Steyvers
Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs. 1
2 0.75348181 286 nips-2010-Word Features for Latent Dirichlet Allocation
Author: James Petterson, Wray Buntine, Shravan M. Narayanamurthy, Tibério S. Caetano, Alex J. Smola
Abstract: We extend Latent Dirichlet Allocation (LDA) by explicitly allowing for the encoding of side information in the distribution over words. This results in a variety of new capabilities, such as improved estimates for infrequently occurring words, as well as the ability to leverage thesauri and dictionaries in order to boost topic cohesion within and across languages. We present experiments on multi-language topic synchronisation where dictionary information is used to bias corresponding words towards similar topics. Results indicate that our model substantially improves topic cohesion when compared to the standard LDA model. 1
3 0.69744974 131 nips-2010-Joint Analysis of Time-Evolving Binary Matrices and Associated Documents
Author: Eric Wang, Dehong Liu, Jorge Silva, Lawrence Carin, David B. Dunson
Abstract: We consider problems for which one has incomplete binary matrices that evolve with time (e.g., the votes of legislators on particular legislation, with each year characterized by a different such matrix). An objective of such analysis is to infer structure and inter-relationships underlying the matrices, here defined by latent features associated with each axis of the matrix. In addition, it is assumed that documents are available for the entities associated with at least one of the matrix axes. By jointly analyzing the matrices and documents, one may be used to inform the other within the analysis, and the model offers the opportunity to predict matrix values (e.g., votes) based only on an associated document (e.g., legislation). The research presented here merges two areas of machine-learning that have previously been investigated separately: incomplete-matrix analysis and topic modeling. The analysis is performed from a Bayesian perspective, with efficient inference constituted via Gibbs sampling. The framework is demonstrated by considering all voting data and available documents (legislation) during the 220-year lifetime of the United States Senate and House of Representatives. 1
4 0.67336988 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
Author: Zoubin Ghahramani, Michael I. Jordan, Ryan P. Adams
Abstract: Many data are naturally modeled by an unobserved hierarchical structure. In this paper we propose a flexible nonparametric prior over unknown data hierarchies. The approach uses nested stick-breaking processes to allow for trees of unbounded width and depth, where data can live at any node and are infinitely exchangeable. One can view our model as providing infinite mixtures where the components have a dependency structure corresponding to an evolutionary diffusion down a tree. By using a stick-breaking approach, we can apply Markov chain Monte Carlo methods based on slice sampling to perform Bayesian inference and simulate from the posterior distribution on trees. We apply our method to hierarchical clustering of images and topic modeling of text data. 1
5 0.65200466 60 nips-2010-Deterministic Single-Pass Algorithm for LDA
Author: Issei Sato, Kenichi Kurihara, Hiroshi Nakagawa
Abstract: We develop a deterministic single-pass algorithm for latent Dirichlet allocation (LDA) in order to process received documents one at a time and then discard them in an excess text stream. Our algorithm does not need to store old statistics for all data. The proposed algorithm is much faster than a batch algorithm and is comparable to the batch algorithm in terms of perplexity in experiments.
6 0.56264865 194 nips-2010-Online Learning for Latent Dirichlet Allocation
7 0.50368422 190 nips-2010-On the Convexity of Latent Social Network Inference
8 0.48700696 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem
9 0.46050921 264 nips-2010-Synergies in learning words and their referents
10 0.45735604 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
11 0.44934803 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
12 0.43521547 117 nips-2010-Identifying graph-structured activation patterns in networks
13 0.41995245 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
14 0.40722555 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
15 0.40641469 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
16 0.39965773 63 nips-2010-Distributed Dual Averaging In Networks
17 0.39725426 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
18 0.39358395 162 nips-2010-Link Discovery using Graph Feature Tracking
19 0.39008045 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
20 0.3889403 108 nips-2010-Graph-Valued Regression
topicId topicWeight
[(13, 0.046), (22, 0.019), (27, 0.143), (30, 0.07), (35, 0.012), (45, 0.184), (50, 0.064), (52, 0.028), (58, 0.229), (60, 0.055), (77, 0.037), (78, 0.012), (90, 0.02)]
simIndex simValue paperId paperTitle
1 0.84053868 57 nips-2010-Decoding Ipsilateral Finger Movements from ECoG Signals in Humans
Author: Yuzong Liu, Mohit Sharma, Charles Gaona, Jonathan Breshears, Jarod Roland, Zachary Freudenburg, Eric Leuthardt, Kilian Q. Weinberger
Abstract: Several motor related Brain Computer Interfaces (BCIs) have been developed over the years that use activity decoded from the contralateral hemisphere to operate devices. Contralateral primary motor cortex is also the region most severely affected by hemispheric stroke. Recent studies have identified ipsilateral cortical activity in planning of motor movements and its potential implications for a stroke relevant BCI. The most fundamental functional loss after a hemispheric stroke is the loss of fine motor control of the hand. Thus, whether ipsilateral cortex encodes finger movements is critical to the potential feasibility of BCI approaches in the future. This study uses ipsilateral cortical signals from humans (using ECoG) to decode finger movements. We demonstrate, for the first time, successful finger movement detection using machine learning algorithms. Our results show high decoding accuracies in all cases which are always above chance. We also show that significant accuracies can be achieved with the use of only a fraction of all the features recorded and that these core features are consistent with previous physiological findings. The results of this study have substantial implications for advancing neuroprosthetic approaches to stroke populations not currently amenable to existing BCI techniques. 1
same-paper 2 0.80860341 150 nips-2010-Learning concept graphs from text with stick-breaking priors
Author: America Chambers, Padhraic Smyth, Mark Steyvers
Abstract: We present a generative probabilistic model for learning general graph structures, which we term concept graphs, from text. Concept graphs provide a visual summary of the thematic content of a collection of documents—a task that is difficult to accomplish using only keyword search. The proposed model can learn different types of concept graph structures and is capable of utilizing partial prior knowledge about graph structure as well as labeled documents. We describe a generative model that is based on a stick-breaking process for graphs, and a Markov Chain Monte Carlo inference procedure. Experiments on simulated data show that the model can recover known graph structure when learning in both unsupervised and semi-supervised modes. We also show that the proposed model is competitive in terms of empirical log likelihood with existing structure-based topic models (hPAM and hLDA) on real-world text data sets. Finally, we illustrate the application of the model to the problem of updating Wikipedia category graphs. 1
3 0.73643625 161 nips-2010-Linear readout from a neural population with partial correlation data
Author: Adrien Wohrer, Ranulfo Romo, Christian K. Machens
Abstract: How much information does a neural population convey about a stimulus? Answers to this question are known to strongly depend on the correlation of response variability in neural populations. These noise correlations, however, are essentially immeasurable as the number of parameters in a noise correlation matrix grows quadratically with population size. Here, we suggest to bypass this problem by imposing a parametric model on a noise correlation matrix. Our basic assumption is that noise correlations arise due to common inputs between neurons. On average, noise correlations will therefore reflect signal correlations, which can be measured in neural populations. We suggest an explicit parametric dependency between signal and noise correlations. We show how this dependency can be used to ”fill the gaps” in noise correlations matrices using an iterative application of the Wishart distribution over positive definitive matrices. We apply our method to data from the primary somatosensory cortex of monkeys performing a two-alternativeforced choice task. We compare the discrimination thresholds read out from the population of recorded neurons with the discrimination threshold of the monkey and show that our method predicts different results than simpler, average schemes of noise correlations. 1
4 0.72957706 194 nips-2010-Online Learning for Latent Dirichlet Allocation
Author: Matthew Hoffman, Francis R. Bach, David M. Blei
Abstract: We develop an online variational Bayes (VB) algorithm for Latent Dirichlet Allocation (LDA). Online LDA is based on online stochastic optimization with a natural gradient step, which we show converges to a local optimum of the VB objective function. It can handily analyze massive document collections, including those arriving in a stream. We study the performance of online LDA in several ways, including by fitting a 100-topic topic model to 3.3M articles from Wikipedia in a single pass. We demonstrate that online LDA finds topic models as good or better than those found with batch VB, and in a fraction of the time. 1
5 0.72538292 268 nips-2010-The Neural Costs of Optimal Control
Author: Samuel Gershman, Robert Wilson
Abstract: Optimal control entails combining probabilities and utilities. However, for most practical problems, probability densities can be represented only approximately. Choosing an approximation requires balancing the benefits of an accurate approximation against the costs of computing it. We propose a variational framework for achieving this balance and apply it to the problem of how a neural population code should optimally represent a distribution under resource constraints. The essence of our analysis is the conjecture that population codes are organized to maximize a lower bound on the log expected utility. This theory can account for a plethora of experimental data, including the reward-modulation of sensory receptive fields, GABAergic effects on saccadic movements, and risk aversion in decisions under uncertainty. 1
6 0.72533345 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
7 0.72463697 98 nips-2010-Functional form of motion priors in human motion perception
8 0.72212452 81 nips-2010-Evaluating neuronal codes for inference using Fisher information
9 0.71855122 6 nips-2010-A Discriminative Latent Model of Image Region and Object Tag Correspondence
10 0.71722442 39 nips-2010-Bayesian Action-Graph Games
11 0.71296227 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
12 0.71245396 17 nips-2010-A biologically plausible network for the computation of orientation dominance
13 0.71222556 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
14 0.70967048 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
15 0.70703459 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
16 0.70683771 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
17 0.70670873 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
18 0.70430648 276 nips-2010-Tree-Structured Stick Breaking for Hierarchical Data
19 0.70364314 121 nips-2010-Improving Human Judgments by Decontaminating Sequential Dependencies
20 0.70240122 155 nips-2010-Learning the context of a category