nips nips2011 nips2011-129 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. [sent-7, score-0.796]
2 When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. [sent-8, score-0.908]
3 However, when dealing with small collections or noisy text (e.g. [sent-9, score-0.393]
4 web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. [sent-11, score-0.845]
5 To overcome this, we propose two methods to regularize the learning of topic models. [sent-12, score-0.634]
6 Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. [sent-13, score-0.329]
7 Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. [sent-14, score-1.358]
8 Overall, this work makes topic models more useful across a broader range of text data. [sent-15, score-0.756]
9 1 Introduction Topic modeling holds much promise for improving the ways users search, discover, and organize online content by automatically extracting semantic themes from collections of text documents. [sent-16, score-0.754]
10 Learned topics can be useful in user interfaces for ad-hoc document retrieval [18]; driving faceted browsing [14]; clustering search results [19]; or improving display of search results by increasing result diversity [10]. [sent-17, score-1.215]
11 When the text being modeled is plentiful, clear and well written (e.g. [sent-18, score-0.18]
12 large collections of abstracts from scientific literature), learned topics are usually coherent, easily understood, and fit for use in user interfaces. [sent-20, score-0.64]
13 However, topics are not always consistently coherent, and even with relatively well written text, one can learn topics that are a mix of concepts or hard to understand [1, 6]. [sent-21, score-0.775]
14 This problem is exacerbated for content that is sparse or noisy, such as blog posts, tweets, or web search result snippets. [sent-22, score-0.436]
15 Take for instance the task of learning categories in clustering search engine results. [sent-23, score-0.169]
16 A few searches with Carrot2, Yippee, or WebClust quickly demonstrate that consistently learning meaningful topic facets is a difficult task [5]. [sent-24, score-0.642]
17 Our goal in this paper is to improve the coherence, interpretability and ultimate usability of learned topics. [sent-25, score-0.348]
18 To achieve this we propose QUAD-REG and CONV-REG, two new methods for regularizing topic models, which produce more coherent and interpretable topics. [sent-26, score-0.918]
19 Our work is predicated on recent evidence that a pointwise mutual information-based score (PMI-Score) is highly correlated with human-judged topic coherence [15, 16]. [sent-27, score-1.136]
20 We develop two Bayesian regularization formulations that are designed to improve PMI-Score. [sent-28, score-0.094]
21 We experiment with five search result datasets from 7M Blog posts, four search result datasets from 1M News articles, and four datasets of Google search results. [sent-29, score-0.462]
22 Using these thirteen datasets, our experiments demonstrate that both regularizers consistently improve topic coherence and interpretability, as measured separately by PMI-Score and human judgements. [sent-30, score-1.307]
23 To the best of our knowledge, our models are the first to address the problem of learning topics when dealing with limited and/or noisy text content. [sent-31, score-0.601]
24 This work opens up new application areas for topic modeling. [sent-32, score-0.584]
25 2 Topic Coherence and PMI-Score Topics learned from a statistical topic model are formally multinomial distributions over words, and are often displayed by printing the 10 most probable words in the topic. [sent-33, score-0.874]
26 These top-10 words usually provide sufficient information to determine the subject area and interpretation of a topic, and distinguish one topic from another. [sent-34, score-0.682]
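To make the display step concrete, here is a minimal sketch (not the authors' code) of selecting a topic's top-10 words; topic_word_probs and vocab are assumed, hypothetical inputs: a probability vector over the vocabulary for one topic and the matching list of word strings.

import numpy as np

def top_words(topic_word_probs, vocab, n=10):
    # Indices of the n largest probabilities, most probable word first.
    order = np.argsort(topic_word_probs)[::-1][:n]
    return [vocab[i] for i in order]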
27 However, topics learned on sparse or noisy text data are often less coherent, difficult to interpret, and not particularly useful. [sent-35, score-0.676]
28 Some of these noisy topics can be vaguely interpretable, but contain (in the top-10 words) one or two unrelated words – while other topics can be practically incoherent. [sent-36, score-0.898]
29 In this paper we wish to improve topic models learned on document collections where the text data is sparse and/or noisy. [sent-37, score-1.06]
30 We postulate that using additional (possibly external) data will regularize the learning of the topic models. [sent-38, score-0.658]
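As one illustration of the kind of external word structure such regularization could draw on (the summary above does not specify the regularizers' exact construction, so this is only an assumed, plausible form), the sketch below counts document-level word co-occurrences in a hypothetical external corpus; external_docs is a list of token lists and vocab is the word list of interest.

import numpy as np
from itertools import combinations

def cooccurrence_matrix(external_docs, vocab):
    # C[i, j] = number of external documents containing both word i and word j.
    index = {w: i for i, w in enumerate(vocab)}
    C = np.zeros((len(vocab), len(vocab)))
    for doc in external_docs:
        present = sorted(set(w for w in doc if w in index))
        for w1, w2 in combinations(present, 2):
            C[index[w1], index[w2]] += 1
            C[index[w2], index[w1]] += 1
    return C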
31 Topic coherence – meaning semantic coherence – is a human-judged quality that depends on the semantics of the words, and cannot be measured by model-based statistical measures that treat the words as exchangeable tokens. [sent-40, score-1.092]
32 Fortunately, recent work has demonstrated that it is possible to automatically measure topic coherence with near-human accuracy [16, 15] using a score based on pointwise mutual information (PMI). [sent-41, score-1.117]
33 In that work they showed (using 6000 human evaluations) that the PMI-Score broadly agrees with human-judged topic coherence. [sent-42, score-0.657]
34 The PMI-Score is motivated by measuring word association between all pairs of words in the top-10 topic words. [sent-43, score-0.705]
35 PMI-Score is defined as follows: PMI-Score(w) = (1/45) Σ_{i<j} PMI(w_i, w_j), ij ∈ {1, ..., 10}, i.e. the average pointwise mutual information over all 45 pairs drawn from the top-10 words w of the topic. [sent-44, score-0.024]
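A hedged sketch of computing this score follows; doc_freq, pair_doc_freq, and n_docs are assumed document-frequency counts and corpus size from some large external corpus, and the score averages PMI over the 45 pairs of top-10 words, matching the definition above.

import numpy as np
from itertools import combinations

def pmi_score(top10, doc_freq, pair_doc_freq, n_docs, eps=1e-12):
    # Average PMI over all pairs of the topic's top-10 words.
    scores = []
    for w1, w2 in combinations(top10, 2):  # 45 pairs for 10 words
        p1 = doc_freq[w1] / n_docs
        p2 = doc_freq[w2] / n_docs
        p12 = pair_doc_freq.get(tuple(sorted((w1, w2))), 0) / n_docs
        scores.append(np.log((p12 + eps) / (p1 * p2)))  # PMI(w1, w2)
    return float(np.mean(scores))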
wordName wordTfidf (topN-words)
[('topic', 0.546), ('coherence', 0.364), ('topics', 0.327), ('coherent', 0.21), ('posts', 0.168), ('browsing', 0.154), ('text', 0.153), ('blog', 0.144), ('faceted', 0.127), ('pmi', 0.127), ('interpretable', 0.126), ('collections', 0.119), ('words', 0.112), ('themes', 0.112), ('thirteen', 0.112), ('regularizers', 0.111), ('interpretability', 0.1), ('search', 0.097), ('pointwise', 0.091), ('eg', 0.091), ('newman', 0.087), ('learned', 0.082), ('web', 0.069), ('improve', 0.068), ('document', 0.068), ('diversity', 0.067), ('semantic', 0.067), ('noisy', 0.065), ('regularize', 0.064), ('consistently', 0.063), ('datasets', 0.057), ('usability', 0.056), ('edwin', 0.056), ('buntine', 0.056), ('predicated', 0.056), ('dealing', 0.056), ('improving', 0.055), ('content', 0.054), ('external', 0.052), ('extracting', 0.052), ('snippets', 0.051), ('printing', 0.048), ('abstracts', 0.048), ('nicta', 0.048), ('exacerbated', 0.048), ('postulate', 0.048), ('mutual', 0.048), ('judged', 0.046), ('bonilla', 0.046), ('interfaces', 0.044), ('engine', 0.044), ('human', 0.043), ('ultimate', 0.042), ('organize', 0.04), ('user', 0.04), ('articles', 0.039), ('promise', 0.038), ('unrelated', 0.038), ('irvine', 0.038), ('australian', 0.038), ('opens', 0.038), ('automatically', 0.037), ('exchangeable', 0.037), ('regularizing', 0.036), ('broadly', 0.036), ('semantics', 0.035), ('probable', 0.034), ('driving', 0.034), ('searches', 0.033), ('broader', 0.033), ('agrees', 0.032), ('fortunately', 0.031), ('score', 0.031), ('mix', 0.031), ('discover', 0.03), ('creating', 0.029), ('practically', 0.029), ('news', 0.029), ('clustering', 0.028), ('google', 0.028), ('display', 0.028), ('multinomial', 0.028), ('written', 0.027), ('users', 0.027), ('valuable', 0.027), ('interpret', 0.027), ('formulations', 0.026), ('less', 0.025), ('evaluations', 0.025), ('broad', 0.025), ('retrieval', 0.025), ('useful', 0.024), ('displayed', 0.024), ('overcome', 0.024), ('word', 0.024), ('wj', 0.024), ('treat', 0.024), ('usually', 0.024), ('sparse', 0.024), ('measuring', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
2 0.45488369 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
3 0.31940514 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
4 0.21570884 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
5 0.21079685 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
6 0.11876939 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.1093654 156 nips-2011-Learning to Learn with Compound HD Models
8 0.09334261 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
9 0.064462572 259 nips-2011-Sparse Estimation with Structured Dictionaries
10 0.063555807 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
11 0.053556085 14 nips-2011-A concave regularization technique for sparse mixture models
12 0.050395671 157 nips-2011-Learning to Search Efficiently in High Dimensions
13 0.048098836 126 nips-2011-Im2Text: Describing Images Using 1 Million Captioned Photographs
14 0.046698555 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
15 0.033123888 54 nips-2011-Co-regularized Multi-view Spectral Clustering
16 0.032202154 274 nips-2011-Structure Learning for Optimization
17 0.031505138 141 nips-2011-Large-Scale Category Structure Aware Image Categorization
18 0.030584818 150 nips-2011-Learning a Distance Metric from a Network
19 0.030499235 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
20 0.030453164 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
topicId topicWeight
[(0, 0.102), (1, 0.059), (2, -0.039), (3, 0.025), (4, -0.042), (5, -0.454), (6, 0.16), (7, 0.196), (8, -0.234), (9, 0.068), (10, 0.155), (11, 0.218), (12, -0.027), (13, 0.072), (14, 0.07), (15, -0.032), (16, 0.084), (17, 0.056), (18, -0.03), (19, 0.057), (20, -0.12), (21, 0.019), (22, 0.054), (23, 0.027), (24, 0.044), (25, -0.015), (26, -0.063), (27, -0.033), (28, 0.024), (29, -0.058), (30, -0.041), (31, -0.016), (32, 0.048), (33, -0.056), (34, -0.007), (35, -0.012), (36, 0.005), (37, -0.019), (38, 0.032), (39, -0.002), (40, -0.027), (41, 0.023), (42, -0.016), (43, 0.006), (44, -0.014), (45, -0.011), (46, -0.048), (47, -0.059), (48, -0.021), (49, -0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.99235094 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
2 0.90763837 281 nips-2011-The Doubly Correlated Nonparametric Topic Model
Author: Dae I. Kim, Erik B. Sudderth
Abstract: Topic models are learned via a statistical model of variation within document collections, but designed to extract meaningful semantic structure. Desirable traits include the ability to incorporate annotations or metadata associated with documents; the discovery of correlated patterns of topic usage; and the avoidance of parametric assumptions, such as manual specification of the number of topics. We propose a doubly correlated nonparametric topic (DCNT) model, the first model to simultaneously capture all three of these properties. The DCNT models metadata via a flexible, Gaussian regression on arbitrary input features; correlations via a scalable square-root covariance representation; and nonparametric selection from an unbounded series of potential topics via a stick-breaking construction. We validate the semantic structure and predictive performance of the DCNT using a corpus of NIPS documents annotated by various metadata. 1
3 0.88968706 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation
Author: David Sontag, Dan Roy
Abstract: We consider the computational complexity of probabilistic inference in Latent Dirichlet Allocation (LDA). First, we study the problem of finding the maximum a posteriori (MAP) assignment of topics to words, where the document’s topic distribution is integrated out. We show that, when the effective number of topics per document is small, exact inference takes polynomial time. In contrast, we show that, when a document has a large number of topics, finding the MAP assignment of topics to words in LDA is NP-hard. Next, we consider the problem of finding the MAP topic distribution for a document, where the topic-word assignments are integrated out. We show that this problem is also NP-hard. Finally, we briefly discuss the problem of sampling from the posterior, showing that this is NP-hard in one restricted setting, but leaving open the general question. 1
4 0.77496815 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
5 0.64476657 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices
Author: Xianxing Zhang, Lawrence Carin, David B. Dunson
Abstract: The nested Chinese restaurant process is extended to design a nonparametric topic-model tree for representation of human choices. Each tree path corresponds to a type of person, and each node (topic) has a corresponding probability vector over items that may be selected. The observed data are assumed to have associated temporal covariates (corresponding to the time at which choices are made), and we wish to impose that with increasing time it is more probable that topics deeper in the tree are utilized. This structure is imposed by developing a new “change point
6 0.608441 116 nips-2011-Hierarchically Supervised Latent Dirichlet Allocation
7 0.45161313 14 nips-2011-A concave regularization technique for sparse mixture models
8 0.36709559 160 nips-2011-Linear Submodular Bandits and their Application to Diversified Retrieval
9 0.3616572 156 nips-2011-Learning to Learn with Compound HD Models
10 0.22251956 176 nips-2011-Multi-View Learning of Word Embeddings via CCA
11 0.21242757 216 nips-2011-Portmanteau Vocabularies for Multi-Cue Image Representation
13 0.15998252 74 nips-2011-Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection
14 0.15805906 157 nips-2011-Learning to Search Efficiently in High Dimensions
15 0.15450026 274 nips-2011-Structure Learning for Optimization
16 0.14555529 167 nips-2011-Maximum Covariance Unfolding : Manifold Learning for Bimodal Data
17 0.14300381 151 nips-2011-Learning a Tree of Metrics with Disjoint Visual Features
18 0.14198093 142 nips-2011-Large-Scale Sparse Principal Component Analysis with Application to Text Data
19 0.1351601 150 nips-2011-Learning a Distance Metric from a Network
20 0.13504961 27 nips-2011-Advice Refinement in Knowledge-Based SVMs
topicId topicWeight
[(0, 0.01), (4, 0.017), (20, 0.026), (31, 0.031), (33, 0.042), (43, 0.054), (45, 0.178), (57, 0.039), (65, 0.023), (67, 0.366), (74, 0.049), (83, 0.02), (99, 0.047)]
simIndex simValue paperId paperTitle
1 0.81026495 147 nips-2011-Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors
Author: Hsiu-chin Lin, Vickie Baracos, Russell Greiner, Chun-nam J. Yu
Abstract: An accurate model of patient survival time can help in the treatment and care of cancer patients. The common practice of providing survival time estimates based only on population averages for the site and stage of cancer ignores many important individual differences among patients. In this paper, we propose a local regression method for learning patient-specific survival time distribution based on patient attributes such as blood tests and clinical assessments. When tested on a cohort of more than 2000 cancer patients, our method gives survival time predictions that are much more accurate than popular survival analysis models such as the Cox and Aalen regression models. Our results also show that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site and stage only. 1
same-paper 2 0.78872663 129 nips-2011-Improving Topic Coherence with Regularized Topic Models
Author: David Newman, Edwin V. Bonilla, Wray Buntine
Abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data. 1
3 0.63312167 56 nips-2011-Committing Bandits
Author: Loc X. Bui, Ramesh Johari, Shie Mannor
Abstract: We consider a multi-armed bandit problem where there are two phases. The first phase is an experimentation phase where the decision maker is free to explore multiple options. In the second phase the decision maker has to commit to one of the arms and stick with it. Cost is incurred during both phases with a higher cost during the experimentation phase. We analyze the regret in this setup, and both propose algorithms and provide upper and lower bounds that depend on the ratio of the duration of the experimentation phase to the duration of the commitment phase. Our analysis reveals that if given the choice, it is optimal to experiment Θ(ln T ) steps and then commit, where T is the time horizon.
4 0.58092445 227 nips-2011-Pylon Model for Semantic Segmentation
Author: Victor Lempitsky, Andrea Vedaldi, Andrew Zisserman
Abstract: Graph cut optimization is one of the standard workhorses of image segmentation since for binary random field representations of the image, it gives globally optimal results and there are efficient polynomial time implementations. Often, the random field is applied over a flat partitioning of the image into non-intersecting elements, such as pixels or super-pixels. In the paper we show that if, instead of a flat partitioning, the image is represented by a hierarchical segmentation tree, then the resulting energy combining unary and boundary terms can still be optimized using graph cut (with all the corresponding benefits of global optimality and efficiency). As a result of such inference, the image gets partitioned into a set of segments that may come from different layers of the tree. We apply this formulation, which we call the pylon model, to the task of semantic segmentation where the goal is to separate an image into areas belonging to different semantic classes. The experiments highlight the advantage of inference on a segmentation tree (over a flat partitioning) and demonstrate that the optimization in the pylon model is able to flexibly choose the level of segmentation across the image. Overall, the proposed system has superior segmentation accuracy on several datasets (Graz-02, Stanford background) compared to previously suggested approaches. 1
5 0.5405612 186 nips-2011-Noise Thresholds for Spectral Clustering
Author: Sivaraman Balakrishnan, Min Xu, Akshay Krishnamurthy, Aarti Singh
Abstract: Although spectral clustering has enjoyed considerable empirical success in machine learning, its theoretical properties are not yet fully developed. We analyze the performance of a spectral algorithm for hierarchical clustering and show that on a class of hierarchically structured similarity matrices, this algorithm can tolerate noise that grows with the number of data points while still perfectly recovering the hierarchical clusters with high probability. We additionally improve upon previous results for k-way spectral clustering to derive conditions under which spectral clustering makes no mistakes. Further, using minimax analysis, we derive tight upper and lower bounds for the clustering problem and compare the performance of spectral clustering to these information theoretic limits. We also present experiments on simulated and real world data illustrating our results. 1
6 0.48710606 105 nips-2011-Generalized Lasso based Approximation of Sparse Coding for Visual Recognition
7 0.48583403 214 nips-2011-PiCoDes: Learning a Compact Code for Novel-Category Recognition
8 0.48343074 113 nips-2011-Hierarchical Matching Pursuit for Image Classification: Architecture and Fast Algorithms
9 0.47877121 293 nips-2011-Understanding the Intrinsic Memorability of Images
10 0.47797358 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
11 0.47759309 244 nips-2011-Selecting Receptive Fields in Deep Networks
12 0.47653225 252 nips-2011-ShareBoost: Efficient multiclass learning with feature sharing
13 0.47605193 169 nips-2011-Maximum Margin Multi-Label Structured Prediction
14 0.47441414 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
15 0.47345188 78 nips-2011-Efficient Methods for Overlapping Group Lasso
16 0.47330406 220 nips-2011-Prediction strategies without loss
17 0.47319433 190 nips-2011-Nonlinear Inverse Reinforcement Learning with Gaussian Processes
18 0.47238418 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
19 0.47227898 143 nips-2011-Learning Anchor Planes for Classification
20 0.47225362 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time