emnlp emnlp2012 emnlp2012-33 emnlp2012-33-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jennifer Gillenwater ; Alex Kulesza ; Ben Taskar
Abstract: We propose a novel probabilistic technique for modeling and extracting salient structure from large document collections. As in clustering and topic modeling, our goal is to provide an organizing perspective into otherwise overwhelming amounts of information. We are particularly interested in revealing and exploiting relationships between documents. To this end, we focus on extracting diverse sets of threads: singly-linked, coherent chains of important documents. To illustrate, we extract research threads from citation graphs and construct timelines from news articles. Our method is highly scalable, running on a corpus of over 30 million words in about four minutes, more than 75 times faster than a dynamic topic model. Finally, the results from our model more closely resemble human news summaries according to several metrics and are also preferred by human judges.
[Ahmed and Xing2010] A. Ahmed and E. Xing. 2010. Timeline: A Dynamic Hierarchical Dirichlet Process Model for Recovering Birth/Death and Evolution of Topics in Text Stream. In Proc. UAI.
[Allan et al.2001] J. Allan, R. Gupta, and V. Khandelwal. 2001. Temporal Summaries of New Topics. In Proc. SIGIR.
[Blei and Lafferty2006] D. Blei and J. Lafferty. 2006. Dynamic Topic Models. In Proc. ICML.
[Chieu and Lee2004] H. Chieu and Y. Lee. 2004. Query Based Event Extraction along a Timeline. In Proc. SIGIR.
[Erkan and Radev2004] G. Erkan and D.R. Radev. 2004. LexRank: Graph-Based Lexical Centrality as Salience in Text Summarization. Journal of Artificial Intelligence Research, 22(1):457–479.
[Graff and Cieri2009] D. Graff and C. Cieri. 2009. English Gigaword.
[Hesterberg et al.2003] T. Hesterberg, S. Monaghan, D. Moore, A. Clipson, and R. Epstein. 2003. Bootstrap Methods and Permutation Tests.
[Johnson and Lindenstrauss1984] W. B. Johnson and J. Lindenstrauss. 1984. Extensions of Lipschitz Mappings into a Hilbert Space. Contemporary Mathematics, 26:189–206.
[Kulesza and Taskar2010] A. Kulesza and B. Taskar. 2010. Structured Determinantal Point Processes. In Proc. NIPS.
[Kulesza and Taskar2011] A. Kulesza and B. Taskar. 2011. k-DPPs: Fixed-Size Determinantal Point Processes. In Proc. ICML.
[Leskovec et al.2009] J. Leskovec, L. Backstrom, and J. Kleinberg. 2009. Meme-tracking and the Dynamics of the News Cycle. In Proc. KDD.
[Lin2004] C.Y. Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Proc. WAS.
[Magen and Zouzias2008] A. Magen and A. Zouzias. 2008. Near Optimal Dimensionality Reductions that Preserve Volumes. Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques, pages 523–534.
[McCallum et al.2000] A. McCallum, K. Nigam, J. Rennie, and K. Seymore. 2000. Automating the Construction of Internet Portals with Machine Learning. Information Retrieval Journal, 3:127–163.
[Mei and Zhai2005] W. Mei and C. Zhai. 2005. Discovering Evolutionary Theme Patterns From Text: An Exploration of Temporal Text Mining. In Proc. KDD.
[Shahaf and Guestrin2010] D. Shahaf and C. Guestrin. 2010. Connecting the Dots Between News Articles. In Proc. KDD.
[Shahaf et al.2012] D. Shahaf, C. Guestrin, and E. Horvitz. 2012. Trains of Thought: Generating Information Maps. In Proc. WWW.
[Swan and Jensen2000] R. Swan and D. Jensen. 2000. TimeMines: Constructing Timelines with Statistical Models of Word Usage. In Proc. KDD.
[Wayne2000] C. Wayne. 2000. Multilingual Topic Detection and Tracking: Successful Research Enabled by Corpora and Evaluation. In Proc. LREC.
[Yan et al.2011] R. Yan, X. Wan, J. Otterbacher, L. Kong, X. Li, and Y. Zhang. 2011. Evolutionary Timeline Summarization: A Balanced Optimization Framework via Iterative Substitution. In Proc. SIGIR.