
8 acl-2010-A Hybrid Hierarchical Model for Multi-Document Summarization

Source: pdf

Author: Asli Celikyilmaz ; Dilek Hakkani-Tur

Abstract: Scoring sentences in documents given abstract summaries created by humans is important in extractive multi-document summarization. In this paper, we formulate extractive summarization as a two-step learning problem building a generative model for pattern discovery and a regression model for inference. We calculate scores for sentences in document clusters based on their latent characteristics using a hierarchical topic model. Then, using these scores, we train a regression model based on the lexical and structural characteristics of the sentences, and use the model to score sentences of new documents to form a summary. Our system advances current state-of-the-art improving ROUGE scores by ∼7%. Generated summaries are less redundant and more coherent based upon manual quality evaluations.
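The second step described in the abstract (fit a regression from sentence features to previously computed scores, then score the sentences of a new document) can be illustrated with a toy sketch. Everything below is an assumption made for illustration: the feature set, the placeholder training scores standing in for the hierarchical topic model's output, the plain least-squares fit, and all function names are hypothetical and are not taken from the paper.

```python
# Toy sketch: regression from sentence features to scores, then extractive selection.
# The step-1 scores are placeholders; in the paper they come from a hierarchical topic model.

def features(sentence, position, n_sentences):
    """Hypothetical lexical/structural features (not the paper's feature set)."""
    tokens = sentence.split()
    return [
        1.0,                                        # bias term
        len(tokens) / 10.0,                         # scaled sentence length
        1.0 - position / max(n_sentences - 1, 1),   # 1.0 for the first sentence, 0.0 for the last
    ]

def predict(w, x):
    """Linear score for one feature vector."""
    return sum(wj * xj for wj, xj in zip(w, x))

def fit_linear(X, y, lr=0.05, epochs=2000):
    """Least-squares fit by batch gradient descent (a stand-in regressor, assumed here)."""
    w = [0.0] * len(X[0])
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w

def summarize(sentences, w, k=2):
    """Score each sentence of a new document and keep the top k in document order."""
    n = len(sentences)
    scored = [(predict(w, features(s, i, n)), i) for i, s in enumerate(sentences)]
    top = sorted(sorted(scored, reverse=True)[:k], key=lambda t: t[1])
    return [sentences[i] for _, i in top]

# Made-up training sentences with placeholder step-1 scores.
train_sentences = [
    "The storm hit the coast early on Monday morning .",
    "Officials said thousands of residents were evacuated before landfall .",
    "It rained .",
    "More updates are expected later .",
]
train_scores = [0.9, 0.7, 0.1, 0.2]

X = [features(s, i, len(train_sentences)) for i, s in enumerate(train_sentences)]
w = fit_linear(X, train_scores)

new_doc = [
    "A second storm is forecast to reach the same region by Friday .",
    "Crews are still repairing power lines damaged earlier this week .",
    "Stay tuned .",
]
summary = summarize(new_doc, w, k=2)
print(summary)
```

The sketch keeps the selected sentences in their original order so the extract reads naturally; redundancy control between selected sentences is deliberately left out.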

reference text

R. Barzilay and L. Lee. Catching the drift: Probabilistic content models with applications to generation and summarization. In Proc. HLT-NAACL’04, 2004.
D. Blei, T. Griffiths, M. Jordan, and J. Tenenbaum. Hierarchical topic models and the nested Chinese restaurant process. In Neural Information Processing Systems [NIPS], 2003a.
D. Blei, T. Griffiths, and M. Jordan. The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies. In Journal of ACM, 2009.
D. M. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. In Jrnl. Machine Learning Research, 3:993-1022, 2003b.
S.R.K. Branavan, H. Chen, J. Eisenstein, and R. Barzilay. Learning document-level semantic properties from free-text annotations. In Journal of Artificial Intelligence Research, volume 34, 2009.
J.M. Conroy, J.D. Schlesinger, and D.P. O’Leary. Topic focused multi-document summarization using an approximate oracle score. In Proc. ACL’06, 2006.
H. Daumé III and D. Marcu. Bayesian query focused summarization. In Proc. ACL-06, 2006.
H. Drucker, C.J.C. Burges, L. Kaufman, A. Smola, and V. Vapnik. Support vector regression machines. In NIPS 9, 1997.
A. Haghighi and L. Vanderwende. Exploring content models for multi-document summarization. In NAACL HLT-09, 2009.
T. Joachims. Making large-scale SVM learning practical. In Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999.
C.-Y. Lin. ROUGE: A package for automatic evaluation of summaries. In Proc. ACL Workshop on Text Summarization Branches Out, 2004.
C.-Y. Lin and E.H. Hovy. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proc. HLT-NAACL, Edmonton, Canada, 2003.
C. Manning and H. Schuetze. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, 1999.
A. Nenkova and L. Vanderwende. The impact of frequency on summarization. Tech. Report MSR-TR-2005-101, Microsoft Research, Redmond, Washington, 2005.
D.R. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization for multiple documents. In Int. Jrnl. Information Processing and Management, 2004.
D. Shen, J.T. Sun, H. Li, Q. Yang, and Z. Chen. Document summarization using conditional random fields. In Proc. IJCAI’07, 2007.
J. Tang, L. Yao, and D. Chen. Multi-topic based query-oriented summarization. In SIAM International Conference Data Mining, 2009.
I. Titov and R. McDonald. A joint model of text and aspect ratings for sentiment summarization. In ACL-08:HLT, 2008.
K. Toutanova, C. Brockett, M. Gamon, J. Jagarlamudi, H. Suzuki, and L. Vanderwende. The PYTHY summarization system: Microsoft Research at DUC 2007. In Proc. DUC, 2007.
J.Y. Yeh, H.-R. Ke, W.P. Yang, and I-H. Meng. Text summarization using a trainable summarizer and latent semantic analysis. In Information Processing and Management, 2005.