acl acl2011 acl2011-4 acl2011-4-reference knowledge-graph by maker-knowledge-mining

4 acl-2011-A Class of Submodular Functions for Document Summarization


Source: pdf

Author: Hui Lin ; Jeff Bilmes

Abstract: We design a class of submodular functions meant for document summarization tasks. These functions each combine two terms, one which encourages the summary to be representative of the corpus, and the other which positively rewards diversity. Critically, our functions are monotone nondecreasing and submodular, which means that an efficient scalable greedy optimization scheme has a constant factor guarantee of optimality. When evaluated on DUC 2004-2007 corpora, we obtain better than existing state-of-art results in both generic and query-focused document summarization. Lastly, we show that several well-established methods for document summarization correspond, in fact, to submodular function optimization, adding further evidence that submodular functions are a natural fit for document summarization.


reference text

F. Bach. 2010. Structured sparsity-inducing norms through submodular functions. Advances in Neural Information Processing Systems. J. Carbonell and J. Goldstein. 1998. The use of MMR, diversity-based reranking for reordering documents and producing summaries. In Proc. of SIGIR. A. Celikyilmaz and D. Hakkani-t u¨r. 2010. A hybrid hierarchical model for multi-document summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 815–824, Uppsala, Sweden, July. Association for Computational Linguistics. H. Daume´, J. Langford, and D. Marcu. 2009. Searchbased structured prediction. Machine learning, 75(3):297–325. H. Daume´ III and D. Marcu. 2006. Bayesian queryfocused summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, page 3 12. C. Fellbaum. 1998. WordNet: An electronic lexical database. The MIT press. E. Filatova and V. Hatzivassiloglou. 2004. Event-based extractive summarization. In Proceedings of ACL Workshop on Summarization, volume 111. A. Haghighi and L. Vanderwende. 2009. Exploring content models for multi-document summarization. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 362–370, Boulder, Colorado, June. Association for Computational Linguistics. J. Jagarlamudi, P. Pingali, and V. Varma. 2006. Query independent sentence scoring approach to DUC 2006. In DUC 2006. S. Jegelka and J. A. Bilmes. 2011. Submodularity beyond submodular energies: coupling edges in graph cuts. In Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, June. D. Kempe, J. Kleinberg, and E. Tardos. 2003. Maximizing the spread of influence through a social network. In Proceedings of the 9th Conference on SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD). S. Khuller, A. Moss, and J. Naor. 1999. The budgeted maximum coverage problem. Information Processing Letters, 70(1):39–45. V. Kolmogorov and R. Zabin. 2004. What energy functions can be minimized via graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(2): 147–159. 519 A. Krause and C. Guestrin. 2005. Near-optimal nonmyopic value of information in graphical models. In Proc. of Uncertainty in AI. A. Krause, H.B. McMahan, C. Guestrin, and A. Gupta. 2008. Robust submodular observation selection. Journal of Machine Learning Research, 9:2761–2801 . H. Lin and J. Bilmes. 2010. Multi-document summarization via budgeted maximization of submodular functions. In North American chapter of the Association for Computational Linguistics/Human Language Technology Conference (NAACL/HLT-2010), Los Angeles, CA, June. H. Lin and J. Bilmes. 2011. Word alignment via submodular maximization over matroids. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), Portland, OR, June. H. Lin, J. Bilmes, and S. Xie. 2009. Graph-based submodular selection for extractive summarization. In Proc. IEEE Automatic Speech Recognition and Understanding (ASRU), Merano, Italy, December. C.-Y. Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop. L. Lova´sz. 1983. Submodular functions and convexity. Mathematical programming-The state of the art,(eds. A. Bachem, M. Grotschel and B. Korte) Springer, pages 235–257. M. Minoux. 1978. Accelerated greedy algorithms for maximizing submodular set functions. Optimization Techniques, pages 234–243. M. Narasimhan and J. Bilmes. 2005. A submodularsupermodular procedure with applications to discriminative structure learning. In Proc. Conf. Uncertainty in Artifical Intelligence, Edinburgh, Scotland, July. Morgan Kaufmann Publishers. M. Narasimhan and J. Bilmes. 2007. Local search for balanced submodular clusterings. In Twentieth International Joint Conference on Artificial Intelligence (IJCAI07), Hyderabad, India, January. H. Narayanan. 1997. Submodular functions and electrical networks. North-Holland. G.L. Nemhauser, L.A. Wolsey, and M.L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions I. Mathematical Programming, 14(1):265– 294. P. Pingali, K. Rahul, and V. Varma. 2007. IIIT Hyderabad at DUC 2007. Proceedings of DUC 2007. V. Qazvinian, D.R. Radev, and A. Ozgu¨r. 2010. Citation Summarization Through Keyphrase Extraction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 895– 903. A. Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In EMNLP, volume 1, pages 133–142. K. Riedhammer, B. Favre, and D. Hakkani-Tu¨r. 2010. Long story short-Global unsupervised models for keyphrase based meeting summarization. Speech Communication. C. Shen and T. Li. 2010. Multi-document summarization via the minimum dominating set. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 984–992, Beijing, China, August. Coling 2010 Organizing Committee. M. Sviridenko. 2004. A note on maximizing a submodular set function subject to a knapsack constraint. Operations Research Letters, 32(1):41–43. H. Takamura and M. Okumura. 2009. Text summarization model based on maximum coverage problem and its variant. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pages 781–789. Association for Computational Linguistics. K. Toutanova, C. Brockett, M. Gamon, J. Jagarlamudi, H. Suzuki, and L. Vanderwende. 2007. The PYTHY summarization system: Microsoft research at DUC 2007. In the proceedings of Document Understanding Conference. D. Wang, S. Zhu, T. Li, and Y. Gong. 2009. Multidocument summarization using sentence-based topic models. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 297–300, Suntec, Singapore, August. Association for Computational Linguistics. L.A. Wolsey. 1982. An analysis of the greedy algorithm for the submodular set covering problem. Combinatorica, 2(4):385–393. P. Zhao, G. Rocha, and B. Yu. 2009. Grouped and hierarchical model selection through composite absolute penalties. Annals of Statistics, 37(6A):3468–3497. 520