Automatic Headline Generation using Character Cross-Correlation (ACL 2011)

Source: pdf

Author: Fahad Alotaiby

Abstract: Arabic is a morphologically complex language. Affixes and clitics are regularly attached to stems, which makes direct comparison between words impractical. In this paper we propose a new automatic headline generation technique that uses character cross-correlation to extract the best headlines and to overcome the complex morphology of Arabic. The system that uses character cross-correlation achieves a ROUGE-L score of 0.19384, while exact word matching scores only 0.17252 on the same set of documents.
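The abstract does not define character cross-correlation, so what follows is only a minimal sketch under an assumed definition: the shorter word is slid across the longer one, identical characters are counted at each offset, and the best count is divided by the length of the longer word. The names `cross_correlation`, `exact_match` and `match_score`, and the 0.5 threshold, are hypothetical and not taken from the paper. The sketch shows why such a score can relate a prefixed word (الكتاب, "the book") to its bare form (كتاب, "book") where exact word matching cannot.

```python
from typing import Callable, Sequence


def cross_correlation(a: str, b: str) -> float:
    """Hypothetical character cross-correlation score in [0, 1].

    The shorter word is slid across the longer one; at each offset the
    positions holding identical characters are counted. The best count
    is divided by the length of the longer word.
    """
    if not a or not b:
        return 0.0
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    best = 0
    # Offsets cover every shift at which the two words overlap.
    for offset in range(-(len(short) - 1), len(long_)):
        matches = sum(
            1
            for i, ch in enumerate(short)
            if 0 <= i + offset < len(long_) and long_[i + offset] == ch
        )
        best = max(best, matches)
    return best / len(long_)


def exact_match(a: str, b: str) -> float:
    """Baseline similarity: 1.0 only for identical surface forms."""
    return 1.0 if a == b else 0.0


def match_score(
    candidate: Sequence[str],
    reference: Sequence[str],
    sim: Callable[[str, str], float],
    threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> float:
    """Fraction of candidate words whose best similarity to any reference word reaches the threshold."""
    if not candidate:
        return 0.0
    hits = sum(
        1
        for w in candidate
        if max((sim(w, r) for r in reference), default=0.0) >= threshold
    )
    return hits / len(candidate)


if __name__ == "__main__":
    headline = ["الكتاب", "الجديد"]  # Buckwalter: AlktAb Aljdyd
    document = ["كتاب", "جديد"]      # Buckwalter: ktAb jdyd
    print(match_score(headline, document, cross_correlation))  # 1.0
    print(match_score(headline, document, exact_match))        # 0.0
```

Under these assumptions, `cross_correlation("الكتاب", "كتاب")` is 4/6 (about 0.67) because the stem aligns with the last four letters of the prefixed form, whereas `exact_match` returns 0.0. A positional count like this can also give partial credit to unrelated words that happen to share letters, which is why the sketch applies a threshold.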
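The ROUGE-L scores in the abstract come from the longest-common-subsequence variant of ROUGE (Lin, 2004a). Below is a minimal Python sketch of that measure, assuming whitespace-tokenized input and the balanced F-measure (beta = 1). The choice of beta is an assumption, since the abstract does not say which value was used. Recall is LCS divided by the reference length, precision is LCS divided by the candidate length, and the two are combined as (1 + beta^2) * R * P / (R + beta^2 * P). This is not the official ROUGE package, which adds options such as stemming and multiple references.

```python
from typing import Sequence


def lcs_length(x: Sequence[str], y: Sequence[str]) -> int:
    """Length of the longest common subsequence of two token sequences."""
    # Row-by-row dynamic programme; prev[j] holds the LCS of the tokens
    # seen so far in x and the first j tokens of y.
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: Sequence[str], reference: Sequence[str], beta: float = 1.0) -> float:
    """LCS-based F-measure of a candidate headline against one reference."""
    if not candidate or not reference:
        return 0.0
    lcs = lcs_length(candidate, reference)
    if lcs == 0:
        return 0.0
    recall = lcs / len(reference)
    precision = lcs / len(candidate)
    return (1 + beta ** 2) * recall * precision / (recall + beta ** 2 * precision)


if __name__ == "__main__":
    print(rouge_l("a b c d".split(), "a c d e".split()))  # 0.75
```

In the usage line the LCS is "a c d" (3 tokens), so recall and precision are both 3/4 and the F-measure is 0.75.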

Reference text

Bonnie Dorr, David Zajic and Richard Schwartz. Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation. In Proceedings of the HLT-NAACL 2003 Text Summarization Workshop and Document Understanding Conference (DUC 2003), Edmonton, Alberta, 2003.
Chin-Yew Lin. ROUGE: A Package for Automatic Evaluation of Summaries. In Proceedings of the Workshop on Text Summarization Branches Out, pages 56-60, Barcelona, Spain, July 2004a.
Chin-Yew Lin. Looking for a Few Good Metrics: ROUGE and its Evaluation. In Working Notes of NTCIR-4 (Vol. Supl. 2), 2004b.
Document Understanding Conference. http://duc.nist.gov/duc2004/tasks.html, 2004.
Fahad Alotaiby, Ibrahim Alkharashi and Salah Foda. Processing Large Arabic Text Corpora: Preliminary Analysis and Results. In Proceedings of the Second International Conference on Arabic Language Resources and Tools, pages 78-82, Cairo, Egypt, 2009.
Fouad Douzidia and Guy Lapalme. Lakhas, an Arabic Summarization System. In Proceedings of the Document Understanding Conference (DUC), Boston, MA, USA, 2004.
David Graff. Arabic Gigaword Third Edition. Linguistic Data Consortium, Philadelphia, USA, 2007.
Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), 2002.
Mark Wasson. Using Lead Text for News Summaries: Evaluation Results and Implications for Commercial Summarization Applications. In Proceedings of the 17th International Conference on Computational Linguistics, Montreal, Canada, 1998.
Rong Jin and Alex G. Hauptmann. A New Probabilistic Model for Title Generation. The 19th International Conference on Computational Linguistics, Academia Sinica, Taipei, Taiwan, 2002.
Tim Buckwalter. Issues in Arabic Orthography and Morphology Analysis. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages, Geneva, Switzerland, 2004.
David Zajic, Bonnie Dorr and Richard Schwartz. Automatic Headline Generation for Newspaper Stories. In Workshop on Automatic Summarization, pages 78-85, Philadelphia, PA, 2002.