
94 emnlp-2011-Modelling Discourse Relations for Arabic


Source: pdf

Author: Amal Al-Saif ; Katja Markert

Abstract: We present the first algorithms to automatically identify explicit discourse connectives and the relations they signal for Arabic text. First, we show that, for Arabic news, most adjacent sentences are connected via explicit connectives, in contrast to English, making the treatment of explicit discourse connectives highly important for Arabic. We also show that explicit Arabic discourse connectives are far more ambiguous than English ones, making their treatment challenging. In the second part of the paper, we present supervised algorithms for automatic discourse connective identification and discourse relation recognition. Our connective identifier based on gold-standard syntactic features achieves almost human performance. In addition, an identifier based solely on simple lexical features and automatically derived morphological and POS features performs with high reliability, which is essential for languages that do not yet have high-quality parsers. Our algorithm for recognizing discourse relations performs significantly better than a baseline based on the connective surface string alone and therefore reduces the ambiguity in explicit connective interpretation.
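The surface-string baseline mentioned in the abstract can be pictured with a minimal sketch. It assumes the baseline assigns each connective the relation it most often signals in the training data; the abstract does not give these details, so the function names (`train_surface_baseline`, `predict`), the relation labels, and the toy examples are hypothetical and are not the authors' implementation.

```python
from collections import Counter, defaultdict


def train_surface_baseline(examples):
    """Learn, for each connective string, its most frequent relation label.

    `examples` is an iterable of (connective_string, relation_label) pairs.
    Returns the lookup table and a fallback label (the overall majority
    relation) for connectives never seen in training.
    """
    per_connective = defaultdict(Counter)
    overall = Counter()
    for connective, relation in examples:
        per_connective[connective][relation] += 1
        overall[relation] += 1
    table = {
        connective: counts.most_common(1)[0][0]
        for connective, counts in per_connective.items()
    }
    fallback = overall.most_common(1)[0][0]
    return table, fallback


def predict(table, fallback, connective):
    """Predict a relation from the connective surface string alone."""
    return table.get(connective, fallback)


# Toy data invented for illustration only: transliterated connective
# strings paired with hypothetical relation labels.
train = [
    ("w", "CONJUNCTION"),
    ("w", "CONJUNCTION"),
    ("w", "CAUSE"),
    ("lkn", "CONTRAST"),
]
table, fallback = train_surface_baseline(train)
print(predict(table, fallback, "w"))
print(predict(table, fallback, "lkn"))
```

Such a baseline always gives the same label to an ambiguous connective, whatever its context. This is the ambiguity that a classifier using additional features is meant to reduce.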


reference text

M. Abdl al latif, A. Umar, and M. Zahran. 1997. Alnhw AlAsAsi. Dar Alfker Al-Arabi, Cairo, Egypt.
A. AlSaif and K. Markert. 2010. The Leeds Arabic Discourse Treebank: Annotating discourse connectives for Arabic. In Language Resources and Evaluation Conference (LREC).
J. Baldridge and A. Lascarides. 2005. Probabilistic head-driven parsing for discourse structure. In Proc. of CoNLL 2005.
S. Blair-Goldensohn, K. McKeown, and O. Rambow. 2007. Building and refining rhetorical-semantic relation models. In Proc. of HLT-NAACL 2007.
L. Carlson, D. Marcu, and M. Okurowski. 2002. RST Discourse Treebank. Linguistic Data Consortium.
D. duVerle and H. Prendinger. 2009. A novel discourse parser based on support vector machine classification. In Proc. of ACL 2009.
R. Elwell and J. Baldridge. 2008. Discourse connective argument identification with connective specific rankers. In Proc. of the International Conference on Semantic Computing.
R. Girju. 2003. Automatic detection of causal relations for question answering. In Proc. of the ACL 2003 Workshop on Multilingual Summarisation and Question Answering, pages 76–83.
M.A.K. Halliday and R. Hasan. 1976. Cohesion in English. Longman, London.
J.R. Hobbs. 1985. On the coherence and structure of discourse. Center for the Study of Language and Information, Stanford, Calif.
A. Knott and T. Sanders. 1998. The classification of coherence relations and their linguistic markers: An exploration of two languages. Journal of Pragmatics, 30(2):135–175.
Z. Lin, M. Kan, and H.T. Ng. 2009. Recognizing implicit discourse relations in the Penn Discourse Treebank. In Proc. of EMNLP 2009, pages 343–351.
A. Louis and A. Nenkova. 2010. Creating local coherence: An empirical assessment. In Proc. of NAACL 2010.
M. Maamouri and A. Bies. 2004. Developing an Arabic treebank: Methods, guidelines, procedures, and tools. In Proceedings of the Workshop on Computational Approaches to Arabic Script-based Languages (COLING), Geneva.
W.C. Mann and S.A. Thompson. 1988. Rhetorical structure theory: Toward a functional theory of text organization. Text, 8(3):243–281.
D. Marcu and A. Echihabi. 2002. An unsupervised approach to recognizing discourse relations. In Proc. of ACL 2002.
D. Marcu. 2000. The theory and practice of discourse parsing and summarization. MIT Press.
E. Miltsakaki, N. Dinesh, R. Prasad, A. Joshi, and B. Webber. 2005. Experiments on sense annotation and sense disambiguation of discourse connectives. In Proc. of the Workshop on Treebanks and Linguistic Theories.
E. Pitler and A. Nenkova. 2008. Revisiting readability: A unified framework for predicting text quality. In Proc. of EMNLP 2008, pages 186–195.
E. Pitler and A. Nenkova. 2009. Using syntax to disambiguate explicit discourse connectives. In Proc. of ACL-IJCNLP 2009 (Short Papers), pages 13–16.
E. Pitler, M. Raghupathy, H. Mehta, A. Nenkova, A. Lee, and A. Joshi. 2008. Easily identifiable discourse relations. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), Manchester, UK, August.
E. Pitler, A. Louis, and A. Nenkova. 2009. Automatic sense prediction for implicit discourse relations in text. In Proc. of ACL-IJCNLP 2009, pages 683–691.
R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. 2008a. The Penn Discourse Treebank 2.0. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008).
R. Prasad, S. Husain, D.M. Sharma, and A. Joshi. 2008b. Towards an annotated corpus of discourse relations in Hindi. In The Third International Joint Conference on Natural Language Processing, pages 7–12. Citeseer.
K.C. Ryding. 2005. A reference grammar of Modern Standard Arabic. Cambridge University Press.
S. Siegel and N.J. Castellan. 1956. Nonparametric statistics for the behavioral sciences. McGraw-Hill, New York.
S. Somasundaran, J. Wiebe, and J. Ruppenhofer. 2008. Discourse-level opinion interpretation. In Proc. of Coling 2008.
R. Soricut and D. Marcu. 2003. Sentence level discourse parsing using syntactic and lexical information. In Proc. of HLT-NAACL 2003.
C. Sporleder and A. Lascarides. 2008. Using automatically labelled examples to classify rhetorical relations: An assessment. Natural Language Engineering, 14:369–416.
W. Wang, J. Su, and C. Tan. 2010. Kernel-based discourse relation recognition with temporal ordering information. In Proc. of ACL 2010, pages 710–719.
B. Webber, A. Knott, M. Stone, and A. Joshi. 1999. Discourse relations: A structural and presuppositional account using lexicalised TAG. In Proceedings of the 37th annual meeting of the Association for Computational Linguistics on Computational Linguistics, page 48. Association for Computational Linguistics.
B. Wellner and J. Pustejovsky. 2007. Automatically identifying the arguments of discourse connectives. In Proc. of EMNLP 2007, pages 92–101.
B. Wellner, J. Pustejovsky, A. Havasi, A. Rumshisky, and R. Saurí. 2006. Classification of discourse coherence relations: An exploratory study using multiple knowledge sources. In Proc. of SIGDIAL 2006.
J. Wiebe, T. Wilson, and C. Cardie. 2005. Annotating expressions of opinions and emotions in language. Language Resources and Evaluation.
W. Wright. 2008. A grammar of the Arabic language. Bibliobazaar.
N. Xue. 2005. Annotating discourse connectives in the Chinese Treebank. In CorpusAnno '05: Proceedings of the Workshop on Frontiers in Corpus Annotations II, pages 84–91, Morristown, NJ, USA. Association for Computational Linguistics.
D. Zeyrek and B. Webber. 2008. A discourse resource for Turkish: Annotating discourse connectives in the METU corpus. In Proceedings of IJCNLP-2008, Hyderabad, India.
Z. Zhou, Y. Xu, Z. Niu, M. Lan, J. Su, and C. Tan. 2010. Predicting discourse connectives for implicit discourse relation recognition. In Proc. of Coling 2010, pages 1507–1514.