105 emnlp-2011-Predicting Thread Discourse Structure over Technical Web Forums

Source: pdf

Author: Li Wang ; Marco Lui ; Su Nam Kim ; Joakim Nivre ; Timothy Baldwin

Abstract: Online discussion forums are a valuable means for users to resolve specific information needs, both interactively for the participants and statically for users who search/browse over historical thread data. However, the complex structure of forum threads can make it difficult for users to extract relevant information. The discourse structure of web forum threads, in the form of labelled dependency relationships between posts, has the potential to greatly improve information access over web forum archives. In this paper, we present the task of parsing user forum threads to determine the labelled dependencies between posts. Three methods, including a dependency parsing approach, are proposed to jointly classify the links (relationships) between posts and the dialogue act (type) of each link. The proposed methods significantly surpass an informed baseline. We also experiment with “in situ” classification of evolving threads, and establish that our best methods are able to perform equivalently well over partial threads as complete threads.
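
The representation described in the abstract, labelled dependency links between posts, can be pictured with a small data structure. The following is only an illustrative sketch and not the authors' implementation: the class `Link`, the helpers `is_well_formed` and `partial_thread`, and the dialogue act names are hypothetical, and it assumes that every link points from a post to an earlier post in the same thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Link:
    """One labelled dependency in a thread (hypothetical representation)."""
    source: int            # index of the post that carries the link
    target: Optional[int]  # index of the earlier post it relates to; None for the first post
    act: str               # dialogue act label of the link (illustrative names only)


def is_well_formed(links: List[Link], n_posts: int) -> bool:
    """Check that every link stays inside the thread and points backwards in time."""
    for link in links:
        if not 0 <= link.source < n_posts:
            return False
        if link.target is not None and not 0 <= link.target < link.source:
            return False
    return True


def partial_thread(links: List[Link], k: int) -> List[Link]:
    """Keep only the links carried by the first k posts, mimicking an evolving thread."""
    return [link for link in links if link.source < k]


# A three-post toy thread: a question, an answer to it, and a confirmation of the answer.
thread = [
    Link(0, None, "Question"),
    Link(1, 0, "Answer"),
    Link(2, 1, "Confirmation"),
]
```

Under these assumptions, `partial_thread(thread, 2)` yields the structure as it would look after only two posts have been written, which is the setting the "in situ" experiments evaluate.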

reference text

Léon Bottou. 2011. CRFSGD software. http://leon.bottou.org/projects/sgd.
Xin Cao, Gao Cong, Bin Cui, Christian S. Jensen, and Ce Zhang. 2009. The use of categorization information in language models for question retrieval. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), pages 265–274, Hong Kong, China.
Vitor R. Carvalho and William W. Cohen. 2005. On the collective classification of email “speech acts”. In Proceedings of 28th International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2005), pages 345–352.
Jeffrey Chan and Conor Hayes. 2010. Decomposing discussion forums using user roles. In Proceedings of the WebSci10: Extending the Frontiers of Society On-Line (WebSci10), pages 1–8, Raleigh, USA.
Jeffrey Chan, Conor Hayes, and Elizabeth M. Daly. 2010. Decomposing discussion forums using user roles. In Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media (ICWSM 2010), pages 215–8, Washington, USA.
Chih-Chung Chang and Chih-Jen Lin. 2011. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2(3):27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
William W. Cohen, Vitor R. Carvalho, and Tom M. Mitchell. 2004. Learning to classify email into “speech acts”. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pages 309–316, Barcelona, Spain.
Gao Cong, Long Wang, Chin-Yew Lin, Young-In Song, and Yueheng Sun. 2008. Finding question-answer pairs from online forums. In Proceedings of 31st International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR’08), pages 467–474, Singapore.
Daniel Dahlmeier, Hwee Tou Ng, and Tanja Schultz. 2009. Joint learning of preposition senses and semantic roles of prepositional phrases. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009), pages 450–458, Singapore. Association for Computational Linguistics.
Shilin Ding, Gao Cong, Chin-Yew Lin, and Xiaoyan Zhu. 2008. Using conditional random fields to extract context and answers of questions from online forums. In Proceedings of the 46th Annual Meeting of the ACL: HLT (ACL 2008), pages 710–718, Columbus, USA.
Jason Eisner and Noah A. Smith. 2005. Parsing with soft and hard constraints on dependency length. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 30–41, Vancouver, Canada.
Jonathan L. Elsas and Jaime G. Carbonell. 2009. It pays to be picky: An evaluation of thread retrieval in online forums. In Proceedings of 32nd International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’09), pages 714–715, Boston, USA.
Micha Elsner and Eugene Charniak. 2008. You talking to me? A corpus and algorithm for conversation disentanglement. In Proceedings of the 46th Annual Meeting of the ACL: HLT (ACL 2008), pages 834–842, Columbus, USA.
Jenny Rose Finkel and Christopher D. Manning. 2009. Joint parsing and named entity recognition. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2009), pages 326–334, Boulder, Colorado. Association for Computational Linguistics.
Blaz Fortuna, Eduarda Mendes Rodrigues, and Natasa Milic-Frayling. 2007. Improving the classification of newsgroup messages through social network analysis. In Proceedings of the 16th ACM Conference on Information and Knowledge Management (CIKM 2007), pages 877–880, Lisbon, Portugal.
Barbara J. Grosz and Candace L. Sidner. 1986. Attention, intention and the structure of discourse. Computational Linguistics, 12(3):175–204.
Edward Ivanovic. 2008. Automatic instant messaging dialogue using statistical models and dialogue acts. Master’s thesis, University of Melbourne.
Su Nam Kim, Lawrence Cavedon, and Timothy Baldwin. 2010a. Classifying dialogue acts in one-on-one live chats. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP 2010), pages 862–871, Boston, USA.
Su Nam Kim, Li Wang, and Timothy Baldwin. 2010b. Tagging and linking web forum posts. In Proceedings of the 14th Conference on Computational Natural Language Learning (CoNLL-2010), pages 192–202, Uppsala, Sweden.
Sandra Kübler, Ryan McDonald, and Joakim Nivre. 2009. Dependency parsing. Synthesis Lectures on Human Language Technologies, 2(1):1–127.
John Lafferty, Andrew McCallum, and Fernando Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289, Williamstown, USA.
Andrew Lampert, Robert Dale, and Cécile Paris. 2008. The nature of requests and commitments in email messages. In Proceedings of the AAAI 2008 Workshop on Enhanced Messaging, pages 42–47, Chicago, USA.
Robert Leaman, Laura Wojtulewicz, Ryan Sullivan, Annie Skariah, Jian Yang, and Graciela Gonzalez. 2010. Towards internet-age pharmacovigilance: Extracting adverse drug reactions from user posts in health-related social networks. In Proceedings of the 2010 Workshop on Biomedical Natural Language Processing (ACL 2010), pages 117–125, Uppsala, Sweden.
Oliver Lemon, Alex Gruenstein, and Stanley Peters. 2002. Collaborative activities and multi-tasking in dialogue systems. Traitement Automatique des Langues (TAL), Special Issue on Dialogue, 43(2):131–154.
Chen Lin, Jiang-Ming Yang, Rui Cai, Xin-Jing Wang, Wei Wang, and Lei Zhang. 2009. Modeling semantics and structure of discussion threads. In Proceedings of the 18th International Conference on the World Wide Web (WWW 2009), pages 1103–1104, Madrid, Spain.
Marco Lui and Timothy Baldwin. 2009. You are what you post: User-level features in threaded discourse. In Proceedings of the 14th Australasian Document Computing Symposium (ADCS 2009), Sydney, Australia.
Marco Lui and Timothy Baldwin. 2010. Classifying user forum participants: Separating the gurus from the hacks, and other tales of the internet. In Proceedings of the 2010 Australasian Language Technology Workshop (ALTW 2010), pages 49–57, Melbourne, Australia.
Ryan McDonald and Joakim Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL 2007), pages 122–131, Prague, Czech Republic.
Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2006), pages 81–88, Trento, Italy.
Ryan McDonald, Fernando Pereira, Kiril Ribarov, and Jan Hajic. 2005. Non-projective dependency parsing using spanning tree algorithms. In Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing, pages 523–530, Vancouver, Canada.
Gabriel Murray, Steve Renals, Jean Carletta, and Johanna Moore. 2006. Incorporating speaker and discourse features into speech summarization. In Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics, pages 367–374.
Klemens Muthmann, Wojciech M. Barczyński, Falk Brauer, and Alexander Löser. 2009. Near-duplicate detection for web-forums. In Proceedings of the 2009 International Database Engineering & Applications Symposium (IDEAS 2009), pages 142–151, Cetraro, Italy.
Joakim Nivre, Johan Hall, Jens Nilsson, Atanas Chanev, Gülsen Eryigit, Sandra Kübler, Svetoslav Marinov, and Erwin Marsi. 2007. MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering, 13(02):95–135.
Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Proceedings of the 8th International Workshop on Parsing Technologies (IWPT 03), pages 149–160, Nancy, France.
Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Proceedings of the ACL Workshop Incremental Parsing: Bringing Engineering and Cognition Together (ACL-2004), pages 50–57, Barcelona, Spain.
Carolyn Penstein Rosé, Barbara Di Eugenio, Lori S. Levin, and Carol Van Ess-Dykema. 1995. Discourse processing of dialogues with multiple threads. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 31–38, Cambridge, USA.
Kenji Sagae and Jun’ichi Tsujii. 2008. Shift-reduce dependency DAG parsing. In Proceedings of the 22nd International Conference on Computational Linguistics (COLING 2008), pages 753–760, Manchester, UK.
Kenji Sagae. 2009. Analysis of discourse structure with syntactic dependencies and data-driven shift-reduce parsing. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT-09), pages 81–84, Paris, France.
Anne Schuth, Maarten Marx, and Maarten de Rijke. 2007. Extracting the discussion structure in comments on news-articles. In Proceedings of the 9th Annual ACM International Workshop on Web Information and Data Management, pages 97–104, Lisboa, Portugal.
Jangwon Seo, W. Bruce Croft, and David A. Smith. 2009. Online community search using thread structure. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), pages 1907–1910, Hong Kong, China.
Elizabeth Shriberg, Raj Dhillon, Sonali Bhagat, Jeremy Ang, and Hannah Carvey. 2004. The ICSI meeting recorder dialog act (MRDA) corpus. In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue, pages 97–100, Cambridge, USA.
Parikshit Sondhi, Manish Gupta, ChengXiang Zhai, and Julia Hockenmaier. 2010. Shallow information extraction from medical forum data. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Posters Volume, pages 1158–1166, Beijing, China.
Radu Soricut and Daniel Marcu. 2003. Sentence level discourse parsing using syntactic and lexical information. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL 2003), pages 149–156, Edmonton, Canada.
Andreas Stolcke, Klaus Ries, Noah Coccaro, Elizabeth Shriberg, Rebecca Bates, Daniel Jurafsky, Paul Taylor, Rachel Martin, Carol Van Ess-Dykema, and Marie Meteer. 2000. Dialogue act modeling for automatic tagging and recognition of conversational speech. Computational Linguistics, 26(3):339–373.
Charles Sutton and Andrew McCallum. 2005. Joint parsing and semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pages 225–228, Ann Arbor, Michigan. Association for Computational Linguistics.
Pasi Tapanainen and Timo Järvinen. 1997. A non-projective dependency parser. In Proceedings of the Fifth Conference on Applied Natural Language Processing, pages 64–71, Washington, USA.
Nayer Wanas, Motaz El-Saban, Heba Ashour, and Waleed Ammar. 2008. Automatic scoring of online discussion posts. In Proceeding of the 2nd ACM workshop on Information credibility on the web (WICOW ’08), pages 19–26, Napa Valley, USA.
Yi-Chia Wang and Carolyn P. Rosé. 2010. Making conversational structure explicit: identification of initiation-response pairs within online discussions. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT 2010), pages 673–676.
Yi-Chia Wang, Mahesh Joshi, and Carolyn Rosé. 2007. A feature based approach to leveraging context for classifying newsgroup style discussion segments. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions (ACL 2007), pages 73–76, Prague, Czech Republic.
Yi-Chia Wang, Mahesh Joshi, William W. Cohen, and Carolyn Rosé. 2008. Recovering implicit thread structure in newsgroup style conversations. In Proceedings of the Second International Conference on Weblogs and Social Media (ICWSM 2008), pages 152–160, Seattle, USA.
V. Warnke, R. Kompe, H. Niemann, and E. Nöth. 1997. Integrated dialog act segmentation and classification using prosodic features and language models. In Proc. Eurospeech, volume 1, pages 207–210.
Markus Weimer and Iryna Gurevych. 2007. Predicting the perceived quality of web forum posts. In Proceedings of the 2007 International Conference on Recent Advances in Natural Language Processing (RANLP 2007), pages 643–648, Borovets, Bulgaria.
Markus Weimer, Iryna Gurevych, and Max Mühlhäuser. 2007. Automatically assessing the post quality in online discussions on software. In Proceedings of the 45th Annual Meeting of the ACL: Interactive Poster and Demonstration Sessions, pages 125–128, Prague, Czech Republic.
Armin Weinberger and Frank Fischer. 2006. A framework to analyze argumentative knowledge construction in computer-supported collaborative learning. Computers & Education, 46:71–95, January.
Florian Wolf and Edward Gibson. 2005. Representing discourse coherence: A corpus-based study. Computational Linguistics, 31(2):249–287.
Wensi Xi, Jesper Lind, and Eric Brill. 2004. Learning effective ranking functions for newsgroup search. In Proceedings of 27th International ACM-SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2004), pages 394–401, Sheffield, UK.
Alexander Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 947–953, Saarbrücken, Germany.