62 acl-2013-Automatic Term Ambiguity Detection

Source: pdf

Author: Tyler Baldwin ; Yunyao Li ; Bogdan Alexe ; Ioana R. Stanoi

Abstract: While the resolution of term ambiguity is important for information extraction (IE) systems, the cost of resolving each instance of an entity can be prohibitively expensive on large datasets. To combat this, this work looks at ambiguity detection at the term, rather than the instance, level. By making a judgment about the general ambiguity of a term, a system is able to handle ambiguous and unambiguous cases differently, improving throughput and quality. To address the term ambiguity detection problem, we employ a model that combines data from language models, ontologies, and topic modeling. Results over a dataset of entities from four product domains show that the proposed approach achieves an F-measure of 0.96, significantly above the baseline.

reference text

Bogdan Alexe, Mauricio A. Hernández, Kirsten Hildrum, Rajasekar Krishnamurthy, Georgia Koutrika, Meenakshi Nagarajan, Haggai Roitman, Michal Shmueli-Scheuer, Ioana Roxana Stanoi, Chitra Venkatramani, and Rohit Wagle. 2012. Surfacing time-critical insights from social media. In SIGMOD Conference, pages 657–660.
Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. Dbpedia: a nucleus for a web of open data. In Proceedings of the 6th international The semantic web and 2nd Asian conference on Asian semantic web conference, ISWC’07/ASWC’07, pages 722–735, Berlin, Heidelberg. Springer-Verlag.
David Blei, Andrew Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, January.
Samuel Brody and Mirella Lapata. 2009. Bayesian word sense induction. In Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’09, pages 103–111, Stroudsburg, PA, USA. Association for Computational Linguistics.
Marine Carpuat and Dekai Wu. 2007. Improving statistical machine translation using word sense disambiguation. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 61–72.
Yee Seng Chan, Hwee Tou Ng, and David Chiang. 2007. Word sense disambiguation improves statistical machine translation. In Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, pages 33–40, Prague, Czech Republic, June. Association for Computational Linguistics.
Jinying Chen and Martha Palmer. 2004. Chinese verb sense discrimination using an em clustering model with rich linguistic features. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL ’04, Stroudsburg, PA, USA. Association for Computational Linguistics.
Laura Chiticariu, Rajasekar Krishnamurthy, Yunyao Li, Sriram Raghavan, Frederick Reiss, and Shivakumar Vaithyanathan. 2010. SystemT: An algebraic approach to declarative information extraction. In ACL, pages 128–137.
Mark Davies. 2008-. The corpus of contemporary american english: 450 million words, 1990-present. Available online at: http://corpus.byu.edu/coca/.
Ioannis P. Klapaftis and Suresh Manandhar. 2010. Word sense induction & disambiguation using hierarchical random graphs. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 745–755, Stroudsburg, PA, USA. Association for Computational Linguistics.
Jey Han Lau, Paul Cook, Diana McCarthy, David Newman, and Timothy Baldwin. 2012. Word sense induction for novel sense detection. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics, EACL ’12, pages 591–601, Stroudsburg, PA, USA. Association for Computational Linguistics.
David Nadeau and Satoshi Sekine. 2007. A survey of named entity recognition and classification. Linguisticae Investigationes, 30(1):3–26.
Roberto Navigli and Giuseppe Crisafulli. 2010. Inducing word senses to improve web search result clustering. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, EMNLP ’10, pages 116–126, Stroudsburg, PA, USA. Association for Computational Linguistics.
Roberto Navigli. 2009. Word sense disambiguation: A survey. ACM Comput. Surv., 41(2):10:1–10:69, February.
Patrick Pantel and Dekang Lin. 2002. Discovering word senses from text. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, KDD ’02, pages 613–619, New York, NY, USA. ACM.
Tim Van de Cruys. 2008. Using three way data for word sense discrimination. In Proceedings of the 22nd International Conference on Computational Linguistics - Volume 1, COLING ’08, pages 929–936, Stroudsburg, PA, USA. Association for Computational Linguistics.
Yu Wang and Eugene Agichtein. 2010. Query ambiguity revisited: Clickthrough measures for distinguishing informational and ambiguous queries. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 361–364, Los Angeles, California, June. Association for Computational Linguistics.
Zhi Zhong and Hwee Tou Ng. 2012. Word sense disambiguation improves information retrieval. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 273–282, Jeju Island, Korea, July. Association for Computational Linguistics.