123 emnlp-2013-Learning to Rank Lexical Substitutions

Source: pdf

Author: György Szarvas ; Robert Busa-Fekete ; Eyke Hüllermeier

Abstract: The problem of replacing a word with a synonym that fits well in its sentential context is known as the lexical substitution task. In this paper, we tackle this task as a supervised ranking problem. Given a dataset of target words, their sentential contexts, and the potential substitutions for the target words, the goal is to train a model that accurately ranks the candidate substitutions based on their contextual fitness. As a key contribution, we customize several learning-to-rank models for the lexical substitution task and evaluate them, including classification-based and regression-based approaches. On two datasets widely used for lexical substitution, our best models significantly advance the state of the art.
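To make the ranking formulation concrete, here is a minimal sketch of a pointwise, regression-based ranker, one of the model families the abstract mentions. Everything in it is an illustrative assumption and not the authors' setup: the two features (a hypothetical context-fit score and a hypothetical frequency score), the toy words and gold relevance values, and the plain linear model trained by stochastic gradient descent.

```python
from typing import List, Tuple

# One candidate: (substitute word, feature vector, gold relevance).
# All values below are made-up toy data, not taken from the paper.
Candidate = Tuple[str, List[float], float]


def score(weights: List[float], features: List[float]) -> float:
    """Linear score: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def train_pointwise(instances: List[List[Candidate]],
                    epochs: int = 500, lr: float = 0.1) -> List[float]:
    """Fit the linear scorer by least-squares regression using plain SGD."""
    weights = [0.0] * len(instances[0][0][1])
    for _ in range(epochs):
        for candidates in instances:
            for _, features, gold in candidates:
                error = score(weights, features) - gold
                weights = [w - lr * error * x
                           for w, x in zip(weights, features)]
    return weights


def rank(weights: List[float], candidates: List[Candidate]) -> List[str]:
    """Return candidate words sorted by predicted score, best first."""
    ordered = sorted(candidates,
                     key=lambda c: score(weights, c[1]), reverse=True)
    return [word for word, _, _ in ordered]


# Each inner list is one (target word, context) instance with its candidates.
# Hypothetical features: [context_fit, frequency].
train = [
    [("smart", [1.0, 0.5], 2.5), ("shiny", [0.0, 1.0], 1.0),
     ("luminous", [0.0, 0.0], 0.0)],
    [("intelligent", [1.0, 0.0], 2.0), ("vivid", [0.5, 0.0], 1.0)],
]

# Gold relevance is unknown at test time; 0.0 is only a placeholder.
test = [("sunny", [0.0, 0.3], 0.0), ("clever", [0.9, 0.2], 0.0),
        ("glowing", [0.1, 0.6], 0.0)]

weights = train_pointwise(train)
ranking = rank(weights, test)
print(ranking)  # ['clever', 'glowing', 'sunny']
```

The pointwise view scores each candidate independently and sorts by score; classification-based and other learning-to-rank approaches differ in the loss they optimize but produce a ranking the same way.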

reference text

Eneko Agirre, Daniel Cer, Mona Diab, and Aitor Gonzalez-Agirre. 2012. Semeval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 385–393, Montréal, Canada.
D. Benbouzid, R. Busa-Fekete, N. Casagrande, F.-D. Collin, and B. Kégl. 2012. MultiBoost: a multipurpose boosting package. Journal of Machine Learning Research, 13:549–553.
Chris Biemann. 2012. Creating a System for Lexical Substitutions from Scratch using Crowdsourcing. Language Resources and Evaluation: Special Issue on Collaboratively Constructed Language Resources, 46(2).
R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas. 2011. Ranking by calibrated AdaBoost. In (JMLR W&CP), volume 14, pages 37–48.
R. Busa-Fekete, B. Kégl, T. Éltető, and Gy. Szarvas. 2013. Tune and mix: learning to rank using ensembles of calibrated multi-class classifiers. Machine Learning, 93(2–3):261–292.
Ching-Yun Chang and Stephen Clark. 2010. Practical linguistic steganography using contextual synonym substitution and vertex colour coding. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1194–1203, Cambridge, MA.
Ido Dagan, Oren Glickman, Alfio Gliozzo, Efrat Marmorshtein, and Carlo Strapparava. 2006. Direct word sense matching for lexical substitution. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, ACL-44, pages 449–456, Sydney, Australia.
Georgiana Dinu and Mirella Lapata. 2010. Measuring distributional similarity in context. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1162–1172, Cambridge, MA.
Katrin Erk and Sebastian Padó. 2010. Exemplar-based models for word meaning in context. In Proceedings of the ACL 2010 Conference Short Papers, pages 92–97, Uppsala, Sweden.
Y. Freund, R. Iyer, R. E. Schapire, and Y. Singer. 2003. An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4:933–969.
J. Friedman. 1999. Greedy function approximation: a gradient boosting machine. Technical report, Dept. of Statistics, Stanford University.
Bela Gipp, Norman Meuschke, and Joeran Beel. 2011. Comparative Evaluation of Text- and Citation-based Plagiarism Detection Approaches using GuttenPlag. In Proceedings of 11th ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL’11), pages 255–258, Ottawa, Canada. ACM New York, NY, USA. Available at http://sciplore.org/pub/.
Claudio Giuliano, Alfio Gliozzo, and Carlo Strapparava. 2007. FBK-irst: Lexical substitution task exploiting domain and syntagmatic coherence. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 145–148, Prague, Czech Republic.
Samer Hassan, Andras Csomai, Carmen Banea, Ravi Sinha, and Rada Mihalcea. 2007. UNT: SubFinder: Combining knowledge sources for automatic lexical substitution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 410–413, Prague, Czech Republic.
T. Joachims. 2006. Training linear SVMs in linear time. In Proceedings of the ACM Conference on Knowledge Discovery and Data Mining (KDD).
B. Kégl and R. Busa-Fekete. 2009. Boosting products of base classifiers. In International Conference on Machine Learning, volume 26, pages 497–504, Montreal, Canada.
Kazuaki Kishida. 2005. Property of Average Precision and Its Generalization: An Examination of Evaluation Indicator for Information Retrieval Experiments. NII technical report. National Institute of Informatics.
P. Li, C. Burges, and Q. Wu. 2007. McRank: Learning to rank using multiple classification and gradient boosting. In Advances in Neural Information Processing Systems, volume 19, pages 897–904. The MIT Press.
David Martinez, Su Nam Kim, and Timothy Baldwin. 2007. MELB-MKB: Lexical substitution system based on relatives in context. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 237–240, Prague, Czech Republic.
Diana McCarthy and Roberto Navigli. 2007. Semeval-2007 task 10: English lexical substitution task. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 48–53, Prague, Czech Republic.
J. Platt. 2000. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In A.J. Smola, P. Bartlett, B. Schoelkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, pages 61–74. MIT Press.
R. E. Schapire and Y. Singer. 1999. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336.
Ravi Sinha and Rada Mihalcea. 2009. Combining lexical resources for contextual synonym expansion. In Proceedings of the International Conference RANLP-2009, pages 404–410, Borovets, Bulgaria.
György Szarvas, Chris Biemann, and Iryna Gurevych. 2013. Supervised all-words lexical substitution using delexicalized features. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2013), June.
Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2010. Contextualizing semantic representations using syntactically enriched vector models. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 948–957, Uppsala, Sweden.
Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2011. Word meaning in context: A simple and effective vector model. In Proceedings of the Fifth International Joint Conference on Natural Language Processing: IJCNLP 2011, pages 1134–1143, Chiang Mai, Thailand. MP, ISSN 978-974-466-564-5.
Umut Topkara, Mercan Topkara, and Mikhail J. Atallah. 2006. The hiding virtues of ambiguity: quantifiably resilient watermarking of natural language text through synonym substitutions. In Proceedings of the 8th workshop on Multimedia and security, pages 164–174, New York, NY, USA. ACM.
H. Valizadegan, R. Jin, R. Zhang, and J. Mao. 2009. Learning to rank by optimizing NDCG measure. In Advances in Neural Information Processing Systems 22, pages 1883–1891.
Q. Wu, C. J. C. Burges, K. M. Svore, and J. Gao. 2010. Adapting boosting for information retrieval measures. Inf. Retr., 13(3):254–270.
Deniz Yuret. 2007. Ku: Word sense disambiguation by substitution. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), pages 207–214, Prague, Czech Republic, June. Association for Computational Linguistics.