
92 acl-2013-Context-Dependent Multilingual Lexical Lookup for Under-Resourced Languages


Source: pdf

Author: Lian Tze Lim ; Lay-Ki Soon ; Tek Yong Lim ; Enya Kong Tang ; Bali Ranaivo-Malancon

Abstract: Current approaches for word sense disambiguation and translation selection typically require lexical resources or large bilingual corpora with rich information fields and annotations, which are often infeasible for under-resourced languages. We extract translation context knowledge from a bilingual comparable corpus of a richer-resourced language pair, and inject it into a multilingual lexicon. The multilingual lexicon can then be used to perform context-dependent lexical lookup on texts of any language, including under-resourced ones. Evaluations on a prototype lookup tool, trained on an English–Malay bilingual Wikipedia corpus, show a precision score of 0.65 (baseline 0.55) and a mean reciprocal rank score of 0.81 (baseline 0.771). Based on these encouraging early results, the context-dependent lexical lookup tool may be developed further into an intelligent reading aid, to help users grasp the gist of a second or foreign language text.
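The two metrics reported in the abstract can be illustrated with a minimal sketch. It assumes that "precision" means the fraction of test words whose top-ranked translation is acceptable (precision at rank 1); the abstract does not state this, so treat it as an assumption. The function names and the toy English–Malay data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1/rank of the first acceptable translation, or 0.0 if none is listed."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in gold:
            return 1.0 / position
    return 0.0


def precision_at_1(ranked: Sequence[str], gold: Set[str]) -> float:
    """Return 1.0 if the top-ranked translation is acceptable, else 0.0."""
    return 1.0 if ranked and ranked[0] in gold else 0.0


def evaluate(results: List[Tuple[Sequence[str], Set[str]]]) -> Tuple[float, float]:
    """Average precision at rank 1 and reciprocal rank over all test items."""
    if not results:
        return 0.0, 0.0
    n = len(results)
    precision = sum(precision_at_1(r, g) for r, g in results) / n
    mrr = sum(reciprocal_rank(r, g) for r, g in results) / n
    return precision, mrr


# Hypothetical lookup output: (ranked Malay candidates, acceptable translations).
toy_results = [
    (["bank", "tebing"], {"bank"}),              # correct at rank 1
    (["tebing", "bank"], {"bank"}),              # correct at rank 2
    (["pokok", "loji", "kilang"], {"kilang"}),   # correct at rank 3
    (["buku"], {"buku"}),                        # correct at rank 1
]

precision, mrr = evaluate(toy_results)
print(f"precision@1 = {precision:.2f}")  # 0.50
print(f"MRR         = {mrr:.3f}")        # 0.708
```

Mean reciprocal rank gives partial credit when the right translation is listed but not ranked first, which is why it can sit well above precision on the same test set.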


reference text

Satanjeev Banerjee and Ted Pedersen. 2003. Extended gloss overlaps as a measure of semantic relatedness. In Proceedings of the 18th International Joint Conference on Artificial Intelligence, pages 805–810.
Pierpaolo Basile and Giovanni Semeraro. 2010. UBA: Using automatic translation and Wikipedia for crosslingual lexical substitution. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval 2010), pages 242–247, Uppsala, Sweden.
Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. 1990. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407.
Susan T. Dumais, Michael L. Littman, and Thomas K. Landauer. 1997. Automatic cross-language retrieval using latent semantic indexing. In AAAI97 Spring Symposium Series: Cross Language Text and Speech Retrieval, pages 18–24, Stanford University.
Nancy Ide, Tomaz Erjavec, and Dan Tufiş. 2002. Sense discrimination with parallel corpora. In Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation: Recent Successes and Future Directions, pages 54–60, Philadelphia, USA.
Els Lefever and Véronique Hoste. 2010. SemEval-2010 Task 3: Cross-lingual word sense disambiguation. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval 2010), Uppsala, Sweden.
Hang Li and Cong Li. 2004. Word translation disambiguation using bilingual bootstrapping. Computational Linguistics, 30(1):1–22.
Lian Tze Lim, Bali Ranaivo-Malançon, and Enya Kong Tang. 2011. Low cost construction of a multilingual lexicon from bilingual lists. Polibits, 43:45–51.
Bernardo Magnini, Carlo Strapparava, Giovanni Pezzulo, and Alfio Gliozzo. 2001. Using domain information for word sense disambiguation. In Proceedings of the 2nd International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), pages 111–114, Toulouse, France.
Lipta Mahapatra, Meera Mohan, Mitesh M. Khapra, and Pushpak Bhattacharyya. 2010. OWNS: Crosslingual word sense disambiguation using weighted overlap counts and Wordnet based similarity measures. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval 2010), Uppsala, Sweden.
Rada Mihalcea, Ravi Sinha, and Diana McCarthy. 2010. SemEval-2010 Task 2: Cross-lingual lexical substitution. In Proceedings of the 5th International Workshop on Semantic Evaluation (SemEval 2010), Uppsala, Sweden.
Hwee Tou Ng, Bin Wang, and Yee Seng Chan. 2003. Exploiting parallel texts for word sense disambiguation: An empirical study. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 455–462, Sapporo, Japan.
Gyula Papp. 2009. Vector-based unsupervised word sense disambiguation for large number of contexts. In Václav Matoušek and Pavel Mautner, editors, Text, Speech and Dialogue, volume 5729 of Lecture Notes in Computer Science, pages 109–115. Springer Berlin Heidelberg.
Bahareh Sarrafzadeh, Nikolay Yakovets, Nick Cercone, and Aijun An. 2011. Cross-lingual word sense disambiguation for languages with scarce resources. In Proceedings of the 24th Canadian Conference on Advances in Artificial Intelligence, pages 347–358, St. John’s, Canada.
Hinrich Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24(1):97–123.
Kiyoaki Shirai and Tsunekazu Yagi. 2004. Learning a robust word sense disambiguation model using hypernyms in definition sentences. In Proceedings of the 20th International Conference on Computational Linguistics (COLING 2004), pages 917–923, Geneva, Switzerland. Association for Computational Linguistics.
Ming Zhou, Yuan Ding, and Changning Huang. 2001. Improving translation selection with a new translation model trained by independent monolingual corpora. Computational Linguistics and Chinese Language Processing, 6(1):1–26.