
Together We Can: Bilingual Bootstrapping for WSD (ACL 2011, paper 304)


Source: pdf

Authors: Mitesh M. Khapra, Salil Joshi, Arindam Chatterjee, Pushpak Bhattacharyya

Abstract: Recent work on bilingual Word Sense Disambiguation (WSD) has shown that a resource-deprived language (L1) can benefit from the annotation work done in a resource-rich language (L2) via parameter projection. However, this method assumes that sufficient annotated data exists in one resource-rich language, which may not always be the case. Instead, we focus on the situation where there are two resource-deprived languages, each with a very small amount of seed annotated data and a large amount of untagged data. For this setting, we use bilingual bootstrapping, wherein a model trained on the seed annotated data of L1 is used to annotate the untagged data of L2, and vice versa, using parameter projection. The untagged instances of L1 and L2 that are annotated with high confidence are then added to the seed data of the respective languages, and the process is repeated. Our experiments show that such a bilingual bootstrapping algorithm, when evaluated on two different domains with small seed sizes using Hindi (L1) and Marathi (L2) as the language pair, performs better than monolingual bootstrapping and significantly reduces annotation cost.
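The bootstrapping loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sense scorer is a toy naive-Bayes-style model swapped in for illustration, "parameter projection" is approximated by renaming the model's word-level count tables through a hypothetical word-to-word bilingual dictionary, and sense labels are assumed to be synset IDs shared by both languages. All names (`ToyWSDModel`, `project_model`, `label_confident`, `bilingual_bootstrap`, `threshold`, `iterations`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy sketch of bilingual bootstrapping. ASSUMPTIONS (not from the paper):
# - instances are (target_word, context_words); tagged ones add a sense id;
# - sense ids are synset IDs shared by both languages;
# - the scorer is a naive-Bayes-style stand-in for the paper's WSD model.


class ToyWSDModel:
    """Hypothetical sense scorer: P(sense | word) * prod P(context | sense)."""

    def __init__(self, tagged):
        self.sense_counts = defaultdict(Counter)  # word -> Counter of senses
        self.ctx_counts = defaultdict(Counter)    # sense -> Counter of context words
        self.vocab = set()
        for word, context, sense in tagged:
            self.sense_counts[word][sense] += 1
            for c in context:
                self.ctx_counts[sense][c] += 1
                self.vocab.add(c)

    def predict(self, word, context):
        """Return (best_sense, posterior), or None for an unseen word."""
        senses = self.sense_counts.get(word)
        if not senses:
            return None
        total = sum(senses.values())
        v = len(self.vocab) + 1  # +1 reserves mass for unseen context words
        log_scores = {}
        for sense, n in senses.items():
            ctx = self.ctx_counts[sense]
            ctx_total = sum(ctx.values())
            score = math.log(n / total)
            for c in context:
                score += math.log((ctx[c] + 1) / (ctx_total + v))  # add-one smoothing
            log_scores[sense] = score
        # Normalise the log scores into a posterior used as the confidence.
        top = max(log_scores.values())
        z = sum(math.exp(s - top) for s in log_scores.values())
        best = max(log_scores, key=log_scores.get)
        return best, math.exp(log_scores[best] - top) / z


def project_model(model, dictionary):
    """Toy parameter projection: rename the model's words through a
    hypothetical word-to-word dictionary; untranslatable entries are dropped."""
    projected = ToyWSDModel([])
    for word, senses in model.sense_counts.items():
        if word in dictionary:
            projected.sense_counts[dictionary[word]].update(senses)
    for sense, ctx in model.ctx_counts.items():
        for c, n in ctx.items():
            if c in dictionary:
                projected.ctx_counts[sense][dictionary[c]] += n
                projected.vocab.add(dictionary[c])
    return projected


def label_confident(model, pool, threshold):
    """Split untagged instances into confidently labeled ones and the rest."""
    confident, remaining = [], []
    for word, context in pool:
        pred = model.predict(word, context)
        if pred is not None and pred[1] >= threshold:
            confident.append((word, context, pred[0]))
        else:
            remaining.append((word, context))
    return confident, remaining


def bilingual_bootstrap(seed1, seed2, untagged1, untagged2, dict12, dict21,
                        threshold=0.9, iterations=5):
    seed1, seed2 = list(seed1), list(seed2)  # copy so inputs are not mutated
    pool1, pool2 = list(untagged1), list(untagged2)
    for _ in range(iterations):
        # Train on each language's seed data, then project into the other language.
        model_for_l2 = project_model(ToyWSDModel(seed1), dict12)
        model_for_l1 = project_model(ToyWSDModel(seed2), dict21)
        new1, pool1 = label_confident(model_for_l1, pool1, threshold)
        new2, pool2 = label_confident(model_for_l2, pool2, threshold)
        if not new1 and not new2:
            break  # neither language gained confident instances
        seed1 += new1
        seed2 += new2
    return seed1, seed2
```

Each round retrains both models from the enlarged seed sets, so instances labeled in one language change the projected model applied to the other language in the next round; the loop stops early once neither side gains a confident instance. One caveat of this toy confidence measure: a word with only one sense seen in the seed data always receives a posterior of 1.0, which is one reason a real system would likely need a more careful confidence estimate.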


reference text

Eneko Agirre and German Rigau. 1996. Word sense disambiguation using conceptual density. In Proceedings of the 16th International Conference on Computational Linguistics (COLING).
Avrim Blum and Tom Mitchell. 1998. Combining labeled and unlabeled data with co-training. Pages 92–100. Morgan Kaufmann Publishers.
Mitesh M. Khapra, Sapan Shah, Piyush Kedia, and Pushpak Bhattacharyya. 2009. Projecting parameters for multilingual word sense disambiguation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 459–467, Singapore, August. Association for Computational Linguistics.
Mitesh Khapra, Saurabh Sohoney, Anup Kulkarni, and Pushpak Bhattacharyya. 2010. Value for money: Balancing annotation effort, lexicon building and accuracy for multilingual WSD. In Proceedings of the 23rd International Conference on Computational Linguistics.
Yoong Keok Lee, Hwee Tou Ng, and Tee Kiah Chia. 2004. Supervised word sense disambiguation with support vector machines and multiple knowledge sources. In Proceedings of Senseval-3: Third International Workshop on the Evaluation of Systems for the Semantic Analysis of Text, pages 137–140.
Michael Lesk. 1986. Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the 5th Annual International Conference on Systems Documentation.
Hang Li and Cong Li. 2004. Word translation disambiguation using bilingual bootstrapping. Computational Linguistics, 30:1–22, March.
Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. 2004. Finding predominant word senses in untagged text. In ACL '04: Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, page 279, Morristown, NJ, USA. Association for Computational Linguistics.
Rada Mihalcea. 2005. Large vocabulary unsupervised word sense disambiguation with graph-based algorithms for sequence data labeling. In Proceedings of the Joint Human Language Technology and Empirical Methods in Natural Language Processing Conference (HLT/EMNLP), pages 411–418.
Rajat Mohanty, Pushpak Bhattacharyya, Prabhakar Pande, Shraddha Kalele, Mitesh Khapra, and Aditya Sharma. 2008. Synset based multilingual dictionary: Insights, applications and challenges. In Global Wordnet Conference.
Hwee Tou Ng and Hian Beng Lee. 1996. Integrating multiple knowledge sources to disambiguate word senses: An exemplar-based approach. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (ACL), pages 40–47.
D. Walker and R. Amsler. 1986. The use of machine readable dictionaries in sublanguage analysis. In Analyzing Language in Restricted Domains, Grishman and Kittredge (eds), LEA Press, pages 69–83.
David Yarowsky. 1995. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, pages 189–196, Morristown, NJ, USA. Association for Computational Linguistics.