acl acl2010 acl2010-50 acl2010-50-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daphna Shezaf ; Ari Rappoport
Abstract: Bilingual lexicons are fundamental resources. Modern automated lexicon generation methods usually require parallel corpora, which are not available for most language pairs. Lexicons can be generated using non-parallel corpora or a pivot language, but such lexicons are noisy. We present an algorithm for generating a high-quality lexicon from a noisy one, which only requires an independent corpus for each language. Our algorithm introduces non-aligned signatures (NAS), a cross-lingual word context similarity score that avoids the over-constrained and inefficient nature of alignment-based methods. We use NAS to eliminate incorrect translations from the generated lexicon. We evaluate our method by improving the quality of noisy Spanish-Hebrew lexicons generated from two pivot English lexicons. Our algorithm substantially outperforms other lexicon generation methods.
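The pipeline described in the abstract (chain two pivot lexicons, then prune candidates with a context similarity score) can be illustrated with a minimal Python sketch. The abstract does not give the NAS formula, so `signature_overlap` below is only an assumed stand-in that counts context words with a lexicon translation on the other side, without aligning the two signatures. The function names, the toy entries, and the `he_*` placeholder tokens are all hypothetical, and the signatures are assumed to have been computed beforehand from each monolingual corpus.

```python
from collections import defaultdict


def compose_via_pivot(src_to_pivot, pivot_to_tgt):
    """Chain two pivot lexicons into a (noisy) source-to-target lexicon."""
    noisy = defaultdict(set)
    for src_word, pivot_words in src_to_pivot.items():
        for pivot_word in pivot_words:
            noisy[src_word] |= pivot_to_tgt.get(pivot_word, set())
    return dict(noisy)


def signature_overlap(src_sig, tgt_sig, lexicon):
    """Count source-signature words with a lexicon translation in the target signature.

    Assumed scoring rule for illustration; no one-to-one alignment is required.
    """
    tgt_set = set(tgt_sig)
    return sum(1 for word in src_sig if lexicon.get(word, set()) & tgt_set)


# Toy data (hypothetical): "he_*" tokens stand in for Hebrew words.
es_en = {
    "banco": {"bank", "bench"},
    "dinero": {"money"},
    "cuenta": {"account", "bill"},
}
en_he = {
    "bank": {"he_bank", "he_riverbank"},
    "bench": {"he_bench"},
    "money": {"he_money"},
    "account": {"he_account"},
    "bill": {"he_bill", "he_beak"},
}
# Polysemous pivot words ("bank", "bill") introduce wrong candidates here.
noisy = compose_via_pivot(es_en, en_he)

# Context signatures, assumed precomputed from each monolingual corpus.
es_sig = {"banco": ["dinero", "cuenta"]}
he_sig = {
    "he_bank": ["he_money", "he_account"],
    "he_riverbank": ["he_river", "he_water"],
    "he_bench": ["he_park", "he_money"],
}

# Score every noisy candidate for "banco" and keep the best one.
scores = {
    cand: signature_overlap(es_sig["banco"], he_sig[cand], noisy)
    for cand in noisy["banco"]
}
best = max(scores, key=scores.get)
print(sorted(scores.items()), best)
```

In this toy setup the polysemous pivot word "bank" yields three candidates for "banco", and the overlap score ranks the financial sense highest because its context words translate the contexts of "banco". A real system would choose the signature size and the pruning threshold empirically.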
Kisuh Ahn and Matthew Frampton. 2006. Automatic generation of translation dictionaries using intermediary languages. In EACL 2006 Workshop on Cross-Language Knowledge Induction.
Francis Bond, Ruhaida Binti Sulong, Takefumi Yamazaki, and Kentaro Ogura. 2001. Design and construction of a machine-tractable japanese-malay dictionary. In MT Summit VIII: Machine Translation in the Information Age, Proceedings, pages 53–58.
Kenneth W. Church and Patrick Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16:22–29.
Elad Dinur, Dmitry Davidov, and Ari Rappoport. 2009. Unsupervised concept discovery in hebrew using simple unsupervised word prefix segmentation for hebrew and arabic. In EACL 2009 Workshop on Computational Approaches to Semitic Languages.
Pascale Fung. 1998. A statistical view on bilingual lexicon extraction: from parallel corpora to nonparallel corpora. In The Third Conference of the Association for Machine Translation in the Americas.
Nikesh Garera, Chris Callison-Burch, and David Yarowsky. 2009. Improving translation lexicon induction from monolingual corpora via dependency contexts and part-of-speech equivalences. In CoNLL.
Hiroyuki Kaji and Toshiko Aizono. 1996. Extracting word correspondences from bilingual corpora based on word co-occurrence information. In COLING.
Hiroyuki Kaji, Shin’ichi Tamamura, and Dashtseren Erdenebat. 2008. Automatic construction of a japanese-chinese dictionary via english. In LREC.
Philipp Koehn and Kevin Knight. 2002. Learning a translation lexicon from monolingual corpora. In Proceedings of ACL Workshop on Unsupervised Lexical Acquisition.
Adam Lopez. 2008. Statistical machine translation. ACM Computing Surveys, 40(3):1–49.
Mausam, Stephen Soderland, Oren Etzioni, Daniel S. Weld, Michael Skinner, and Jeff Bilmes. 2009. Compiling a massive, multilingual dictionary via probabilistic inference. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing.
Kyonghee Paik, Satoshi Shirai, and Hiromi Nakaiwa. 2004. Automatic construction of a transfer dictionary considering directionality. In COLING, Multilingual Linguistic Resources Workshop.
Viktor Pekar, Ruslan Mitkov, Dimitar Blagoev, and Andrea Mulloni. 2006. Finding translations for low-frequency words in comparable corpora. Machine Translation, 20:247–266.
Prolog. 2003. Practical Bilingual Dictionary: Spanish-Hebrew/Hebrew-Spanish. Israel.
Reinhard Rapp. 1999. Automatic identification of word translations from unrelated english and german corpora. In ACL.
Charles Schafer and David Yarowsky. 2002. Inducing translation lexicons via diverse similarity measures and bridge languages. In CoNLL.
Hana Skoumalova. 2001. Bridge dictionaries as bridges between languages. International Journal of Corpus Linguistics, 6:95–105.
Kumiko Tanaka and Hideya Iwasaki. 1996. Extraction of lexical translations from non-aligned corpora. In Conference on Computational Linguistics.
Kumiko Tanaka and Kyoji Umemura. 1994. Construction of a bilingual dictionary intermediated by a third language. In Conference on Computational Linguistics.
Jerzy Tomaszczyk. 1998. The bilingual dictionary under review. In ZuriLEX’86.