
180 acl-2010-On Jointly Recognizing and Aligning Bilingual Named Entities

Source: pdf

Author: Yufeng Chen ; Chengqing Zong ; Keh-Yih Su

Abstract: We observe that (1) how a given named entity (NE) is translated (i.e., either semantically or phonetically) depends greatly on its associated entity type, and (2) entities within an aligned pair should share the same type. Also, (3) those initially detected NEs are anchors, whose information should be used to give certainty scores when selecting candidates. From this basis, an integrated model is thus proposed in this paper to jointly identify and align bilingual named entities between Chinese and English. It adopts a new mapping type ratio feature (which is the proportion of NE internal tokens that are semantically translated), enforces an entity type consistency constraint, and utilizes additional monolingual candidate certainty factors (based on those NE anchors). The experiments show that this novel approach has substantially raised the type-sensitive F-score of identified NE-pairs from 68.4% to 81.7% (42.1% F-score imperfection reduction) in our Chinese-English NE alignment task.
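The two quantities the abstract defines in words can be checked with a short sketch. This is only an illustration of those definitions, not the paper's model: the function names, the "semantic"/"phonetic" labels, and the three-token example are assumptions made here, and the imperfection reduction is assumed to mean the share of the remaining gap to a 100% F-score that the improvement closes, which is the reading that reproduces the reported 42.1%.

```python
def mapping_type_ratio(token_mapping_types):
    """Fraction of NE-internal tokens labeled as semantically translated.

    The label strings are hypothetical; the abstract only defines the ratio.
    """
    if not token_mapping_types:
        raise ValueError("need at least one token")
    semantic = sum(1 for t in token_mapping_types if t == "semantic")
    return semantic / len(token_mapping_types)


def imperfection_reduction(baseline_f, improved_f):
    """Share of the gap to a perfect 100% F-score closed by the improvement."""
    return (improved_f - baseline_f) / (100.0 - baseline_f)


# Hypothetical three-token NE: two tokens translated by meaning, one by sound.
ratio = mapping_type_ratio(["semantic", "semantic", "phonetic"])

# F-scores reported in the abstract: 68.4% -> 81.7%.
reduction = imperfection_reduction(68.4, 81.7)

print(f"mapping type ratio: {ratio:.3f}")
print(f"imperfection reduction: {reduction:.1%}")
```

With the reported scores, the gain of 13.3 points is divided by the 31.6 points that separated the baseline from a perfect score, which gives the 42.1% figure quoted in the abstract.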

reference text

Al-Onaizan, Yaser, and Kevin Knight. 2002. Translating Named Entities Using Monolingual and Bilingual Resources. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 400-408.

Berger, Adam L., Stephen A. Della Pietra and Vincent J. Della Pietra. 1996. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 22(1):39-72, March.

Chen, Hsin-Hsi, Changhua Yang and Ying Lin. 2003. Learning Formulation and Transformation Rules for Multilingual Named Entities. In Proceedings of the ACL 2003 Workshop on Multilingual and Mixed-language Named Entity Recognition, pages 1-8.

Feng, Donghui, Yajuan Lv and Ming Zhou. 2004. A New Approach for English-Chinese Named Entity Alignment. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2004), pages 372-379.

Huang, Fei, Stephan Vogel and Alex Waibel. 2003. Automatic Extraction of Named Entity Translingual Equivalence Based on Multi-Feature Cost Minimization. In Proceedings of ACL’03, Workshop on Multilingual and Mixed-language Named Entity Recognition. Sapporo, Japan.

Ji, Heng and Ralph Grishman. 2006. Analysis and Repair of Name Tagger Errors. In Proceedings of COLING/ACL 06, Sydney, Australia.

Lee, Chun-Jen, Jason S. Chang and Jyh-Shing R. Jang. 2006. Alignment of Bilingual Named Entities in Parallel Corpora Using Statistical Models and Multiple Knowledge Sources. ACM Transactions on Asian Language Information Processing (TALIP), 5(2):121-145.

Moore, R. C. 2003. Learning Translations of Named-Entity Phrases from Parallel Corpora. In Proceedings of 10th Conference of the European Chapter of ACL, Budapest, Hungary.

Och, Franz Josef. 2003. Minimum Error Rate Training in Statistical Machine Translation. In Proceedings of the 41st Annual Conference of the Association for Computational Linguistics (ACL). July 8-10, 2003. Sapporo, Japan. Pages: 160-167.

Stolcke, A. 2002. SRILM -- An Extensible Language Modeling Toolkit. Proc. Intl. Conf. on Spoken Language Processing, vol. 2, pp. 901-904, Denver.

Wu, Youzheng, Jun Zhao and Bo Xu. 2005. Chinese Named Entity Recognition Model Based on Multiple Features. In Proceedings of HLT/EMNLP 2005, pages 427-434.

Zhang, Ying, Stephan Vogel, and Alex Waibel. 2004. Interpreting BLEU/NIST Scores: How Much Improvement Do We Need to Have a Better System? In Proceedings of the 4th International Conference on Language Resources and Evaluation, pages 2051-2054.