259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning


Source: pdf

Authors: Xiaojun Quan; Chunyu Kit; Yan Song

Abstract: This paper studies the problem of non-monotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically (that is, in the same relative order on both sides). It proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have counterparts with similar relatedness in the other; and (2) an initial alignment is readily available from existing alignment techniques. These assumptions are incorporated as two constraints into a semisupervised learning framework whose optimization produces a globally optimal solution. Evaluation on real-world legal data from a comprehensive legislation corpus shows that, while existing alignment algorithms suffer severely from non-monotonicity, this approach works effectively on both monotonic and non-monotonic data.
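As a rough illustration of how two such constraints can be combined, the sketch below assumes a formulation in the style of Zhou et al. (2004), which the paper cites; it is not the authors' exact objective. Hypothetical monolingual affinity matrices W (source side) and V (target side) are symmetrically normalized into S and T (assumption 1), and hypothetical initial alignment scores Y from an existing aligner (assumption 2) are propagated by iterating F ← αSFT + (1 − α)Y until F settles at the fixed point of a Sylvester-type equation. The names W, V, Y, alpha, propagate and extract are illustrative assumptions; pure Python keeps the sketch dependency-free.

```python
import math

Matrix = list[list[float]]


def normalize(W: Matrix) -> Matrix:
    """Symmetric normalization D^{-1/2} W D^{-1/2}, with D the row sums of W."""
    d = [sum(row) for row in W]
    n = len(W)
    return [[W[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product (kept dependency-free)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def propagate(W: Matrix, V: Matrix, Y: Matrix,
              alpha: float = 0.5, iters: int = 200) -> Matrix:
    """Hypothetical formulation (assumption), not the paper's exact objective.

    Iterates F <- alpha * S F T + (1 - alpha) * Y, where S and T are the
    normalized source/target affinities and Y holds initial alignment scores.
    For symmetric nonnegative W, V and 0 <= alpha < 1 this converges to the
    unique solution of F = alpha * S F T + (1 - alpha) * Y.
    """
    S, T = normalize(W), normalize(V)  # T is symmetric, so F T == F T^T
    F = [row[:] for row in Y]
    for _ in range(iters):
        SFT = matmul(matmul(S, F), T)
        F = [[alpha * SFT[i][k] + (1 - alpha) * Y[i][k] for k in range(len(Y[0]))]
             for i in range(len(Y))]
    return F


def extract(F: Matrix) -> list[int]:
    """For each source sentence, pick its best-scoring target sentence.
    No monotonicity (order-preserving) constraint is imposed."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in F]


# Toy bitext: true links s0-t2, s1-t0, s2-t1 (non-monotonic order).
W = [[0, 0.1, 1], [0.1, 0, 0.1], [1, 0.1, 0]]  # s0 and s2 closely related
V = [[0, 0.1, 0.1], [0.1, 0, 1], [0.1, 1, 0]]  # t1 and t2 closely related
Y = [[0, 0, 1], [1, 0, 0], [0, 0, 0]]          # initial aligner missed s2
print(extract(propagate(W, V, Y)))  # [2, 0, 1]
```

The missing link s2-t1 is recovered because s2 resembles s0 and t1 resembles t2, and s0-t2 is already in Y. Since extract takes a per-row argmax with no ordering constraint, the resulting links cross, which a monotonic aligner could not produce; a practical aligner would still need a threshold for unaligned sentences and a policy for many-to-one links.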


reference text

Ricardo Baeza-Yates and Berthier Ribeiro-Neto. 2011. Modern Information Retrieval: The Concepts and Technology Behind Search, 2nd ed. Harlow: Addison-Wesley.
Jewel B. Barlow, Moghen M. Monahemi, and Dianne P. O'Leary. 1992. Constrained matrix Sylvester equations. SIAM Journal on Matrix Analysis and Applications, 13(1):1-9.
Peter F. Brown, Jennifer C. Lai, and Robert L. Mercer. 1991. Aligning sentences in parallel corpora. In Proceedings of ACL'91, pages 169-176.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263-311.
Stanley F. Chen. 1993. Aligning sentences in bilingual corpora using lexical information. In Proceedings of ACL'93, pages 9-16.
Yonggang Deng, Shankar Kumar, and William Byrne. 2007. Segmentation and alignment of parallel text for statistical machine translation. Natural Language Engineering, 13(3):235-260.
William A. Gale and Kenneth Ward Church. 1991. A program for aligning sentences in bilingual corpora. In Proceedings of ACL'91, pages 177-184.
Martin Kay and Martin Röscheisen. 1993. Text-translation alignment. Computational Linguistics, 19(1):121-142.
Chunyu Kit, Jonathan J. Webster, King Kui Sin, Haihua Pan, and Heng Li. 2004. Clause alignment for bilingual HK legal texts: A lexical-based approach. International Journal of Corpus Linguistics, 9(1):29-51.
Chunyu Kit, Xiaoyue Liu, King Kui Sin, and Jonathan J. Webster. 2005. Harvesting the bitexts of the laws of Hong Kong from the Web. In The 5th Workshop on Asian Language Resources, pages 71-78.
Judith L. Klavans and Evelyne Tzoukermann. 1990. The bicord system: Combining lexical information from bilingual corpora and machine readable dictionaries. In Proceedings of COLING'90, pages 174-179.
Philippe Langlais, Michel Simard, and Jean Véronis. 1998. Methods and practical issues in evaluating alignment techniques. In Proceedings of COLING-ACL'98, pages 711-717.
Zhanyi Liu, Haifeng Wang, Hua Wu, and Sheng Li. 2010. Improving statistical machine translation with monolingual collocation. In Proceedings of ACL 2010, pages 825-833.
Xiaoyi Ma. 2006. Champollion: A robust parallel text sentence aligner. In Proceedings of LREC 2006, pages 489-492.
Peng Li, Maosong Sun, and Ping Xue. 2010. Fast-Champollion: A fast and robust sentence alignment algorithm. In Proceedings of ACL 2010: Posters, pages 710-718.
Robert C. Moore. 2002. Fast and accurate sentence alignment of bilingual corpora. In Proceedings of AMTA 2002, pages 135-144.
Jian-Yun Nie, Michel Simard, Pierre Isabelle, and Richard Durand. 1999. Cross-language information retrieval based on parallel texts and automatic mining of parallel texts from the Web. In Proceedings of SIGIR'99, pages 74-81.
Martin F. Porter. 1980. An algorithm for suffix stripping. Program, 14(3):130-137.
Jinsong Su, Hua Wu, Haifeng Wang, Yidong Chen, Xiaodong Shi, Huailin Dong, and Qun Liu. 2012. Translation model adaptation for statistical machine translation with monolingual topic information. In Proceedings of ACL 2012, Vol. 1, pages 459-468.
Ben Taskar, Simon Lacoste-Julien, and Dan Klein. 2005. A discriminative matching approach to word alignment. In Proceedings of HLT/EMNLP 2005, pages 73-80.
Dániel Varga, Péter Halácsy, András Kornai, Viktor Nagy, László Németh, and Viktor Trón. 2005. Parallel corpora for medium density languages. In Proceedings of RANLP 2005, pages 590-596.
Dekai Wu. 1994. Aligning a parallel English-Chinese corpus statistically with lexical criteria. In Proceedings of ACL'94, pages 80-87.
Dekai Wu. 2010. Alignment. In Handbook of Natural Language Processing, 2nd ed. CRC Press.
Dengyong Zhou, Olivier Bousquet, Thomas N. Lal, Jason Weston, and Bernhard Schölkopf. 2004. Learning with local and global consistency. Advances in Neural Information Processing Systems, 16:321-328.
Xiaojin Zhu, Zoubin Ghahramani, and John Lafferty. 2003. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of ICML 2003, pages 912-919.