
167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

Source: pdf

Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark

Abstract: We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model’s alignment score approaches the state of the art.

reference text

Jesús Andrés-Ferrer and Alfons Juan. 2009. A phrase-based hidden semi-Markov approach to machine translation. In Proceedings of European Association for Machine Translation (EAMT), Barcelona, Spain, May. European Association for Machine Translation.
Nicholas Andrews, Jason Eisner, and Mark Dredze. 2012. Name phylogeny: a generative model of string variation. In Proceedings of EMNLP 2012.
Mohit Bansal, Chris Quirk, and Robert Moore. 2011. Gappy phrasal alignment by agreement. In Proceedings of ACL, Portland, Oregon, June.
Regina Barzilay and Lillian Lee. 2003. Learning to paraphrase: An unsupervised approach using multiple-sequence alignment. In Proceedings of NAACL, pages 16–23.
Phil Blunsom and Trevor Cohn. 2006. Discriminative word alignment with conditional random fields. In Proceedings of ACL 2006, pages 65–72.
Chris Brockett. 2007. Aligning the RTE 2006 corpus. Technical report, Microsoft Research.
Peter F Brown, Vincent J Della Pietra, Stephen A Della Pietra, and Robert L Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–311.
Nathanael Chambers, Daniel Cer, Trond Grenager, David Hall, Chloe Kiddon, Bill MacCartney, Marie-Catherine de Marneffe, Daniel Ramage, Eric Yeh, and Christopher D Manning. 2007. Learning alignments and leveraging natural logic. In Proceedings of the ACL-PASCAL Workshop on Textual Entailment and Paraphrasing, pages 165–170.
Trevor Cohn, Chris Callison-Burch, and Mirella Lapata. 2008. Constructing corpora for the development and evaluation of paraphrase systems. Computational Linguistics, 34(4):597–614, December.
Hang Cui, Renxu Sun, Keya Li, Min-Yen Kan, and Tat-Seng Chua. 2005. Question answering passage retrieval using dependency relations. In Proceedings of the 28th annual international ACM SIGIR conference on Research and development in information retrieval, SIGIR ’05, pages 400–407, New York, NY, USA. ACM.
Dipanjan Das and Noah A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 468–476, Suntec, Singapore, August. Association for Computational Linguistics.
Marie-Catherine de Marneffe, Bill MacCartney, Trond Grenager, Daniel Cer, Anna Rafferty, and Christopher D Manning. 2006. Learning to distinguish valid textual entailments. In Second Pascal RTE Challenge Workshop.
Yonggang Deng and William Byrne. 2008. HMM word and phrase alignment for statistical machine translation. Audio, Speech, and Language Processing, IEEE Transactions on, 16(3):494–507.
Michael Denkowski and Alon Lavie. 2011. Meteor 1.3: Automatic Metric for Reliable Optimization and Evaluation of Machine Translation Systems. In Proceedings of the EMNLP 2011 Workshop on Statistical Machine Translation.
Bill Dolan, Chris Quirk, and Chris Brockett. 2004. Unsupervised construction of large paraphrase corpora: exploiting massively parallel news sources. In Proceedings of COLING, Stroudsburg, PA, USA.
Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database.
Juri Ganitkevitch, Benjamin Van Durme, and Chris Callison-Burch. 2013. PPDB: The Paraphrase Database. In Proceedings of NAACL-HLT, pages 758–764.
Kevin Gimpel and Noah A. Smith. 2010. Softmax-margin CRFs: training log-linear models with cost functions. In NAACL 2010, pages 733–736.
Lushan Han, Abhay Kashyap, Tim Finin, James Mayfield, and Jonathan Weese. 2013. UMBC-EBIQUITY-CORE: Semantic Textual Similarity Systems. In Proceedings of the Second Joint Conference on Lexical and Computational Semantics.
Michael Heilman and Noah A. Smith. 2010. Tree edit models for recognizing textual entailments, paraphrases, and answers to questions. In Proceedings of NAACL 2010, pages 1011–1019, Los Angeles, California, June.
Philipp Koehn. 2010. Statistical Machine Translation. Cambridge University Press, New York, NY, USA.
Milen Kouylekov and Bernardo Magnini. 2005. Recognizing textual entailment with tree edit distance algorithms. In PASCAL Challenges on RTE, pages 17–20.
John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, ICML ’01, pages 282–289, San Francisco, CA, USA. Morgan Kaufmann Publishers Inc.
Percy Liang, Ben Taskar, and Dan Klein. 2006. Alignment by agreement. In Proceedings of NAACL.
Bill MacCartney and Christopher D Manning. 2008. Modeling semantic containment and exclusion in natural language inference. In Proceedings of ACL 2008, pages 521–528.
Bill MacCartney, Michel Galley, and Christopher D Manning. 2008. A phrase-based alignment model for natural language inference. In Proceedings of EMNLP, pages 802–811.
Daniel Marcu and William Wong. 2002. A phrase-based, joint probability model for statistical machine translation. In Proceedings of EMNLP-2002, pages 133–139.
Yashar Mehdad. 2009. Automatic cost estimation for tree edit distance using particle swarm optimization. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 289–292.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1):19–51.
Mari Ostendorf, Vassilios V Digalakis, and Owen A Kimball. 1996. From HMM’s to segment models: a unified view of stochastic modeling for speech recognition. IEEE Transactions on Speech and Audio Processing, 4(5):360–378.
Bo Pang, Kevin Knight, and Daniel Marcu. 2003. Syntax-based alignment of multiple translations: Extracting paraphrases and generating new sentences. In Proceedings of NAACL, pages 102–109.
Vasin Punyakanok, Dan Roth, and Wen T. Yih. 2004. Mapping Dependencies Trees: An Application to Question Answering. In Proceedings of the 8th International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, Florida.
Michael Roth and Anette Frank. 2012. Aligning predicates across monolingual comparable texts using graph-based clustering. In Proceedings of EMNLP-CoNLL, pages 171–182, Jeju Island, Korea, July.
Sunita Sarawagi and William Cohen. 2004. Semi-Markov conditional random fields for information extraction. Advances in Neural Information Processing Systems, 17:1185–1192.
Kapil Thadani and Kathleen McKeown. 2011. Optimal and syntactically-informed decoding for monolingual phrase-based alignment. In Proceedings of ACL short.
Kapil Thadani, Scott Martin, and Michael White. 2012. A joint phrasal and dependency model for paraphrase alignment. In Proceedings of COLING 2012: Posters, pages 1229–1238, Mumbai, India, December. The COLING 2012 Organizing Committee.
Stephan Vogel, Hermann Ney, and Christoph Tillmann. 1996. HMM-based word alignment in statistical translation. In Proceedings of the 16th conference on Computational linguistics - Volume 2, COLING ’96, pages 836–841.
Stephen Wan, Mark Dras, Robert Dale, and Cécile Paris. 2006. Using dependency-based features to take the “para-farce” out of paraphrase. In Proceedings of the Australasian Language Technology Workshop.
Mengqiu Wang and Christopher D. Manning. 2010. Probabilistic tree-edit models with structured latent variables for textual entailment and question answering. In Proceedings of COLING, pages 1164–1172, Stroudsburg, PA, USA.
Mengqiu Wang, Noah A. Smith, and Teruko Mitamura. 2007. What is the Jeopardy Model? A Quasi-Synchronous Grammar for QA. In Proceedings of EMNLP-CoNLL, pages 22–32, Prague, Czech Republic, June.
Xuchen Yao, Benjamin Van Durme, Chris Callison-Burch, and Peter Clark. 2013a. A Lightweight and High Performance Monolingual Word Aligner. In Proceedings of ACL 2013 short, Sofia, Bulgaria.
Xuchen Yao, Benjamin Van Durme, Peter Clark, and Chris Callison-Burch. 2013b. Answer Extraction as Sequence Tagging with Tree Edit Distance. In Proceedings of NAACL 2013.