
96 emnlp-2011-Multilayer Sequence Labeling


Source: pdf

Author: Ai Azuma; Yuji Matsumoto

Abstract: In this paper, we describe a novel approach to cascaded learning and inference on sequences. We propose a weakly joint learning model for cascaded inference on sequences, called multilayer sequence labeling. In this model, inference on sequences is modeled as a cascade of decisions, where a labeling decision that follows other decisions uses features defined on the preceding results, marginalized by the probabilistic models over them. This in itself is not novel; the idea central to this paper is that the probabilistic models for the succeeding labeling are viewed as depending indirectly on the probabilistic models for the preceding analyses. We also propose two types of efficient dynamic programming required in the gradient-based optimization of the objective function. One of the dynamic programming algorithms resembles the back-propagation algorithm for multilayer feed-forward neural networks; the other is a generalized version of the forward-backward algorithm. We also report experiments on cascaded part-of-speech tagging and chunking of English sentences and show the effectiveness of the proposed method.
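The abstract describes a second-layer labeler that consumes features marginalized by the first-layer model, with a forward-backward-style dynamic program underneath. The sketch below (not the authors' implementation; the potentials, label sets, and the downstream feature map are hypothetical toy values) illustrates the basic idea: token-level marginals from a first-layer linear-chain tagger are passed to a second layer (e.g. a chunker) as real-valued expected features instead of a single hard tag.

```python
# Minimal sketch of marginalized features for a cascaded labeler.
# All numbers here are toy values, not the paper's model.
import numpy as np

def forward_backward(log_emit, log_trans):
    """Token-level marginals P(y_t = k | x) for a linear-chain model.

    log_emit  : (T, K) array of per-position label scores
    log_trans : (K, K) array of transition scores
    """
    T, K = log_emit.shape
    alpha = np.zeros((T, K))
    beta = np.zeros((T, K))
    alpha[0] = log_emit[0]
    for t in range(1, T):
        # log-sum-exp over the previous label
        scores = alpha[t - 1][:, None] + log_trans + log_emit[t][None, :]
        alpha[t] = np.logaddexp.reduce(scores, axis=0)
    for t in range(T - 2, -1, -1):
        scores = beta[t + 1][None, :] + log_trans + log_emit[t + 1][None, :]
        beta[t] = np.logaddexp.reduce(scores, axis=1)
    log_z = np.logaddexp.reduce(alpha[-1])
    return np.exp(alpha + beta - log_z)   # (T, K) marginal distributions

# Toy first layer: 3 POS tags over a 4-token sentence.
rng = np.random.default_rng(0)
pos_marginals = forward_backward(rng.normal(size=(4, 3)), rng.normal(size=(3, 3)))

# The second layer sees the expected value of each POS-indicator feature
# (current and previous token) rather than a single hard tag.
chunk_features = [np.concatenate([pos_marginals[t], pos_marginals[max(t - 1, 0)]])
                  for t in range(4)]
print(chunk_features[1])
```

In the paper's formulation these marginalized features make the second-layer objective an indirect function of the first-layer parameters, which is what the back-propagation-like dynamic program exploits; the sketch only shows the feature construction, not that joint optimization.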


reference text

M. Berz. 1992. Automatic differentiation as nonarchimedean analysis. In Computer Arithmetic and Enclosure, pages 439–450.

R.C. Bunescu. 2008. Learning with probabilistic features for improved pipeline models. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pages 670–679.

M. Collins and N. Duffy. 2002. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 263–270.

G.F. Corliss, C. Faure, and A. Griewank. 2002. Automatic Differentiation of Algorithms: From Simulation to Optimization. Springer-Verlag.

J.R. Finkel, C.D. Manning, and A.Y. Ng. 2006. Solving the problem of cascading errors: Approximate Bayesian inference for linguistic annotation pipelines. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 618–626.

J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning, pages 282–289.

Z. Li and J. Eisner. 2009. First- and second-order expectation semirings with applications to minimum-risk training on translation forests. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 40–51.

M.P. Marcus, M.A. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

L.A. Ramshaw and M.P. Marcus. 1995. Text chunking using transformation-based learning. In Proceedings of the Third ACL Workshop on Very Large Corpora, pages 82–94. Cambridge, MA, USA.

S. Sarawagi and W.W. Cohen. 2005. Semi-Markov conditional random fields for information extraction. Advances in Neural Information Processing Systems, 17:1185–1192.

C. Sutton, A. McCallum, and K. Rohanimanesh. 2007. Dynamic conditional random fields: Factorized probabilistic models for labeling and segmenting sequence data. The Journal of Machine Learning Research, 8:693–723.

B. Taskar, C. Guestrin, and D. Koller. 2003. Max-margin Markov networks. In Advances in Neural Information Processing Systems 16.

R.E. Wengert. 1964. A simple automatic derivative evaluation program. Communications of the ACM, 7(8):463–464.