nips2002-69-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yasemin Altun, Thomas Hofmann, Mark Johnson
Abstract: This paper investigates a boosting approach to discriminative learning of label sequences based on a sequence rank loss function. The proposed method combines many of the advantages of boosting schemes with the efficiency of dynamic programming methods and is attractive both conceptually and computationally. In addition, we discuss alternative approaches based on the Hamming loss for label sequences. The sequence boosting algorithm offers an interesting alternative to methods based on HMMs and the more recently proposed Conditional Random Fields. Application areas for the presented technique range from natural language processing and information extraction to computational biology. We include experiments on named entity recognition and part-of-speech tagging which demonstrate the validity and competitiveness of our approach.
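For concreteness, a sequence rank loss of the kind referred to above is typically written as an exponential loss over all incorrect label sequences; the exact formulation used in the paper may differ, and the symbols $F$, $\mathbf{x}^{i}$, $\mathbf{y}^{i}$ below are illustrative notation rather than the paper's own:

$$\mathcal{L}_{\mathrm{rnk}}(F) \;=\; \sum_{i=1}^{n} \; \sum_{\mathbf{y} \neq \mathbf{y}^{i}} \exp\!\bigl( F(\mathbf{x}^{i}, \mathbf{y}) - F(\mathbf{x}^{i}, \mathbf{y}^{i}) \bigr),$$

where $F(\mathbf{x}, \mathbf{y})$ is the additive ensemble score assigned to label sequence $\mathbf{y}$ for observation sequence $\mathbf{x}$, and $\mathbf{y}^{i}$ is the annotated labeling of the $i$-th training sequence. When $F$ decomposes over positions and adjacent label pairs, the inner sum over exponentially many sequences can be evaluated with forward-backward dynamic programming, which is what makes the combination of boosting and dynamic programming computationally attractive.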
[1] M. Collins. Discriminative reranking for natural language parsing. In Proceedings 17th International Conference on Machine Learning, pages 175-182. Morgan Kaufmann, San Francisco, CA, 2000.
[2] M. Collins. Ranking algorithms for named-entity extraction: Boosting and the voted perceptron. In Proceedings 40th Annual Meeting of the Association for Computational Linguistics (ACL), pages 489-496, 2002.
[3] R. Durbin, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
[4] J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28:337-374, 2000.
[5] S. Kakade, Y.W. Teh, and S. Roweis. An alternative objective function for Markovian fields. In Proceedings 19th International Conference on Machine Learning, 2002.
[6] J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings 18th International Conference on Machine Learning, pages 282-289. Morgan Kaufmann, San Francisco, CA, 2001.
[7] C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 1999.
[8] T. Minka. Algorithms for maximum-likelihood logistic regression. Technical Report TR 758, CMU, Department of Statistics, 2001.
[9] R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297-336, 1999.