
30 emnlp-2010-Confidence in Structured-Prediction Using Confidence-Weighted Models


Author: Avihai Mejer ; Koby Crammer

Abstract: Confidence-Weighted (CW) linear classifiers and their successors have been shown to perform well on binary and multiclass NLP problems. In this paper we extend the CW approach to sequence learning and show that it achieves state-of-the-art performance on four noun phrase chunking and named entity recognition tasks. We then derive several algorithmic approaches for estimating the correctness of each predicted label in the output sequence. We show that our approach provides reliable relative correctness information, as it outperforms other alternatives in ranking label predictions according to their error. We also show empirically that the outputs of our methods are close to absolute estimates of the error. Finally, we show how to use this information to improve active learning.
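The abstract does not spell out how per-label confidence is computed, so the following is only a minimal sketch of one plausible approach, not the authors' algorithm. It assumes a CW-style model that keeps a diagonal Gaussian over the weights (a mean and a variance per feature), samples weight vectors from it, and takes the fraction of samples that agree with the mean-weight prediction as the confidence. It scores each token independently and ignores sequence decoding. The names `predict`, `sampled_confidence`, and `least_confident` are hypothetical.

```python
import random

def predict(weights, features):
    """Return the index of the highest-scoring label for one token.

    weights: one list of feature weights per label.
    features: feature values for the token.
    """
    scores = [sum(w * x for w, x in zip(label_w, features)) for label_w in weights]
    return max(range(len(scores)), key=scores.__getitem__)

def sampled_confidence(mean, variance, features, draws=50, seed=0):
    """Estimate confidence in the mean-weight prediction by sampling weights.

    mean, variance: per-label lists of per-feature means and variances
    (a diagonal Gaussian over weights, assumed here for illustration).
    Returns (predicted_label, fraction_of_draws_that_agree).
    """
    rng = random.Random(seed)
    best = predict(mean, features)
    agree = 0
    for _ in range(draws):
        # Draw one weight vector per label from the assumed Gaussian.
        sample = [[rng.gauss(m, v ** 0.5) for m, v in zip(ms, vs)]
                  for ms, vs in zip(mean, variance)]
        if predict(sample, features) == best:
            agree += 1
    return best, agree / draws

def least_confident(confidences, k):
    """Return indices of the k lowest-confidence predictions.

    In an active-learning loop these would be candidates for annotation.
    """
    return sorted(range(len(confidences)), key=confidences.__getitem__)[:k]
```

With zero variance every sampled weight vector equals the mean, so the confidence is 1.0; larger variances on the features that matter for a token lower the agreement rate. Ranking predictions by this score is one way to obtain the kind of relative correctness ordering the abstract describes.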

reference text

[Cesa-Bianchi and Lugosi, 2006] N. Cesa-Bianchi and G. Lugosi. 2006. Prediction, Learning, and Games. Cambridge University Press, New York, NY, USA.
[Collins, 2002] M. Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP.
[Crammer et al., 2005] K. Crammer, R. McDonald, and F. Pereira. 2005. Scalable large-margin online learning for structured classification. Tech. report, Dept. of CIS, U. of Penn.
[Crammer et al., 2008] K. Crammer, M. Dredze, and F. Pereira. 2008. Exact confidence-weighted learning. In NIPS 22.
[Crammer et al., 2009a] K. Crammer, M. Dredze, and A. Kulesza. 2009a. Multi-class confidence weighted algorithms. In EMNLP.
[Crammer et al., 2009b] K. Crammer, A. Kulesza, and M. Dredze. 2009b. Adaptive regularization of weighted vectors. In NIPS 23.
[Culotta and McCallum, 2004] A. Culotta and A. McCallum. 2004. Confidence estimation for information extraction. In HLT-NAACL, pages 109–112.
[Dredze and Crammer, 2008] M. Dredze and K. Crammer. 2008. Active learning with confidence. In ACL.
[Dredze et al., 2008] M. Dredze, K. Crammer, and F. Pereira. 2008. Confidence-weighted linear classification. In ICML.
[Kim et al., 2000] E.F. Tjong Kim, S. Buchholz, and K. Sang. 2000. Introduction to the CoNLL-2000 shared task: Chunking.
[Kristjansson et al., 2004] T. Kristjansson, A. Culotta, P. Viola, and A. McCallum. 2004. Interactive information extraction with constrained conditional random fields. In AAAI, pages 412–418.
[Lafferty et al., 2001] J. Lafferty, A. McCallum, and F. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.
[McCallum, 2002] Andrew McCallum. 2002. MALLET: A machine learning for language toolkit. http://mallet.cs.umass.edu.
[McDonald et al., 2005a] R.T. McDonald, K. Crammer, and F. Pereira. 2005a. Flexible text segmentation with structured multilabel classification. In HLT/EMNLP.
[McDonald et al., 2005b] Ryan T. McDonald, Koby Crammer, and Fernando C. N. Pereira. 2005b. Online large-margin training of dependency parsers. In ACL.
[Scheffer et al., 2001] Tobias Scheffer, Christian Decomain, and Stefan Wrobel. 2001. Active hidden Markov models for information extraction. In IDA, pages 309–318, London, UK. Springer-Verlag.
[Sha and Pereira, 2003] Fei Sha and Fernando Pereira. 2003. Shallow parsing with conditional random fields. In Proc. of HLT-NAACL, pages 213–220.
[Shimizu and Haas, 2006] N. Shimizu and A. Haas. 2006. Exact decoding for jointly labeling and chunking sequences. In COLING/ACL, pages 763–770.
[Taskar et al., 2003] B. Taskar, C. Guestrin, and D. Koller. 2003. Max-margin Markov networks. In NIPS.
[Tjong and Sang, 2002] Erik F. Tjong and K. Sang. 2002. Introduction to the CoNLL-2002 shared task: Language-independent named entity recognition. In CoNLL.
[Tjong et al., 2003] E.F. Tjong, K. Sang, and F. De Meulder. 2003. Introduction to the CoNLL-2003 shared task: Language-independent named entity recognition. In CoNLL, pages 142–147.
[Tong and Koller, 2001] S. Tong and D. Koller. 2001. Support vector machine active learning with applications to text classification. In JMLR, pages 999–1006.
[Ueffing and Ney, 2007] Nicola Ueffing and Hermann Ney. 2007. Word-level confidence estimation for machine translation. Comput. Linguist., 33(1):9–40.
[Wick et al., 2009] M. Wick, K. Rohanimanesh, A. Culotta, and A. McCallum. 2009. Samplerank: Learning preferences from atomic gradients. In NIPS Workshop on Advances in Ranking.