jmlr jmlr2008 jmlr2008-56 jmlr2008-56-reference knowledge-graph by maker-knowledge-mining

56 jmlr-2008-Magic Moments for Structured Output Prediction

Source: pdf

Author: Elisa Ricci, Tijl De Bie, Nello Cristianini

Abstract: Most approaches to structured output prediction rely on a hypothesis space of prediction functions that compute their output by maximizing a linear scoring function. In this paper we present two novel learning algorithms for this hypothesis class, and a statistical analysis of their performance. The methods rely on efﬁciently computing the ﬁrst two moments of the scoring function over the output space, and using them to create convex objective functions for training. We report extensive experimental results for sequence alignment, named entity recognition, and RNA secondary structure prediction. Keywords: structured output prediction, discriminative learning, Z-score, discriminant analysis, PAC bound

reference text

Y. Altun, T. Hofmann, and M. Johnson. Discriminative learning for label sequences via boosting. In Advances in Neural Information Processing Systems (NIPS), pages 977-984, Vancouver, British Columbia, 2003. 2843 R ICCI , D E B IE AND C RISTIANINI Algorithm 5 Extra formulas for afﬁne gap penalties 1 µe (i, j) := π(i, j) (µe (i − 1, j)π(i − 1, j) + π(i − 2, j) + µe (i, j − 1)π(i, j − 1) +π(i, j − 2) + µe (i − 1, j − 1)π(i − 1, j − 1)) 1 µo (i, j) := π(i, j) (µo (i − 1, j)π(i − 1, j) + π(i − 1, j) − π(i − 2, j)+ µo (i, j − 1)π(i, j − 1) + π(i, j − 1) − π(i, j − 2) + µo (i − 1, j − 1)π(i − 1, j − 1)) 1 voo (i, j) := π(i, j) (voo (i − 1, j − 1)π(i − 1, j − 1) + voo (i − 1, j)π(i − 1, j) +2(µo (i − 1, j)π(i − 1, j) − µo (i − 2, j)π(i − 2, j) − π(i − 2, j) + π(i − 3, j)) +π(i − 1, j) − π(i − 2, j) + voo (i, j − 1)π(i, j − 1) + 2(µo (i, j − 1)π(i, j − 1) −µo (i, j − 2)π(i, j − 2) − π(i, j − 2) + π(i, j − 3)) + π(i, j − 1) − π(i, j − 2) 1 vee (i, j) := π(i, j) (vee (i − 1, j − 1)π(i − 1, j − 1) + vee (i − 1, j)π(i − 1, j) +2µe (i − 2, j)π(i − 2, j) + 2π(i − 3, j) + π(i − 2, j) + vee (i, j − 1)π(i, j − 1) +2µe (i, j − 2)π(i, j − 2) + 2π(i, j − 3) + π(i, j − 2)) 1 vmo (i, j) := π(i, j) ((vmo (i − 1, j) + µm (i − 1, j))π(i − 1, j) − µm (i − 2, j)π(i − 2, j) +(vmo (i, j − 1) + µm (i, j − 1))π(i, j − 1) − µm (i, j − 2)π(i, j − 2) +(vmo (i − 1, j − 1) + Mµo (i − 1, j − 1))π(i − 1, j − 1)) 1 vme (i, j) := π(i, j) ((vme (i − 1, j − 1) + Mµe (i − 1, j − 1))π(i − 1, j − 1) +vme (i − 1, j)π(i − 1, j) + µm (i − 2, j)π(i − 2, j) + vme (i, j − 1)π(i, j − 1) +µe (i, j − 2)π(i, j − 2)) 1 veo (i, j) := π(i, j) (veo (i − 1, j − 1)π(i − 1, j − 1) + veo (i, j − 1)π(i, j − 1) +µo (i, j − 2)π(i, j − 2) + π(i, j − 2) − 2π(i, j − 3) + µe (i, j − 1)π(i, j − 1) −µe (i, j − 2)π(i, j − 2) + veo (i − 1, j)π(i − 1, j) + µo (i − 2, j)π(i − 2, j) +π(i − 2, j) − 2π(i − 3, j) + µe (i − 1, j)π(i − 1, j) − µe (i − 2, j)π(i − 2, j)) Y. Altun, T. Hofmann, and A. J. Smola. Gaussian process classiﬁcation for segmenting and annotating sequences. In Proceedings of the Twenty-ﬁrst International Conference on Machine Learning (ICML), Banff, Alberta, Canada, 2004. Y. Altun, I. Tsochantaridis, T. Hofmann. Hidden Markov support vector machines. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 3-10, Washington, DC, USA, 2003. P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463-482, 2002. H.S. Booth, J.H. Maindonald, S.R. Wilson, and J.E. Gready. An efﬁcient Z-score algorithm for assessing sequence alignments. Journal of Computational Biology, 11(4):616-25, 2004. 2844 M AGIC M OMENTS FOR S TRUCTURED O UTPUT P REDICTION M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1-8, 2002. M. Collins. Ranking algorithms for named-entity extraction: boosting and the voted perceptron. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 489-496, 2002. C. B. Do, D. A. Woods, and S. Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22:e90-8, 2006. R.F. Doolittle. Similar amino acid sequences: chance or common ancestry. Science, 214:149-159, 1981. R. D. Dowell and S. R. Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5(71):1-14, 2004. S. Elizalde and K. Woods. Bounds on the number of inference functions of a graphical model. Statistica Sinica, 17:1395-1415, 2007. Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efﬁcient boosting algorithm for combining preferences. In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 170-178, Madison, Wisconson, USA, 1998. S. Grifﬁths-Jones, A. Bateman, M. Marshall, A. Khanna, and S.R. Eddy. Rfam: an RNA family database. Nucleic Acids Research, 31(1):439-441, 2003. D. Gusﬁeld, K. Balasubramanian, and D. Naor. Parametric optimization of sequence alignment. Algorithmica, 12:312-326, 1994. D. Gusﬁeld and P. Stelling. Parametric and inverse-parametric sequence alignment with XPARAL. Methods in Enzymology, 266:481-494, 1996. T. Joachims, T. Galor, and R. Elber, Learning to align sequences: a maximum-margin approach, In New Algorithms for Macromolecular Simulation, B. Leimkuhler, LNCS Vol. 49, Springer, 2005. J. Kececioglu and E. Kim, Simple and fast inverse alignment, In Proceedings of the Tenth ACM Conference on Research in Computational Molecular Biology (RECOMB), pages 441-455, Venice, Italy, 2006. T. Kudo. CRF++: Yet another CRF toolkit, 2005. [http://crfpp.sourceforge.net]. J. Lafferty, A. McCallum, and F. Pereira. Conditional random ﬁelds: probabilistic models for segmenting and labeling data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pages 282-289, Williamstown, MA, USA, 2001. J. Lafferty, X. Zhu, and Y. Liu. Kernel conditional random ﬁelds: representation and clique selection. In Proceedings of the Twenty-ﬁrst International Conference on Machine Learning (ICML), pages 64-71, Banff, Alberta, Canada, 2004. 2845 R ICCI , D E B IE AND C RISTIANINI C. D. Manning and H. Schuetze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999. D. McAllester. Generalization bounds and consistency for structured labeling in predicting structured data. Predicting Structured Data, edited by G. Bakir, T. Hofmann, B. Scholkopf, A. Smola, B. Taskar, and S. V. N. Vishwanathan, MIT Press, 2007. A. McCallum, D. Freitag, and F. Pereira. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning, pages 591-598, Stanford, CA, USA, 2000. C. McDiarmid. On the method of bounded differences, London Mathematical Society Lecture Note Series, 141:148-188, 1989. S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443-453, 1970. L. Pachter and B. Sturmfels. Parametric inference for biological sequence analysis. In Proceedings of the National Academy of Sciences USA, 101(46):16138-16143, 2004. L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, 1989. E. Ricci, T. De Bie, and N. Cristianini. Learning to align: a statistical approach. In Proceedings of the Seventh International Symposium on Intelligent Data Analysis (IDA), pages 25-36, Ljubljana, Slovenia, 2007. K. Sato and Y. Sakakibara. RNA secondary structural alignment with conditional random ﬁelds. Bioinformatics, 21(Suppl 2):ii237-ii242, 2005. R. Schapire and Y. Singer. Improved boosting algorithms using conﬁdencerated predictions. Machine Learning, 37(3):297-336, 1999. F. Sun, D. Fernandez-Baca, and W. Yu. Inverse parametric sequence alignment. Journal of Algorithms, 53(1):36-54, 2004. B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In In Advances in Neural Information Processing Systems (NIPS), Vancouver and Whistler, British Columbia, Canada, 2003. B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1-8, Barcelona, Spain, 2004. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables, Journal of Machine Learning Research, 6(9):1453-1484, 2005. V. N. Vapnik. Statistical Learning Theory. Wiley & Sons, Inc., 1998. D. H. Younger. Recognition and parsing of context-free languages in time n 3 . Information and Control, 2(10):189-208, 1967. 2846