Author: Elisa Ricci, Tijl De Bie, Nello Cristianini
Abstract: Most approaches to structured output prediction rely on a hypothesis space of prediction functions that compute their output by maximizing a linear scoring function. In this paper we present two novel learning algorithms for this hypothesis class, and a statistical analysis of their performance. The methods rely on efficiently computing the first two moments of the scoring function over the output space, and using them to create convex objective functions for training. We report extensive experimental results for sequence alignment, named entity recognition, and RNA secondary structure prediction. Keywords: structured output prediction, discriminative learning, Z-score, discriminant analysis, PAC bound
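The moment computations the abstract refers to are dynamic-programming recursions over the alignment lattice: each table entry is a path-weighted average of its predecessors. As a simplified sketch of that pattern (uniform weights, counting only gap extensions; the paper's actual recursions, reproduced in Algorithm 5 below, carry weights π(i, j) and several coupled moment tables, and the function name here is illustrative, not from the paper):

```python
def alignment_moments(m, n):
    """First moment of the gap-extension count over all monotone
    alignment paths of an m x n grid, under uniform path weights.

    pi[i][j] : number of alignment paths reaching cell (i, j)
    s[i][j]  : gap-extension count summed over those paths
    The mean extension count is s[m][n] / pi[m][n].
    """
    pi = [[0] * (n + 1) for _ in range(m + 1)]
    s = [[0] * (n + 1) for _ in range(m + 1)]
    pi[0][0] = 1
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 and j == 0:
                continue
            if i >= 1:  # vertical step (gap in one sequence)
                pi[i][j] += pi[i - 1][j]
                # paths whose previous step was also vertical gain one extension,
                # mirroring the pi(i-2, j) term of the mu_e recursion
                s[i][j] += s[i - 1][j] + (pi[i - 2][j] if i >= 2 else 0)
            if j >= 1:  # horizontal step (gap in the other sequence)
                pi[i][j] += pi[i][j - 1]
                s[i][j] += s[i][j - 1] + (pi[i][j - 2] if j >= 2 else 0)
            if i >= 1 and j >= 1:  # diagonal step (match/mismatch)
                pi[i][j] += pi[i - 1][j - 1]
                s[i][j] += s[i - 1][j - 1]
    return pi[m][n], s[m][n] / pi[m][n]
```

For a 2 x 1 grid the five alignments (VVH, VHV, HVV, VD, DV) contain two gap extensions in total, so `alignment_moments(2, 1)` returns `(5, 0.4)`. The second-moment tables in Algorithm 5 follow the same single-pass structure, just with more coupled terms.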
Algorithm 5 Extra formulas for affine gap penalties

µe(i, j) := (1/π(i, j)) (µe(i − 1, j)π(i − 1, j) + π(i − 2, j) + µe(i, j − 1)π(i, j − 1) + π(i, j − 2) + µe(i − 1, j − 1)π(i − 1, j − 1))

µo(i, j) := (1/π(i, j)) (µo(i − 1, j)π(i − 1, j) + π(i − 1, j) − π(i − 2, j) + µo(i, j − 1)π(i, j − 1) + π(i, j − 1) − π(i, j − 2) + µo(i − 1, j − 1)π(i − 1, j − 1))

voo(i, j) := (1/π(i, j)) (voo(i − 1, j − 1)π(i − 1, j − 1) + voo(i − 1, j)π(i − 1, j) + 2(µo(i − 1, j)π(i − 1, j) − µo(i − 2, j)π(i − 2, j) − π(i − 2, j) + π(i − 3, j)) + π(i − 1, j) − π(i − 2, j) + voo(i, j − 1)π(i, j − 1) + 2(µo(i, j − 1)π(i, j − 1) − µo(i, j − 2)π(i, j − 2) − π(i, j − 2) + π(i, j − 3)) + π(i, j − 1) − π(i, j − 2))

vee(i, j) := (1/π(i, j)) (vee(i − 1, j − 1)π(i − 1, j − 1) + vee(i − 1, j)π(i − 1, j) + 2µe(i − 2, j)π(i − 2, j) + 2π(i − 3, j) + π(i − 2, j) + vee(i, j − 1)π(i, j − 1) + 2µe(i, j − 2)π(i, j − 2) + 2π(i, j − 3) + π(i, j − 2))

vmo(i, j) := (1/π(i, j)) ((vmo(i − 1, j) + µm(i − 1, j))π(i − 1, j) − µm(i − 2, j)π(i − 2, j) + (vmo(i, j − 1) + µm(i, j − 1))π(i, j − 1) − µm(i, j − 2)π(i, j − 2) + (vmo(i − 1, j − 1) + Mµo(i − 1, j − 1))π(i − 1, j − 1))

vme(i, j) := (1/π(i, j)) ((vme(i − 1, j − 1) + Mµe(i − 1, j − 1))π(i − 1, j − 1) + vme(i − 1, j)π(i − 1, j) + µm(i − 2, j)π(i − 2, j) + vme(i, j − 1)π(i, j − 1) + µe(i, j − 2)π(i, j − 2))

veo(i, j) := (1/π(i, j)) (veo(i − 1, j − 1)π(i − 1, j − 1) + veo(i, j − 1)π(i, j − 1) + µo(i, j − 2)π(i, j − 2) + π(i, j − 2) − 2π(i, j − 3) + µe(i, j − 1)π(i, j − 1) − µe(i, j − 2)π(i, j − 2) + veo(i − 1, j)π(i − 1, j) + µo(i − 2, j)π(i − 2, j) + π(i − 2, j) − 2π(i − 3, j) + µe(i − 1, j)π(i − 1, j) − µe(i − 2, j)π(i − 2, j))

References

Y. Altun, T. Hofmann, and M. Johnson. Discriminative learning for label sequences via boosting. In Advances in Neural Information Processing Systems (NIPS), pages 977–984, Vancouver, British Columbia, 2003.

Y. Altun, T. Hofmann, and A. J. Smola.
Gaussian process classification for segmenting and annotating sequences. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML), Banff, Alberta, Canada, 2004.

Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. In Proceedings of the Twentieth International Conference on Machine Learning (ICML), pages 3–10, Washington, DC, USA, 2003.

P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

H. S. Booth, J. H. Maindonald, S. R. Wilson, and J. E. Gready. An efficient Z-score algorithm for assessing sequence alignments. Journal of Computational Biology, 11(4):616–625, 2004.

Magic Moments for Structured Output Prediction

M. Collins. Discriminative training methods for hidden Markov models: theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8, 2002.

M. Collins. Ranking algorithms for named-entity extraction: boosting and the voted perceptron. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL), pages 489–496, 2002.

C. B. Do, D. A. Woods, and S. Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22:e90–e98, 2006.

R. F. Doolittle. Similar amino acid sequences: chance or common ancestry? Science, 214:149–159, 1981.

R. D. Dowell and S. R. Eddy. Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction. BMC Bioinformatics, 5(71):1–14, 2004.

S. Elizalde and K. Woods. Bounds on the number of inference functions of a graphical model. Statistica Sinica, 17:1395–1415, 2007.

Y. Freund, R. D. Iyer, R. E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences.
In Proceedings of the Fifteenth International Conference on Machine Learning (ICML), pages 170–178, Madison, Wisconsin, USA, 1998.

S. Griffiths-Jones, A. Bateman, M. Marshall, A. Khanna, and S. R. Eddy. Rfam: an RNA family database. Nucleic Acids Research, 31(1):439–441, 2003.

D. Gusfield, K. Balasubramanian, and D. Naor. Parametric optimization of sequence alignment. Algorithmica, 12:312–326, 1994.

D. Gusfield and P. Stelling. Parametric and inverse-parametric sequence alignment with XPARAL. Methods in Enzymology, 266:481–494, 1996.

T. Joachims, T. Galor, and R. Elber. Learning to align sequences: a maximum-margin approach. In New Algorithms for Macromolecular Simulation, edited by B. Leimkuhler, LNCS Vol. 49, Springer, 2005.

J. Kececioglu and E. Kim. Simple and fast inverse alignment. In Proceedings of the Tenth ACM Conference on Research in Computational Molecular Biology (RECOMB), pages 441–455, Venice, Italy, 2006.

T. Kudo. CRF++: yet another CRF toolkit, 2005. [http://crfpp.sourceforge.net].

J. Lafferty, A. McCallum, and F. Pereira. Conditional random fields: probabilistic models for segmenting and labeling sequence data. In Proceedings of the Eighteenth International Conference on Machine Learning (ICML), pages 282–289, Williamstown, MA, USA, 2001.

J. Lafferty, X. Zhu, and Y. Liu. Kernel conditional random fields: representation and clique selection. In Proceedings of the Twenty-first International Conference on Machine Learning (ICML), pages 64–71, Banff, Alberta, Canada, 2004.

C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, 1999.

D. McAllester. Generalization bounds and consistency for structured labeling. In Predicting Structured Data, edited by G. Bakir, T. Hofmann, B. Schölkopf, A. Smola, B. Taskar, and S. V. N. Vishwanathan, MIT Press, 2007.

A. McCallum, D. Freitag, and F. Pereira.
Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the Seventeenth International Conference on Machine Learning (ICML), pages 591–598, Stanford, CA, USA, 2000.

C. McDiarmid. On the method of bounded differences. London Mathematical Society Lecture Note Series, 141:148–188, 1989.

S. B. Needleman and C. D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48:443–453, 1970.

L. Pachter and B. Sturmfels. Parametric inference for biological sequence analysis. Proceedings of the National Academy of Sciences USA, 101(46):16138–16143, 2004.

L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

E. Ricci, T. De Bie, and N. Cristianini. Learning to align: a statistical approach. In Proceedings of the Seventh International Symposium on Intelligent Data Analysis (IDA), pages 25–36, Ljubljana, Slovenia, 2007.

K. Sato and Y. Sakakibara. RNA secondary structural alignment with conditional random fields. Bioinformatics, 21(Suppl 2):ii237–ii242, 2005.

R. Schapire and Y. Singer. Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3):297–336, 1999.

F. Sun, D. Fernandez-Baca, and W. Yu. Inverse parametric sequence alignment. Journal of Algorithms, 53(1):36–54, 2004.

B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In Advances in Neural Information Processing Systems (NIPS), Vancouver and Whistler, British Columbia, Canada, 2003.

B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8, Barcelona, Spain, 2004.

I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun.
Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6:1453–1484, 2005.

V. N. Vapnik. Statistical Learning Theory. Wiley & Sons, Inc., 1998.

D. H. Younger. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2):189–208, 1967.