nips nips2006 nips2006-108 nips2006-108-reference knowledge-graph by maker-knowledge-mining

108 nips-2006-Large Scale Hidden Semi-Markov SVMs

Source: pdf

Author: Gunnar Rätsch, Sören Sonnenburg

Abstract: We describe Hidden Semi-Markov Support Vector Machines (SHM SVMs), an extension of HM SVMs to semi-Markov chains. This allows us to predict segmentations of sequences based on segment-based features measuring properties such as the length of the segment. We propose a novel technique to partition the problem into sub-problems. The independently obtained partial solutions can then be recombined in an efﬁcient way, which allows us to solve label sequence learning problems with several thousands of labeled sequences. We have tested our algorithm for predicting gene structures, an important problem in computational biology. Results on a well-known model organism illustrate the great potential of SHM SVMs in computational biology. 1

reference text

[1] Y. Altun, T. Hofmann, and A. Smola. Gaussian process classiﬁcation for segmenting and annotating sequences. In Proc. ICML 2004, 2004.

[2] Y. Altun, D. McAllester, and M. Belkin. Maximum margin semi-supervised learning for structured variables. In Proc. NIPS 2005, 2006.

[3] Y. Altun, I. Tsochantaridis, and T. Hofmann. Hidden Markov support vector machines. In T. Fawcett, editor, Proc. 20th Int. Conf. Mach. Learn., pages 3–10, 2003.

[4] B. Brejova, D.G. Brown, M. Li, and T. Vinar. ExonHunter: a comprehensive approach to gene ﬁnding. Bioinformatics, 21(Suppl 1):i57–i65, 2005.

[5] C. Burge and S. Karlin. Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology, 268:78–94, 1997.

[6] X. Ge. Segmental Semi-Markov Models and Applications to Sequence Analysis. PhD thesis, University of California, Irvine, 2002.

[7] J. Janssen and N. Limnios. Semi-Markov Models and Applications. Kluwer Academic, 1999.

[8] I. Korf. Gene ﬁnding in novel genomes. BMC Bioinformatics, 5(59), 2004.

[9] D. Kulp, D. Haussler, M.G. Reese, and F.H. Eeckman. A generalized hidden markov model for the recognition of human genes in DNA. ISMB 1996, pages 134–141, 1996.

[10] B. Lewin. Genes VII. Oxford University Press, New York, 2000.

[11] L.R. Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–285, February 1989.

[12] G. R¨ tsch and S. Sonnenburg. Accurate splice site prediction for Caenorhabditis elegans. In B. Sch¨ lkopf, a o K. Tsuda, and J.-P. Vert, editors, Kernel Methods in Computational Biology. MIT Press, 2004.

[13] G. R¨ tsch, S. Sonnenburg, J. Srinivasan, H. Witte, K.-R. M¨ ller, R. Sommer, and B. Sch¨ lkopf. Improving a u o the C. elegans genome annotation using machine learning. PLoS Computational Biology, 2007. In press.

[14] S. Sarawagi and W.W. Cohen. Semi-markov conditional random ﬁelds for information extraction. In Proc. NIPS 2004, 2005.

[15] G.D. Stormo and D. Haussler. Optimally parsing a sequence into different classes based on multiple types of information. In Proc. ISMB 1994, pages 369–375, Menlo Park, CA, 1994. AAAI/MIT Press.

[16] R. Sutton, D. Precup, and S. Singh. Between mdps and semi-mdps: A framework for temporal abstraction in reinforcement learrning. Artiﬁcial Intelligence, 112:181–211, 1999.

[17] B. Taskar, C. Guestrin, and D. Koller. Max-margin markov networks. In Proc. NIPS 2003, 16, 2004.

[18] I. Tsochantaridis, T. Hofmann, T. Joachims, and Y. Altun. Large margin methods for structured output spaces. Journal for Machine Learning Research, 6, September 2005.

[19] V.N. Vapnik. The nature of statistical learning theory. Springer Verlag, New York, 1995.

[20] A. J. Viterbi. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Trans. Informat. Theory, IT-13:260–269, Apr 1967.

[21] X.H. Zhang, K.A. Heller, I. Hefter, C.S. Leslie, and L.A. Chasin. Sequence information for the splicing of human pre-mRNA identiﬁed by SVM classiﬁcation. Genome Res, 13(12):2637–50, 2003.