56 nips-2009-Conditional Neural Fields


Author: Jian Peng, Liefeng Bo, Jinbo Xu

Abstract: Conditional random fields (CRFs) are widely used for sequence labeling in areas such as natural language processing and biological sequence analysis. Most CRF models use a linear potential function to represent the relationship between input features and output. However, in many real-world applications, such as protein structure prediction and handwriting recognition, this relationship is highly complex and nonlinear, and a linear function cannot model it accurately. To model the nonlinear relationship between input and output, we propose a new conditional probabilistic graphical model, Conditional Neural Fields (CNF), for sequence labeling. CNF extends CRF by adding one (or possibly more) middle layers between input and output. The middle layer consists of a number of gate functions, each acting as a local neuron or feature extractor to capture the nonlinear relationship between input and output. Conceptually, therefore, CNF is much more expressive than CRF. Experiments on two widely used benchmarks indicate that CNF performs significantly better than a number of popular methods. In particular, CNF is the best among approximately 10 machine learning methods for protein secondary structure prediction and is also among the few best methods for handwriting recognition.
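
To make the architecture in the abstract concrete, the following is a minimal sketch of a linear-chain model with one middle layer of gate functions between input features and labels. It is not the authors' implementation: the logistic gate, the first-order transition term, and all names (`cnf_log_likelihood`, `theta`, `w`, `u`) are assumptions chosen for illustration. The sketch computes the conditional log-likelihood of one label sequence, with the partition function obtained by the standard forward recursion.

```python
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def cnf_log_likelihood(x, y, theta, w, u):
    """Log P(y | x) for a chain model with a gate layer (illustrative only).

    x:     (T, D) input features, one row per position
    y:     (T,)   integer labels in [0, L)
    theta: (K, D) gate weights (assumed logistic gates)
    w:     (L, K) weights from gates to labels
    u:     (L, L) label transition weights, u[i, j] for i -> j
    """
    # Middle layer: each gate is a local nonlinear feature extractor.
    gates = 1.0 / (1.0 + np.exp(-(x @ theta.T)))  # (T, K)
    # Per-position label scores are linear in the gate outputs.
    node = gates @ w.T                            # (T, L)

    T = x.shape[0]
    # Unnormalized log-score of the given label sequence.
    path = node[np.arange(T), y].sum() + u[y[:-1], y[1:]].sum()

    # Forward recursion for the log partition function.
    alpha = node[0]
    for t in range(1, T):
        alpha = logsumexp(alpha[:, None] + u, axis=0) + node[t]
    log_z = logsumexp(alpha, axis=0)

    return path - log_z
```

Replacing `node` with a score that is linear in `x` directly would recover an ordinary linear-chain CRF under the same assumptions, which is the sense in which the gate layer adds expressiveness.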


reference text

[1] Fei Sha and O. Pereira. Shallow parsing with conditional random fields. In Proceedings of Human Language Technology-NAACL 2003.

[2] D. T. Jones. Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292(2):195–202, September 1999.

[3] Feng Zhao, Shuaicheng Li, Beckett W. Sterner, and Jinbo Xu. Discriminative learning for protein conformation sampling. Proteins, 73(1):228–240, October 2008.

[4] Feng Zhao, Jian Peng, Joe Debartolo, Karl F. Freed, Tobin R. Sosnick, and Jinbo Xu. A probabilistic graphical model for ab initio folding. In RECOMB 2’09: Proceedings of the 13th Annual International Conference on Research in Computational Molecular Biology, pages 59–73, Berlin, Heidelberg, 2009. Springer-Verlag.

[5] Sy Bor Wang, Ariadna Quattoni, Louis-Philippe Morency, and David Demirdjian. Hidden conditional random fields for gesture recognition. In CVPR 2006.

[6] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. In Proceedings of the IEEE, 1989.

[7] John D. Lafferty, Andrew McCallum, and Fernando C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In ICML 2001.

[8] Ben Taskar, Carlos Guestrin, and Daphne Koller. Max-margin Markov networks. In NIPS 2003.

[9] Ioannis Tsochantaridis, Thomas Hofmann, Thorsten Joachims, and Yasemin Altun. Support vector machine learning for interdependent and structured output spaces. In ICML 2004.

[10] Nam Nguyen and Yunsong Guo. Comparisons of sequence labeling algorithms and extensions. In ICML 2007.

[11] Yan Liu, Jaime Carbonell, Judith Klein-Seetharaman, and Vanathi Gopalakrishnan. Comparison of probabilistic combination methods for protein secondary structure prediction. Bioinformatics, 20(17), November 2004.

[12] D. C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(3), 1989.

[13] Richard H. Byrd, Jorge Nocedal, and Robert B. Schnabel. Representations of quasi-Newton matrices and their use in limited memory methods. Mathematical Programming, 63(2), 1994.

[14] David J. C. MacKay. A practical Bayesian framework for backpropagation networks. Neural Computation, 4:448–472, 1992.

[15] Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, November 1995.

[16] John Lafferty, Xiaojin Zhu, and Yan Liu. Kernel conditional random fields: representation and clique selection. In ICML 2004.

[17] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.

[18] Ilya Sutskever, Geoffrey E. Hinton, and Graham Taylor. The recurrent temporal restricted Boltzmann machine. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, NIPS 2009.

[19] Barbara Hammer. Recurrent networks for structured data - a unifying approach and its properties. Cognitive Systems Research, 2002.

[20] Alex Graves and Juergen Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, NIPS 2009.

[21] S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, September 1997.

[22] James A. Cuff and Geoffrey J. Barton. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins: Structure, Function, and Genetics, 34, 1999.

[23] Wolfgang Kabsch and Christian Sander. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22(12):2577–2637, December 1983.

[24] H. Kim and H. Park. Protein secondary structure prediction based on an improved support vector machines approach. Protein Engineering, 16(8), August 2003.

[25] Wei Chu, Zoubin Ghahramani, and David. A graphical model for protein secondary structure prediction. In ICML 2004.

[26] Sujun Hua and Zhirong Sun. A novel method of protein secondary structure prediction with high segment overlap measure: Support vector machine approach. Journal of Molecular Biology, 308, 2001.

[27] George Karypis. YASSPP: Better kernels and coding schemes lead to improvements in protein secondary structure prediction. Proteins: Structure, Function, and Bioinformatics, 64(3):575–586, 2006.

[28] O. Dor and Y. Zhou. Achieving 80% ten-fold cross-validated accuracy for secondary structure prediction by large-scale training. Proteins: Structure, Function, and Bioinformatics, 66, March 2007.