nips nips2008 nips2008-239 nips2008-239-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Olivier Chapelle, Chuong B. Do, Choon H. Teo, Quoc V. Le, Alex J. Smola
Abstract: Large-margin structured estimation methods minimize a convex upper bound of loss functions. While they allow for efficient optimization algorithms, these convex formulations are not tight and sacrifice the ability to accurately model the true loss. We present tighter non-convex bounds based on generalizing the notion of a ramp loss from binary classification to structured estimation. We show that a small modification of existing optimization algorithms suffices to solve this modified problem. On structured prediction tasks such as protein sequence alignment and web page ranking, our algorithm leads to improved accuracy. 1
[1] K. Crammer, and Y. Singer. On the Learnability and Design of Output Codes for Multiclass Problems. In COLT 2000, pages 35–46. Morgan Kaufmann, 2000.
[2] R. Collobert, F.H. Sinz, J. Weston, and L. Bottou. Trading convexity for scalability. In ICML 2006, pages 201–208. ACM, 2006.
[3] C. B. Do, S. S. Gross, and S. Batzoglou. CONTRAlign: discriminative training for protein sequence alignment. In RECOMB, pages 160–174, 2006.
[4] C. B. Do, D. A. Woods, and S. Batzoglou. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics, 22(14):e90–e98, 2006.
[5] S. R. Eddy. Non-coding RNA genes and the modern RNA world. Nature Reviews Genetics, 2(12):919– 929, 2001.
[6] Y. Freund, R. Iyer, R.E. Schapire, and Y. Singer. An efficient boosting algorithm for combining preferences. In ICML 1998, pages 170–178., 1998.
[7] S. Griffiths-Jones, S. Moxon, M. Marshall, A. Khanna, S. R. Eddy, and A. Bateman. Rfam: annotating non-coding RNAs in complete genomes. Nucl. Acids Res., 33:D121–D124, 2005.
[8] R. Herbrich, T. Graepel, and K. Obermayer. Large margin rank boundaries for ordinal regression. In Advances in Large Margin Classifiers, pages 115–132, 2000. MIT Press.
[9] T. Hoang. DC optimization: Theory, methods, and applications. In R. Horst and P. Pardalos, editors, Handbook of Global Optimization, Kluwer.
[10] T. Joachims. Optimizing search engines using clickthrough data. In KDD. ACM, 2002.
[11] T. Joachims, T. Galor, and R. Elber. Learning to align sequences: A maximum-margin approach. In New Algorithms for Macromolecular Simulation, LNCS 49, 57–68. Springer, 2005.
[12] Q. Le and A.J. Smola. Direct optimization of ranking measures. NICTA-TR, 2007.
[13] T.-Y. Liu, J. Xu, T. Qin, W. Xiong, and H. Li. Letor: Benchmark dataset for research on learning to rank for information retrieval. In LR4IR, 2007.
[14] J. Pei and N. V. Grishin. MUMMALS: multiple sequence alignment improved by using hidden Markov models with local structural information. Nucl. Acids Res., 34(16):4364–4374, 2006.
[15] N. Ratliff, J. Bagnell, and M. Zinkevich. (online) subgradient methods for structured prediction. In AISTATS, 2007.
[16] B. Rost. Twilight zone of protein sequence alignments. Protein Eng., 12(2):85–94, 1999.
[17] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In Proc. Intl. Conf. Machine Learning, 2007.
[18] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In NIPS 16, pages 25–32, 2004. MIT Press.
[19] C.H. Teo, Q. Le, A.J. Smola, and S.V.N. Vishwanathan. A scalable modular convex solver for regularized risk minimization. In KDD. ACM, 2007.
[20] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res., 6:1453–1484, 2005.
[21] M. Weimer, A. Karatzoglou, Q. Le, and A. Smola. Cofi rank - maximum margin matrix factorization for collaborative ranking. In NIPS 20. MIT Press, 2008.
[22] A.L. Yuille and A. Rangarajan. The concave-convex procedure. Neural Computation, 15:915–936, 2003.
[23] A. Nedic and D. P. Bertsekas. Incremental subgradient methods for nondifferentiable optimization. Siam J. on Optimization, 12:109–138, 2001. 8