
80 nips-2009-Efficient and Accurate Lp-Norm Multiple Kernel Learning

Source: pdf

Author: Marius Kloft, Ulf Brefeld, Pavel Laskov, Klaus-Robert Müller, Alexander Zien, Sören Sonnenburg

Abstract: Learning linear combinations of multiple kernels is an appealing strategy when the right choice of features is unknown. Previous approaches to multiple kernel learning (MKL) promote sparse kernel combinations to support interpretability. Unfortunately, ℓ1-norm MKL is hardly observed to outperform trivial baselines in practical applications. To allow for robust kernel mixtures, we generalize MKL to arbitrary ℓp-norms. We devise new insights on the connection between several existing MKL formulations and develop two efficient interleaved optimization strategies for arbitrary p > 1. Empirically, we demonstrate that the interleaved optimization strategies are much faster compared to the traditionally used wrapper approaches. Finally, we apply ℓp-norm MKL to real-world problems from computational biology, showing that non-sparse MKL achieves accuracies that go beyond the state-of-the-art.
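As a rough illustration of what a "linear combination of multiple kernels" under an ℓp-norm constraint looks like, the following minimal sketch builds two toy kernel matrices and mixes them with non-negative weights rescaled to unit ℓp-norm. It does not reproduce the paper's optimization strategies: the weights are fixed by hand rather than learned, and the function names (`normalize_lp`, `combine_kernels`, `linear_kernel`, `rbf_kernel`), the toy data, and the RBF bandwidth are all assumptions made for this example.

```python
import math


def normalize_lp(theta, p):
    """Rescale non-negative weights so that their l_p norm equals 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    if any(t < 0 for t in theta):
        raise ValueError("kernel weights must be non-negative")
    norm = sum(t ** p for t in theta) ** (1.0 / p)
    if norm == 0:
        raise ValueError("at least one weight must be positive")
    return [t / norm for t in theta]


def combine_kernels(kernels, theta):
    """Return the weighted sum K = sum_m theta[m] * kernels[m]."""
    n = len(kernels[0])
    return [
        [sum(t * K[i][j] for t, K in zip(theta, kernels)) for j in range(n)]
        for i in range(n)
    ]


def linear_kernel(X):
    """Gram matrix of inner products <x, y>."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


def rbf_kernel(X, gamma=0.5):
    """Gram matrix of exp(-gamma * ||x - y||^2); gamma is an arbitrary choice."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in X]
        for x in X
    ]


# Toy data (hypothetical): three points in the plane.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
kernels = [linear_kernel(X), rbf_kernel(X)]

# Hand-picked weights, rescaled to unit l_2 norm (p = 2).
theta = normalize_lp([3.0, 4.0], p=2)
K = combine_kernels(kernels, theta)
```

With p = 1 the same rescaling puts the weights on the simplex, the setting associated with sparse mixtures; larger p spreads weight more evenly across kernels. In an actual MKL method the weights would be optimized jointly with the classifier rather than fixed in advance.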

reference text

[1] T. Abeel, Y. V. de Peer, and Y. Saeys. Towards a gold standard for promoter prediction evaluation. Bioinformatics, 2009.

[2] F. R. Bach. Consistency of the group lasso and multiple kernel learning. J. Mach. Learn. Res., 9:1179–1225, 2008.

[3] F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proc. 21st ICML. ACM, 2004.

[4] V. B. Bajic, S. L. Tan, Y. Suzuki, and S. Sugano. Promoter prediction analysis on the whole human genome. Nature Biotechnology, 22(11):1467–1473, 2004.

[5] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004.

[6] O. Chapelle and A. Rakotomamonjy. Second order optimization of kernel parameters. In Proc. of the NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, 2008.

[7] C. Cortes, A. Gretton, G. Lanckriet, M. Mohri, and A. Rostamizadeh. Proceedings of the NIPS Workshop on Kernel Learning: Automatic Selection of Optimal Kernels, 2008.

[8] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory, methods, and applications. SIAM Rev., 35(3):380–429, 1993.

[9] S. Ji, L. Sun, R. Jin, and J. Ye. Multi-label multiple kernel learning. In Advances in Neural Information Processing Systems, 2009.

[10] G. Lanckriet, N. Cristianini, L. E. Ghaoui, P. Bartlett, and M. I. Jordan. Learning the kernel matrix with semi-definite programming. JMLR, 5:27–72, 2004.

[11] H. Leeb and B. M. Pötscher. Sparse estimators and the oracle property, or the return of Hodges’ estimator. Journal of Econometrics, 142:201–211, 2008.

[12] C. Longworth and M. J. F. Gales. Combining derivative and parametric kernels for speaker verification. IEEE Transactions on Audio, Speech, and Language Processing, 17(4):748–757, 2009.

[13] C. A. Micchelli and M. Pontil. Learning the kernel function via regularization. Journal of Machine Learning Research, 6:1099–1125, 2005.

[14] Y. Nardi and A. Rinaldo. On the asymptotic properties of the group lasso estimator for linear models. Electron. J. Statist., 2:605–633, 2008.

[15] S. Olhede, M. Pontil, and J. Shawe-Taylor. Proceedings of the PASCAL2 Workshop on Sparsity in Machine Learning and Statistics, 2009.

[16] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.

[17] C. S. Ong and A. Zien. An Automated Combination of Kernels for Predicting Protein Subcellular Localization. In Proc. of the 8th Workshop on Algorithms in Bioinformatics, 2008.

[18] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. More efficiency in multiple kernel learning. In ICML, pages 775–782, 2007.

[19] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008.

[20] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

[21] S. Sonnenburg, G. Rätsch, C. Schäfer, and B. Schölkopf. Large Scale Multiple Kernel Learning. Journal of Machine Learning Research, 7:1531–1565, July 2006.

[22] S. Sonnenburg, A. Zien, and G. Rätsch. ARTS: Accurate Recognition of Transcription Starts in Human. Bioinformatics, 22(14):e472–e480, 2006.

[23] Y. Suzuki, R. Yamashita, K. Nakai, and S. Sugano. dbTSS: Database of human transcriptional start sites and full-length cDNAs. Nucleic Acids Research, 30(1):328–331, 2002.

[24] M. Szafranski, Y. Grandvalet, and A. Rakotomamonjy. Composite kernel learning. In Proceedings of the International Conference on Machine Learning, 2008.

[25] M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In IEEE 11th International Conference on Computer Vision (ICCV), pages 1–8, 2007.

[26] Z. Xu, R. Jin, I. King, and M. Lyu. An extended level method for efficient multiple kernel learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1825–1832. 2009.

[27] A. Zien and C. S. Ong. Multiclass multiple kernel learning. In Proceedings of the 24th International Conference on Machine Learning (ICML), pages 1191–1198. ACM, 2007.