nips nips2012 nips2012-330 nips2012-330-reference knowledge-graph by maker-knowledge-mining

330 nips-2012-Supervised Learning with Similarity Functions


Source: pdf

Author: Purushottam Kar, Prateek Jain

Abstract: We address the problem of general supervised learning when data can only be accessed through an (indefinite) similarity function between data points. Existing work on learning with indefinite kernels has concentrated solely on binary/multiclass classification problems. We propose a model that is generic enough to handle any supervised learning task and also subsumes the model previously proposed for classification. We give a “goodness” criterion for similarity functions w.r.t. a given supervised learning task and then adapt a well-known landmarking technique to provide efficient algorithms for supervised learning using “good” similarity functions. We demonstrate the effectiveness of our model on three important supervised learning problems: a) real-valued regression, b) ordinal regression and c) ranking where we show that our method guarantees bounded generalization error. Furthermore, for the case of real-valued regression, we give a natural goodness definition that, when used in conjunction with a recent result in sparse vector recovery, guarantees a sparse predictor with bounded generalization error. Finally, we report results of our learning algorithms on regression and ordinal regression tasks using non-PSD similarity functions and demonstrate the effectiveness of our algorithms, especially that of the sparse landmark selection algorithm that achieves significantly higher accuracies than the baseline methods while offering reduced computational costs. 1


reference text

[1] Bernhard Sch¨ lkopf and Alex J. Smola. Learning with Kernels : Support Vector Machines, Regularization, o Optimization, and Beyond. MIT Press, 2002.

[2] Bernard Haasdonk. Feature Space Interpretation of SVMs with Indefinite Kernels. IEEE Transactions on Pattern Analysis and Machince Intelligence, 27(4):482–492, 2005.

[3] Cheng Soon Ong, Xavier Mary, St´ phane Canu, and Alexander J. Smola. Learning with non-positive e Kernels. In 21st Annual International Conference on Machine Learning, 2004.

[4] Yihua Chen, Maya R. Gupta, and Benjamin Recht. Learning Kernels from Indefinite Similarities. In 26th Annual International Conference on Machine Learning, pages 145–152, 2009.

[5] Ronny Luss and Alexandre d’Aspremont. Support Vector Machine Classification with Indefinite Kernels. In 21st Annual Conference on Neural Information Processing Systems, 2007.

[6] Maria-Florina Balcan and Avrim Blum. On a Theory of Learning with Similarity Functions. In 23rd Annual International Conference on Machine Learning, pages 73–80, 2006.

[7] Liwei Wang, Cheng Yang, and Jufu Feng. On Learning with Dissimilarity Functions. In 24th Annual International Conference on Machine Learning, pages 991–998, 2007.

[8] Purushottam Kar and Prateek Jain. Similarity-based Learning via Data Driven Embeddings. In 25th Annual Conference on Neural Information Processing Systems, 2011.

[9] Nathan Srebro. How Good Is a Kernel When Used as a Similarity Measure? In 20th Annual Conference on Computational Learning Theory, pages 323–335, 2007.

[10] Shai Shalev-Shwartz, Nathan Srebro, and Tong Zhang. Trading Accuracy for Sparsity in Optimization Problems with Sparsity Constraints. SIAM Journal on Optimization, 20(6):2807–2832, 2010.

[11] Nathan Srebro Shai Ben-David, Ali Rahimi. Generalization Bounds for Indefinite Kernel Machines. In NIPS 2008 Workshop: New Challenges in Theoretical Machine Learning, 2008.

[12] Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, and Luca Cazzanti. Similarity-based Classification: Concepts and Algorithms. Journal of Machine Learning Research, 10:747–776, 2009.

[13] Sham M. Kakade, Karthik Sridharan, and Ambuj Tewari. On the Complexity of Linear Prediction : Risk Bounds, Margin Bounds, and Regularization. In 22nd Annual Conference on Neural Information Processing Systems, 2008.

[14] Chia-Hua Ho and Chih-Jen Lin. Large-scale Linear Support Vector Regression. http://www.csie. ntu.edu.tw/˜cjlin/papers/linear-svr.pdf, retrieved on May 18, 2012, 2012.

[15] Maria-Florina Balcan, Avrim Blum, and Nathan Srebro. Improved Guarantees for Learning via Similarity Functions. In 21st Annual Conference on Computational Learning Theory, pages 287–298, 2008.

[16] Wei Chu and S. Sathiya Keerthi. Support Vector Ordinal Regression. Neural Computation, 19(3):792– 815, 2007.

[17] Shivani Agarwal. Generalization Bounds for Some Ordinal Regression Algorithms. In 19th International Conference on Algorithmic Learning Theory, pages 7–21, 2008.

[18] A. Frank and Arthur Asuncion. UCI Machine Learning Repository. http://archive.ics.uci. edu/ml, 2010. University of California, Irvine, School of Information and Computer Sciences.

[19] StatLib Dataset Repository. http://lib.stat.cmu.edu/datasets/. Carnegie Mellon University.

[20] Delve Dataset Repository. http://www.cs.toronto.edu/˜delve/data/datasets.html. University of Toronto.

[21] Kilian Q. Weinberger and Gerald Tesauro. Metric Learning for Kernel Regression. In 11th International Conference on Artificial Intelligence and Statistics, pages 612–619, 2007. 9