Author: Koby Crammer, Tal Wagner
Abstract: We introduce a large-volume box classifier for binary prediction, which maintains a subset of weight vectors, specifically an axis-aligned box. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and we show how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound, and the algorithm is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
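To make the box idea concrete, the following is a minimal Python sketch, not the paper's actual algorithm: it assumes the box is represented by per-coordinate lower and upper bounds on the weights, and contrasts a mean prediction (for a uniform distribution over a box, the mean weight vector is the box center, so the averaged rule is linear) with a Gibbs-style prediction that samples one weight vector from the box. The function names and example bounds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_mean(l, u, x):
    """Predict with the box center, i.e. the mean of the uniform
    distribution over the axis-aligned box [l, u]."""
    center = (l + u) / 2.0
    return np.sign(center @ x)

def predict_gibbs(l, u, x, rng=rng):
    """Gibbs-style prediction: sample a single weight vector uniformly
    from the box and use it as a linear classifier."""
    w = rng.uniform(l, u)
    return np.sign(w @ x)

# Hypothetical 3-dimensional box and test point.
l = np.array([-0.5, 0.1, -1.0])   # per-coordinate lower bounds
u = np.array([0.5, 0.9, -0.2])    # per-coordinate upper bounds
x = np.array([0.3, 1.0, -0.4])
print(predict_mean(l, u, x), predict_gibbs(l, u, x))
```

The learning problem itself, fitting the bounds l and u so that the box has large volume while most weight vectors inside it classify the training set correctly, is what the paper casts as a convex optimization; the sketch above only illustrates how such a box would be used at prediction time.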
[1] J. Bi and T. Zhang. Support vector classification with input data uncertainty. In NIPS, 2004.
[2] J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.
[3] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, September 1995.
[4] K. Crammer, A. Kulesza, and M. Dredze. Adaptive regularization of weighted vectors. In NIPS, 2009.
[5] K. Crammer, M. Mohri, and F. Pereira. Gaussian margin machines. In AISTATS, 2009.
[6] M. Dredze, K. Crammer, and F. Pereira. Confidence-weighted linear classification. In ICML, 2008.
[7] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. In COLT, 2010.
[8] J. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT, pages 250–264, 2010.
[9] Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT, pages 23–37, 1995.
[10] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In ICML, 2009.
[11] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[12] R. Herbrich, T. Graepel, and C. Campbell. Robust Bayes point machines. In ESANN, pages 49–54, 2000.
[13] R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines. JMLR, 1:245–279, 2001.
[14] P.J. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964.
[15] T. Jaakkola and M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Workshop on Artificial Intelligence and Statistics, 1997.
[16] G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. Jordan. A robust minimax approach to classification. JMLR, 3:555–582, 2002.
[17] J. Langford and M. Seeger. Bounds for averaging classifiers. Technical report CMU-CS-01-102, Carnegie Mellon University, 2001.
[18] J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In NIPS, 2002.
[19] D. McAllester. PAC-Bayesian model averaging. In COLT, 1999.
[20] J. Nath, C. Bhattacharyya, and M. Murty. Clustering based large margin classification: A scalable approach using SOCP formulation. In KDD, 2006.
[21] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[22] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.
[23] M. Seeger. PAC-Bayesian generalization bounds for Gaussian processes. JMLR, 3:233–269, 2002.
[24] P. Shivaswamy and T. Jebara. Ellipsoidal kernel machines. In AISTATS, 2007.
[25] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[26] M.H. Wright. The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull. Amer. Math. Soc., 42:39–56, 2005.