Author: Koby Crammer, Tal Wagner
Abstract: We introduce a large-volume box classifier for binary prediction, which maintains a subset of weight vectors, specifically an axis-aligned box. Our learning algorithm seeks a box of large volume that contains "simple" weight vectors, most of which are accurate on the training set. Two versions of the learning process are cast as convex optimization problems, and we show how to solve them efficiently. The formulation yields a natural PAC-Bayesian performance bound, and the algorithm is shown to minimize a quantity directly aligned with it. The algorithm outperforms SVM and the recently proposed AROW algorithm on a majority of 30 NLP datasets and binarized USPS optical character recognition datasets.
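To make the box idea concrete, the following is a minimal Python sketch, not the paper's actual algorithm: it assumes the box is represented by per-coordinate lower and upper bounds on the weights, and contrasts a mean prediction (for a uniform distribution over a box, the mean weight vector is the box center, so the averaged rule is linear) with a Gibbs-style prediction that samples one weight vector from the box. The function names and example bounds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_mean(l, u, x):
    """Predict with the box center, i.e. the mean of the uniform
    distribution over the axis-aligned box [l, u]."""
    center = (l + u) / 2.0
    return np.sign(center @ x)

def predict_gibbs(l, u, x, rng=rng):
    """Gibbs-style prediction: sample a single weight vector uniformly
    from the box and use it as a linear classifier."""
    w = rng.uniform(l, u)
    return np.sign(w @ x)

# Hypothetical 3-dimensional box and test point.
l = np.array([-0.5, 0.1, -1.0])   # per-coordinate lower bounds
u = np.array([0.5, 0.9, -0.2])    # per-coordinate upper bounds
x = np.array([0.3, 1.0, -0.4])
print(predict_mean(l, u, x), predict_gibbs(l, u, x))
```

The learning problem itself, fitting the bounds l and u so that the box has large volume while most weight vectors inside it classify the training set correctly, is what the paper casts as a convex optimization; the sketch above only illustrates how such a box would be used at prediction time.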
[1] J. Bi and T. Zhang. Support vector classification with input data uncertainty. In NIPS, 2004.
[2] J. Blitzer, M. Dredze, and F. Pereira. Biographies, bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. In ACL, 2007.
[3] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, September 1995.
[4] K. Crammer, A. Kulesza, and M. Dredze. Adaptive regularization of weighted vectors. In NIPS, 2009.
[5] K. Crammer, M. Mohri, and F. Pereira. Gaussian margin machines. In AISTATS, 2009.
[6] M. Dredze, K. Crammer, and F. Pereira. Confidence-weighted linear classification. In ICML, 2008.
[7] J. Duchi, E. Hazan, and Y. Singer. Adaptive subgradient methods for online learning and stochastic optimization. In COLT, 2010.
[8] J. Duchi, S. Shalev-Shwartz, Y. Singer, and A. Tewari. Composite objective mirror descent. In COLT, pages 250–264, 2010.
[9] Y. Freund and R.E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In EuroCOLT, pages 23–37, 1995.
[10] P. Germain, A. Lacasse, F. Laviolette, and M. Marchand. PAC-Bayesian learning of linear classifiers. In ICML, 2009.
[11] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.
[12] R. Herbrich, T. Graepel, and C. Campbell. Robust Bayes point machines. In ESANN, pages 49–54, 2000.
[13] R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines. JMLR, 1:245–279, 2001.
[14] P.J. Huber. Robust estimation of a location parameter. Annals of Mathematical Statistics, 35(1):73–101, 1964.
[15] T. Jaakkola and M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In Workshop on Artificial Intelligence and Statistics, 1997.
[16] G. Lanckriet, L. El Ghaoui, C. Bhattacharyya, and M. Jordan. A robust minimax approach to classification. JMLR, 3:555–582, 2002.
[17] J. Langford and M. Seeger. Bounds for averaging classifiers. Technical report CMU-CS-01-102, Carnegie Mellon University, 2001.
[18] J. Langford and J. Shawe-Taylor. PAC-Bayes and margins. In NIPS, 2002.
[19] D. McAllester. PAC-Bayesian model averaging. In COLT, 1999.
[20] J. Nath, C. Bhattacharyya, and M. Murty. Clustering based large margin classification: A scalable approach using SOCP formulation. In KDD, 2006.
[21] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[22] B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization and Beyond. MIT Press, 2002.
[23] M. Seeger. PAC-Bayesian generalization bounds for Gaussian processes. JMLR, 3:233–269, 2002.
[24] P. Shivaswamy and T. Jebara. Ellipsoidal kernel machines. In AISTATS, 2007.
[25] V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
[26] M.H. Wright. The interior-point revolution in optimization: history, recent developments, and lasting consequences. Bull. Amer. Math. Soc., 42:39–56, 2005.