
21 jmlr-2013-Classifier Selection using the Predicate Depth


Source: pdf

Author: Ran Gilad-Bachrach, Christopher J.C. Burges

Abstract: Typically, one approaches a supervised machine learning problem by writing down an objective function and finding a hypothesis that minimizes it. This is equivalent to finding the Maximum A Posteriori (MAP) hypothesis for a Boltzmann distribution. However, MAP is not a robust statistic. We present an alternative approach by defining a median of the distribution, which we show is more robust and has good generalization guarantees. We present algorithms to approximate this median. One contribution of this work is an efficient method for approximating the Tukey median. The Tukey median, which is often used for data visualization and outlier detection, is a special case of the family of medians we define; however, computing it exactly is exponentially slow in the dimension. Our algorithm approximates such medians in polynomial time while making weaker assumptions than those required by previous work.

Keywords: classification, estimation, median, Tukey depth
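The paper's own polynomial-time algorithm is not reproduced on this page, but the notion of Tukey (halfspace) depth it builds on can be illustrated with a minimal sketch. The version below approximates depth by minimizing over randomly sampled directions, in the spirit of the random Tukey depth of Cuesta-Albertos and Nieto-Reyes cited in the references; the function names and parameters are illustrative, not from the paper.

```python
import numpy as np

def random_tukey_depth(point, data, n_directions=1000, rng=None):
    """Approximate the Tukey (halfspace) depth of `point` w.r.t. `data`.

    The exact Tukey depth minimizes, over all directions u, the fraction
    of data points in the halfspace {x : u . (x - point) >= 0}. Exact
    computation scales exponentially with the dimension, so this sketch
    minimizes over random directions instead (the "random Tukey depth").
    """
    rng = np.random.default_rng(rng)
    d = data.shape[1]
    depth = 1.0
    for _ in range(n_directions):
        u = rng.standard_normal(d)
        proj = (data - point) @ u
        # Fraction of data points on the smaller side of the hyperplane
        # through `point` with normal `u`.
        frac = min(np.mean(proj >= 0), np.mean(proj <= 0))
        depth = min(depth, frac)
    return depth

def approximate_tukey_median(data, n_directions=1000, rng=None):
    """Return the data point of (approximately) maximal Tukey depth."""
    depths = [random_tukey_depth(x, data, n_directions, rng) for x in data]
    return data[int(np.argmax(depths))]
```

Because the depth of an outlier is near zero while central points have depth close to 1/2, the deepest point is a robust center of the sample; this is the robustness property the abstract contrasts with MAP estimation.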


reference text

A. Ambroladze, E. Parrado-Hernández, and J. Shawe-Taylor. Tighter PAC-Bayes bounds. Advances in Neural Information Processing Systems, 19:9, 2007.

S. Ben-David, N. Eiron, and P. M. Long. On the difficulty of approximately maximizing agreements. J. Comput. Syst. Sci., 66(3):496–514, 2003.

N. Billor, A. Abebe, A. Turkmen, and S. V. Nudurupati. Classification based on depth transvariations. Journal of Classification, 25(2):249–260, 2008.

C. Borell. Convex set functions in d-space. Periodica Mathematica Hungarica, 6:111–136, 1975. doi: 10.1007/BF02018814.

A. Caplin and B. Nalebuff. Aggregation and social choice: A mean voter theorem. Econometrica, 59(1):1–23, January 1991.

T. M. Chan. An optimal randomized algorithm for maximum Tukey depth. In SODA, pages 430–436, 2004.

A. Christmann. Regression depth and support vector machine. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 72:71–86, 2006.

K. L. Clarkson, D. Eppstein, G. L. Miller, C. Sturtivant, and S. H. Teng. Approximating center points with iterated Radon points. Int. J. Comput. Geometry Appl., 6(3):357–377, 1996.

C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

J. A. Cuesta-Albertos and A. Nieto-Reyes. The random Tukey depth. Computational Statistics & Data Analysis, 52, 2008.

A. Cuevas, M. Febrero, and R. Fraiman. Robust estimation and classification for functional data via projection-based depth notions. Computational Statistics, 2007.

P. L. Davies and U. Gather. Breakdown and groups. The Annals of Statistics, 33(3):977–1035, 2005.

S. Fine, R. Gilad-Bachrach, and E. Shamir. Query by committee, linear separation and random walks. Theor. Comput. Sci., 284(1):25–51, 2002.

E. Fix and J. L. Hodges Jr. Discriminatory analysis, nonparametric discrimination: Consistency properties. 1951.

R. Fraiman and G. Muniz. Trimmed means for functional data. Test, 2001.

P. Germain, A. Lacasse, F. Laviolette, M. Marchand, and S. Shanian. From PAC-Bayes bounds to KL regularization. In ICML, 2009.

A. K. Ghosh and P. Chaudhuri. On maximum depth and related classifiers. Scandinavian Journal of Statistics, 32(2):327–350, 2005a.

A. K. Ghosh and P. Chaudhuri. On data depth and distribution-free discriminant analysis using separating surfaces. Bernoulli, 11(1):1–27, 2005b.

R. Gilad-Bachrach, A. Navot, and N. Tishby. Bayes and Tukey meet at the center point. In Proceedings of the Conference on Learning Theory (COLT), pages 549–563. Springer, 2004.

R. Gilad-Bachrach, A. Navot, and N. Tishby. Query by committee made real. In NIPS, 2005.

F. R. Hampel. A general qualitative definition of robustness. The Annals of Mathematical Statistics, 42(6):1887–1896, 1971.

D. Haussler and E. Welzl. Epsilon-nets and simplex range queries. In Symposium on Computational Geometry, pages 61–71, 1986. doi: 10.1145/10515.10522.

X. He. Discussion: Breakdown and groups. The Annals of Statistics, 33(3):998–1000, 2005.

R. Herbrich, T. Graepel, and C. Campbell. Bayes point machines. The Journal of Machine Learning Research, 1:245–279, 2001.

R. Jörnsten. Clustering and classification based on the L1 data depth. Journal of Multivariate Analysis, 90, 2004.

R. Y. Liu. On a notion of data depth based on random simplices. The Annals of Statistics, 1990.

R. Y. Liu, J. M. Parelius, and K. Singh. Multivariate analysis by data depth: descriptive statistics, graphics and inference (with discussion and a rejoinder by Liu and Singh). The Annals of Statistics, 27(3):783–858, 1999.

S. López-Pintado and J. Romo. On the concept of depth for functional data. Journal of the American Statistical Association, 104, 2009.

J. C. Massé. Asymptotics for the Tukey median. Journal of Multivariate Analysis, 81, 2002.

D. A. McAllester. Some PAC-Bayesian theorems. Machine Learning, 37(3):355–363, 1999.

P. J. Rousseeuw and M. Hubert. Regression depth. Journal of the American Statistical Association, 94(446):388–402, 1999.

M. Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. The Journal of Machine Learning Research, 3:233–269, 2003.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

J. Tukey. Mathematics and the picturing of data. In Proceedings of the International Congress of Mathematicians, 1975.

Y. Zuo. Projection-based depth functions and associated medians. The Annals of Statistics, 2003.

Y. Zuo and R. Serfling. General notions of statistical depth function. The Annals of Statistics, 28(2):461–482, 2000.