jmlr jmlr2013 jmlr2013-73 jmlr2013-73-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chong Zhang, Yufeng Liu
Abstract: Hard and soft classifiers are two important groups of techniques for classification problems. Logistic regression and Support Vector Machines are typical examples of soft and hard classifiers, respectively. The essential difference between these two groups is whether one needs to estimate the class conditional probability for the classification task. In particular, soft classifiers predict the label based on the obtained class conditional probabilities, while hard classifiers bypass the estimation of probabilities and focus on the decision boundary. In practice, for the goal of accurate classification, it is unclear which one to use in a given situation. To tackle this problem, the Large-margin Unified Machine (LUM) was recently proposed as a unified family to embrace both groups. The LUM family enables one to study the behavior change from soft to hard binary classifiers. For multicategory cases, however, the concept of soft and hard classification becomes less clear. In that case, class probability estimation becomes more involved as it requires estimation of a probability vector. In this paper, we propose a new Multicategory LUM (MLUM) framework to investigate the behavior of soft versus hard classification under multicategory settings. Our theoretical and numerical results help to shed some light on the nature of multicategory classification and its transition behavior from soft to hard classifiers. The numerical results suggest that the proposed tuned MLUM yields very competitive performance.
Keywords: hard classification, large-margin, soft classification, support vector machine
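Illustration (not from the paper itself): the abstract refers to the binary LUM loss family of Liu, Zhang, and Wu (2011), cited in the references below, which the MLUM framework generalizes. The following is a minimal Python sketch of that binary loss under the usual (a, c) parameterization, intended only to show the soft-to-hard transition the abstract describes; the function name lum_loss and the demonstration values are assumptions for this sketch, and the multicategory MLUM loss proposed in the paper is not reproduced here.

import numpy as np

def lum_loss(u, a=1.0, c=0.0):
    # Binary LUM loss V(u), where u = y * f(x) is the functional margin,
    # a > 0, and c >= 0 indexes the family: c = 0 with a = 1 gives a
    # DWD-type soft-classification loss, while c -> infinity recovers the
    # SVM hinge loss (hard classification).
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = 1.0 - u                              # linear branch: u < c/(1+c)
    mask = u >= c / (1.0 + c)                  # curved branch: u >= c/(1+c)
    out[mask] = (a / ((1.0 + c) * u[mask] - c + a)) ** a / (1.0 + c)
    return out

# Soft-to-hard transition: as c grows, the LUM loss approaches the hinge
# loss max(0, 1 - u) used by the support vector machine.
margins = np.linspace(-1.0, 2.0, 7)
for c in (0.0, 1.0, 100.0):
    print(f"c = {c:6.1f}:", np.round(lum_loss(margins, c=c), 3))
print("hinge:     ", np.round(np.maximum(0.0, 1.0 - margins), 3))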
References:
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pages 144–152, 1992. ISBN 0-89791-497-X.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
H. Huang, Y. Liu, Y. Du, C. Perou, D. N. Hayes, M. Todd, and J. S. Marron. Multiclass distance weighted discrimination. Journal of Computational and Graphical Statistics, 2013. Forthcoming.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99(465):67–81, 2004.
X. Lin, G. Wahba, D. Xiang, F. Gao, R. Klein, and B. Klein. Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. Annals of Statistics, 28(6):1570–1600, 2000.
Y. Liu. Fisher consistency of multicategory support vector machines. In Eleventh International Conference on Artificial Intelligence and Statistics, pages 289–296, 2007.
Y. Liu and X. Shen. Multicategory ψ-learning. Journal of the American Statistical Association, 101(474):500–509, 2006.
Y. Liu and M. Yuan. Reinforced multicategory support vector machines. Journal of Computational and Graphical Statistics, 20(4):901–919, 2011.
Y. Liu, X. Shen, and H. Doss. Multicategory ψ-learning and support vector machine: computational tools. Journal of Computational and Graphical Statistics, 14(1):219–236, 2005.
Y. Liu, H. H. Zhang, and Y. Wu. Soft or hard classification? Large-margin unified machines. Journal of the American Statistical Association, 106(493):166–177, 2011.
J. S. Marron, M. Todd, and J. Ahn. Distance weighted discrimination. Journal of the American Statistical Association, 102(480):1267–1271, 2007.
S. Y. Park, Y. Liu, D. Liu, and P. Scholl. Multicategory composite least squares classifiers. Statistical Analysis and Data Mining, 3(4):272–286, 2010.
X. Shen and W. H. Wong. Convergence rate of sieve estimates. Annals of Statistics, 22(2):580–615, 1994.
X. Shen, G. C. Tseng, X. Zhang, and W. H. Wong. On ψ-learning. Journal of the American Statistical Association, 98(463):724–734, 2003.
I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35(2):575–607, 2007.
Y. Tang and H. H. Zhang. Multiclass proximal support vector machines. Journal of Computational and Graphical Statistics, 15(2):339–355, 2006.
A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.
P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
R. G. Verhaak, K. A. Hoadley, E. Purdom, V. Wang, Y. Qi, M. D. Wilkerson, C. R. Miller, L. Ding, T. Golub, J. P. Mesirov, G. Alexe, M. Lawrence, M. O'Kelly, P. Tamayo, B. A. Weir, S. Gabriel, W. Winckler, S. Gupta, L. Jakkula, H. S. Feiler, J. G. Hodgson, C. D. James, J. N. Sarkaria, C. Brennan, A. Kahn, P. T. Spellman, R. K. Wilson, T. P. Speed, J. W. Gray, M. Meyerson, G. Getz, C. M. Perou, D. N. Hayes, and Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell, 17(1):98–110, 2010.
G. Wahba. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Advances in Kernel Methods: Support Vector Learning, pages 69–87. MIT Press, 1999.
G. Wahba. Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences, 99:16524–16530, 2002.
J. Wang, X. Shen, and Y. Liu. Probability estimation for large margin classifiers. Biometrika, 95(1):149–167, 2008.
L. Wang and X. Shen. On L1-norm multi-class support vector machines: methodology and theory. Journal of the American Statistical Association, 102(478):595–602, 2007.
J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, volume 4, pages 219–224, 1999.
T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, 2004.
Y. Wu and Y. Liu. Robust truncated-hinge-loss support vector machines. Journal of the American Statistical Association, 102(479):974–983, 2007.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004a.
T. Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004b.
J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14(1):185–205, 2005.
J. Zhu, H. Zou, S. Rosset, and T. Hastie. Multi-class AdaBoost. Statistics and Its Interface, 2(3):349–360, 2009.
H. Zou, J. Zhu, and T. Hastie. New multicategory boosting algorithms based on multicategory Fisher-consistent losses. Annals of Applied Statistics, 2(4):1290–1306, 2008.