jmlr jmlr2013 jmlr2013-73 jmlr2013-73-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chong Zhang, Yufeng Liu
Abstract: Hard and soft classifiers are two important groups of techniques for classification problems. Logistic regression and Support Vector Machines are typical examples of soft and hard classifiers, respectively. The essential difference between these two groups is whether one needs to estimate the class conditional probability for the classification task. In particular, soft classifiers predict the label based on the obtained class conditional probabilities, while hard classifiers bypass the estimation of probabilities and focus on the decision boundary. In practice, for the goal of accurate classification, it is unclear which one to use in a given situation. To tackle this problem, the Large-margin Unified Machine (LUM) was recently proposed as a unified family to embrace both groups. The LUM family enables one to study the behavior change from soft to hard binary classifiers. For multicategory cases, however, the concept of soft and hard classification becomes less clear. In that case, class probability estimation becomes more involved as it requires estimation of a probability vector. In this paper, we propose a new Multicategory LUM (MLUM) framework to investigate the behavior of soft versus hard classification under multicategory settings. Our theoretical and numerical results help to shed some light on the nature of multicategory classification and its transition behavior from soft to hard classifiers. The numerical results suggest that the proposed tuned MLUM yields very competitive performance.
Keywords: hard classification, large-margin, soft classification, support vector machine
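Illustration (not from the paper itself): the abstract refers to the binary LUM loss family of Liu, Zhang, and Wu (2011), cited in the references below, which the MLUM framework generalizes. The following is a minimal Python sketch of that binary loss under the usual (a, c) parameterization, intended only to show the soft-to-hard transition the abstract describes; the function name lum_loss and the demonstration values are assumptions for this sketch, and the multicategory MLUM loss proposed in the paper is not reproduced here.

import numpy as np

def lum_loss(u, a=1.0, c=0.0):
    # Binary LUM loss V(u), where u = y * f(x) is the functional margin,
    # a > 0, and c >= 0 indexes the family: c = 0 with a = 1 gives a
    # DWD-type soft-classification loss, while c -> infinity recovers the
    # SVM hinge loss (hard classification).
    u = np.atleast_1d(np.asarray(u, dtype=float))
    out = 1.0 - u                              # linear branch: u < c/(1+c)
    mask = u >= c / (1.0 + c)                  # curved branch: u >= c/(1+c)
    out[mask] = (a / ((1.0 + c) * u[mask] - c + a)) ** a / (1.0 + c)
    return out

# Soft-to-hard transition: as c grows, the LUM loss approaches the hinge
# loss max(0, 1 - u) used by the support vector machine.
margins = np.linspace(-1.0, 2.0, 7)
for c in (0.0, 1.0, 100.0):
    print(f"c = {c:6.1f}:", np.round(lum_loss(margins, c=c), 3))
print("hinge:     ", np.round(np.maximum(0.0, 1.0 - margins), 3))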
References:
N. Aronszajn. Theory of reproducing kernels. Transactions of the American Mathematical Society, 68(3):337–404, 1950.
P. L. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, COLT '92, pages 144–152, 1992. ISBN 0-89791-497-X.
C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20:273–297, 1995.
K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2001.
Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: a statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
J. Friedman, T. Hastie, and R. Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1–22, 2010.
H. Huang, Y. Liu, Y. Du, C. Perou, D. N. Hayes, M. Todd, and J. S. Marron. Multiclass distance weighted discrimination. Journal of Computational and Graphical Statistics, 2013. Forthcoming.
Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines, theory, and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99(465):67–81, 2004.
X. Lin, G. Wahba, D. Xiang, F. Gao, R. Klein, and B. Klein. Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV. Annals of Statistics, 28(6):1570–1600, 2000.
Y. Liu. Fisher consistency of multicategory support vector machines. In Eleventh International Conference on Artificial Intelligence and Statistics, pages 289–296, 2007.
Y. Liu and X. Shen. Multicategory ψ-learning. Journal of the American Statistical Association, 101(474):500–509, 2006.
Y. Liu and M. Yuan. Reinforced multicategory support vector machines. Journal of Computational and Graphical Statistics, 20(4):901–919, 2011.
Y. Liu, X. Shen, and H. Doss. Multicategory ψ-learning and support vector machine: computational tools. Journal of Computational and Graphical Statistics, 14(1):219–236, 2005.
Y. Liu, H. H. Zhang, and Y. Wu. Soft or hard classification? Large-margin unified machines. Journal of the American Statistical Association, 106(493):166–177, 2011.
J. S. Marron, M. Todd, and J. Ahn. Distance weighted discrimination. Journal of the American Statistical Association, 102(480):1267–1271, 2007.
S. Y. Park, Y. Liu, D. Liu, and P. Scholl. Multicategory composite least squares classifiers. Statistical Analysis and Data Mining, 3(4):272–286, 2010.
X. Shen and W. H. Wong. Convergence rate of sieve estimates. Annals of Statistics, 22(2):580–615, 1994.
X. Shen, G. C. Tseng, X. Zhang, and W. H. Wong. On ψ-learning. Journal of the American Statistical Association, 98(463):724–734, 2003.
I. Steinwart and C. Scovel. Fast rates for support vector machines using Gaussian kernels. Annals of Statistics, 35(2):575–607, 2007.
Y. Tang and H. H. Zhang. Multiclass proximal support vector machines. Journal of Computational and Graphical Statistics, 15(2):339–355, 2006.
A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.
P. Tseng. Convergence of a block coordinate descent method for nondifferentiable minimization. Journal of Optimization Theory and Applications, 109(3):475–494, 2001.
V. Vapnik. Statistical Learning Theory. Wiley, 1998.
R. G. Verhaak, K. A. Hoadley, E. Purdom, V. Wang, Y. Qi, M. D. Wilkerson, C. R. Miller, L. Ding, T. Golub, J. P. Mesirov, G. Alexe, M. Lawrence, M. O'Kelly, P. Tamayo, B. A. Weir, S. Gabriel, W. Winckler, S. Gupta, L. Jakkula, H. S. Feiler, J. G. Hodgson, C. D. James, J. N. Sarkaria, C. Brennan, A. Kahn, P. T. Spellman, R. K. Wilson, T. P. Speed, J. W. Gray, M. Meyerson, G. Getz, C. M. Perou, D. N. Hayes, and Cancer Genome Atlas Research Network. Integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell, 17(1):98–110, 2010.
G. Wahba. Support vector machines, reproducing kernel Hilbert spaces and the randomized GACV. In Advances in Kernel Methods: Support Vector Learning, pages 69–87. MIT Press, 1999.
G. Wahba. Soft and hard classification by reproducing kernel Hilbert space methods. Proceedings of the National Academy of Sciences, 99:16524–16530, 2002.
J. Wang, X. Shen, and Y. Liu. Probability estimation for large margin classifiers. Biometrika, 95(1):149–167, 2008.
L. Wang and X. Shen. On L1-norm multi-class support vector machines: methodology and theory. Journal of the American Statistical Association, 102(478):595–602, 2007.
J. Weston and C. Watkins. Support vector machines for multi-class pattern recognition. In Proceedings of the Seventh European Symposium on Artificial Neural Networks, volume 4, pages 219–224, 1999.
T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, 2004.
Y. Wu and Y. Liu. Robust truncated-hinge-loss support vector machines. Journal of the American Statistical Association, 102(479):974–983, 2007.
T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–85, 2004a.
T. Zhang. Statistical analysis of some multi-category large margin classification methods. Journal of Machine Learning Research, 5:1225–1251, 2004b.
J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14(1):185–205, 2005.
J. Zhu, H. Zou, S. Rosset, and T. Hastie. Multi-class AdaBoost. Statistics and Its Interface, 2(3):349–360, 2009.
H. Zou, J. Zhu, and T. Hastie. New multicategory boosting algorithms based on multicategory Fisher-consistent losses. Annals of Applied Statistics, 2(4):1290–1306, 2008.