48 jmlr-2009-Learning Nondeterministic Classifiers

Source: pdf

Author: Juan José del Coz, Jorge Díez, Antonio Bahamonde

Abstract: Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, this kind of classifier can be considered as an Information Retrieval (IR) procedure. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers provide an improvement in the proportion of predictions that include the true class compared to their deterministic counterparts; the price to be paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), a Logistic Regression, and a Naïve Bayes. The data sets considered comprise both UCI multi-class learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we shall see how the quality of posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions.

Keywords: nondeterministic, multiclassification, reject option, multi-label classification, posterior probabilities
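The subset-selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which IR loss is used, so the code assumes one plausible member of such a family: the complement of F_beta, where a predicted set Z scores (1 + beta^2) / (beta^2 + |Z|) if it contains the true class and 0 otherwise. Under that assumption, the expected loss of a set of size r depends only on the summed posteriors of its members, so only the r most probable classes need to be considered for each r. The function names (`expected_fbeta_loss`, `nondeterministic_predict`, `brier_score`) are hypothetical and not taken from the paper, and the Brier score follows the sum-over-classes convention, which is also an assumption.

```python
from typing import List, Sequence, Tuple


def expected_fbeta_loss(set_probs: Sequence[float], beta: float = 1.0) -> float:
    """Expected (1 - F_beta) loss of predicting a set of classes.

    set_probs holds the posterior probabilities of the classes in the set.
    Assumed loss: F_beta = (1 + beta^2) / (beta^2 + |Z|) when the true
    class is in Z, and 0 otherwise.
    """
    b2 = beta * beta
    return 1.0 - (1.0 + b2) / (b2 + len(set_probs)) * sum(set_probs)


def nondeterministic_predict(
    posteriors: Sequence[float], beta: float = 1.0
) -> Tuple[List[int], float]:
    """Return the class subset with the lowest expected loss, and that loss."""
    # Rank class indices by decreasing posterior probability.
    order = sorted(range(len(posteriors)), key=lambda k: posteriors[k], reverse=True)
    best_set: List[int] = []
    best_loss = float("inf")
    # For each size r, the top-r classes maximise the summed posterior,
    # so they are the only candidate of that size worth evaluating.
    for r in range(1, len(order) + 1):
        candidate = order[:r]
        loss = expected_fbeta_loss([posteriors[k] for k in candidate], beta)
        if loss < best_loss - 1e-12:  # keep the smaller set on ties
            best_set, best_loss = candidate, loss
    return sorted(best_set), best_loss


def brier_score(prob_rows: Sequence[Sequence[float]], true_labels: Sequence[int]) -> float:
    """Mean squared distance between posteriors and one-hot true labels.

    Assumed convention: squared errors are summed over classes and
    averaged over examples.
    """
    total = 0.0
    for probs, y in zip(prob_rows, true_labels):
        total += sum((p - (1.0 if k == y else 0.0)) ** 2 for k, p in enumerate(probs))
    return total / len(true_labels)


if __name__ == "__main__":
    # Two close posteriors: predicting both classes has lower expected loss.
    print(nondeterministic_predict([0.45, 0.40, 0.15]))
    # One dominant posterior: the prediction stays deterministic.
    print(nondeterministic_predict([0.90, 0.05, 0.05]))
```

In this sketch a larger `beta` weights recall more heavily and so tends to produce larger predicted sets, while a confident posterior yields a single class, which matches the abstract's remark that only a small proportion of predictions contain more than one class.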

reference text

J. Alonso, J. J. del Coz, J. Díez, O. Luaces, and A. Bahamonde. Learning to predict one or more ranks in ordinal regression tasks. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’08), LNAI 5211, pages 39–54. Springer, 2008.

S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. den Boer, M.D. Minden, S.E. Sallan, E.S. Lander, T.R. Golub, and S.J. Korsmeyer. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002.

A. Asuncion and D.J. Newman. UCI machine learning repository. School of Information and Computer Sciences. University of California, Irvine, California, USA, 2007.

P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

G.W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Review, 78:1–3, 1950.

C. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.

A. Clare and R.D. King. Predicting gene function in Saccharomyces cerevisiae. Bioinformatics, 19(2):42–49, 2003.

G. Corani and M. Zaffalon. Learning reliable classifiers from small or incomplete data sets: The Naive Credal Classifier 2. Journal of Machine Learning Research, 9:581–621, 2008a.

G. Corani and M. Zaffalon. JNCC2: The Java implementation of Naive Credal Classifier 2. Journal of Machine Learning Research (Machine Learning Open Source Software), 9:2695–2698, 2008b.

J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.

H.P. Kriegel, P. Kröger, A. Pryakhin, and M. Schubert. Using support vector machines for classifying large sets of multi-represented objects. Proc. 4th SIAM Int. Conf. on Data Mining, pages 102–114, 2004.

C-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region Newton method for logistic regression. Journal of Machine Learning Research, 9(Apr):627–650, 2008.

S. L. Pomeroy, P. Tamayo, M. Gaasenbeek, L. M. Sturla, M. Angelo, M. E. McLaughlin, J. Y. H. Kim, L. C. Goumnerova, P. M. Black, C. Lau, J. C. Allen, D. Zagzag, J. M. Olson, T. Curran, C. Wetmore, J. A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D. N. Louis, J. P. Mesirov, E. S. Lander, and T. R. Golub. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002.

S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences (PNAS), 98(26):15149–15154, 2001.

D.T. Ross, U. Scherf, M.B. Eisen, C.M. Perou, C. Rees, P. Spellman, V. Iyer, S.S. Jeffrey, M. Van de Rijn, M. Waltham, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24(3):227–234, 2000.

G. Shafer and V. Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9:371–421, 2008.

J.E. Staunton, D.K. Slonim, H.A. Coller, P. Tamayo, M.J. Angelo, J. Park, U. Scherf, J.K. Lee, W.O. Reinhold, J.N. Weinstein, et al. Chemosensitivity prediction by transcriptional profiling. Proceedings of the National Academy of Sciences (PNAS), 98(19):10787–10792, 2001.

A.I. Su, J.B. Welsh, L.M. Sapinoso, S.G. Kern, P. Dimitrov, H. Lapp, P.G. Schultz, S.M. Powell, C.A. Moskaluk, H.F. Frierson, and G. M. Hampton. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61(20):7388–7393, 2001.

P. Tamayo, D. Scanfeld, B.L. Ebert, M.A. Gillette, C.W.M. Roberts, and J.P. Mesirov. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proceedings of the National Academy of Sciences (PNAS), 104(14):5959–5964, 2007.

A.C. Tan, D.Q. Naiman, L. Xu, R.L. Winslow, and D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 21(20):3896–3904, 2005.

R. Tibshirani and T. Hastie. Margin trees for high-dimensional classification. Journal of Machine Learning Research, 8:637–652, 2007.

G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.

T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, August 2004.

E.J. Yeoh, M.E. Ross, S.A. Shurtleff, W.K. Williams, D. Patel, R. Mahfouz, F.G. Behm, S.C. Raimondi, M.V. Relling, A. Patel, C. Cheng, D. Campana, D. Wilkins, X. Zhou, J. Li, H. Liu, C.-H. Pui, W. E. Evans, C. Naeve, L. Wong, and J. R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

K.Y. Yeung and R.E. Bumgarner. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biology, 4(12):R83, 2003.

K.Y. Yeung, R.E. Bumgarner, and A.E. Raftery. Bayesian model averaging: development of an improved multiclass, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.

M. Zaffalon. The Naïve Credal Classifier. Journal of Statistical Planning and Inference, 105(1):5–21, 2002.