Learning Nondeterministic Classifiers (JMLR, 2009)


Authors: Juan José del Coz, Jorge Díez, Antonio Bahamonde

Abstract: Nondeterministic classifiers are defined as those allowed to predict more than one class for some entries from an input space. Given that the true class should be included in predictions and the number of classes predicted should be as small as possible, classifiers of this kind can be considered as Information Retrieval (IR) procedures. In this paper, we propose a family of IR loss functions to measure the performance of nondeterministic learners. After discussing such measures, we derive an algorithm for learning optimal nondeterministic hypotheses. Given an entry from the input space, the algorithm requires the posterior probabilities to compute the subset of classes with the lowest expected loss. From a general point of view, nondeterministic classifiers provide an improvement in the proportion of predictions that include the true class compared to their deterministic counterparts; the price to be paid for this increase is usually a tiny proportion of predictions with more than one class. The paper includes an extensive experimental study using three deterministic learners to estimate posterior probabilities: a multiclass Support Vector Machine (SVM), Logistic Regression, and Naïve Bayes. The data sets considered comprise both UCI multi-class learning tasks and microarray expressions of different kinds of cancer. We successfully compare nondeterministic classifiers with other alternative approaches. Additionally, we shall see how the quality of posterior probabilities (measured by the Brier score) determines the goodness of nondeterministic predictions.

Keywords: nondeterministic, multiclassification, reject option, multi-label classification, posterior probabilities
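The abstract states that the algorithm takes posterior probabilities and returns the subset of classes with the lowest expected loss, but it does not spell out the loss itself. The following is a minimal sketch under an assumption: the loss is taken to be 1 - F_beta, where a prediction of r classes that contains the true class scores F_beta = (1 + beta^2) / (beta^2 + r) and any prediction that misses it scores 0. The function names `f_beta_loss` and `best_subset` are hypothetical and chosen only for illustration. Under this assumed loss, the best subset of a given size r is always the r most probable classes, so it is enough to scan the ranked classes once.

```python
from typing import Dict, List, Tuple


def f_beta_loss(predicted: List[str], true_class: str, beta: float = 1.0) -> float:
    """Assumed loss: 1 - F_beta for a set prediction with one true class."""
    if true_class not in predicted:
        return 1.0  # the true class is missing, so F_beta is 0
    r = len(predicted)
    return 1.0 - (1.0 + beta ** 2) / (beta ** 2 + r)


def best_subset(posteriors: Dict[str, float], beta: float = 1.0) -> Tuple[List[str], float]:
    """Return the class subset with the lowest expected loss, and that loss.

    `posteriors` maps each class label to its estimated posterior probability.
    """
    # Rank classes from most to least probable.
    ranked = sorted(posteriors, key=posteriors.get, reverse=True)
    best: List[str] = []
    best_loss = float("inf")
    cumulative = 0.0  # probability that the true class is among the top r
    for r, label in enumerate(ranked, start=1):
        cumulative += posteriors[label]
        # Expected loss of predicting the top-r classes under the assumed loss.
        expected = 1.0 - (1.0 + beta ** 2) / (beta ** 2 + r) * cumulative
        if expected < best_loss:
            best, best_loss = ranked[:r], expected
    return best, best_loss


if __name__ == "__main__":
    # Two classes are nearly tied, so predicting both has lower expected loss.
    print(best_subset({"a": 0.45, "b": 0.40, "c": 0.15}))
    # One class dominates, so a single-class prediction is kept.
    print(best_subset({"a": 0.90, "b": 0.05, "c": 0.05}))
```

Larger values of `beta` reduce the penalty for extra classes, so the sketch returns bigger subsets more often; smaller values push it toward ordinary single-class predictions.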


reference text

J. Alonso, J. J. del Coz, J. Díez, O. Luaces, and A. Bahamonde. Learning to predict one or more ranks in ordinal regression tasks. Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD’08), LNAI 5211, pages 39–54. Springer, 2008.

S.A. Armstrong, J.E. Staunton, L.B. Silverman, R. Pieters, M.L. den Boer, M.D. Minden, S.E. Sallan, E.S. Lander, T.R. Golub, and S.J. Korsmeyer. MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nature Genetics, 30(1):41–47, 2002.

A. Asuncion and D.J. Newman. UCI machine learning repository. School of Information and Computer Sciences. University of California, Irvine, California, USA, 2007.

P.L. Bartlett and M.H. Wegkamp. Classification with a reject option using a hinge loss. Journal of Machine Learning Research, 9:1823–1840, 2008.

G.W. Brier. Verification of forecasts expressed in terms of probability. Monthly Weather Rev, 78:1–3, 1950.

C. Chow. On optimum recognition error and reject tradeoff. IEEE Transactions on Information Theory, 16(1):41–46, 1970.

A. Clare and R.D. King. Predicting gene function in Saccharomyces cerevisiae. Bioinformatics, 19(2):42–49, 2003.

G. Corani and M. Zaffalon. Learning reliable classifiers from small or incomplete data sets: The Naive Credal Classifier 2. Journal of Machine Learning Research, 9:581–621, 2008a.

G. Corani and M. Zaffalon. JNCC2: The java implementation of Naive Credal Classifier 2. Journal of Machine Learning Research (Machine Learning Open Source Software), 9:2695–2698, 2008b.

J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.

H.P. Kriegel, P. Kroger, A. Pryakhin, and M. Schubert. Using support vector machines for classifying large sets of multi-represented objects. Proc. 4th SIAM Int. Conf. on Data Mining, pages 102–114, 2004.

C-J. Lin, R. C. Weng, and S. S. Keerthi. Trust region newton method for logistic regression. Journal of Machine Learning Research, 9(Apr):627–650, 2008.

S. L. Pomeroy, P. Tamayo, M. Gaasenbeek, L. M. Sturla, M. Angelo, M. E. McLaughlin, J. Y. H. Kim, L. C. Goumnerova, P. M. Black, C. Lau, J. C. Allen, D. Zagzag, J. M. Olson, T. Curran, C. Wetmore, J. A. Biegel, T. Poggio, S. Mukherjee, R. Rifkin, A. Califano, G. Stolovitzky, D. N. Louis, J. P. Mesirov, E. S. Lander, and T. R. Golub. Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature, 415(6870):436–442, 2002.

S. Ramaswamy, P. Tamayo, R. Rifkin, S. Mukherjee, C.H. Yeang, M. Angelo, C. Ladd, M. Reich, E. Latulippe, J.P. Mesirov, et al. Multiclass cancer diagnosis using tumor gene expression signatures. Proceedings of the National Academy of Sciences (PNAS), 98(26):15149–15154, 2001.

D.T. Ross, U. Scherf, M.B. Eisen, C.M. Perou, C. Rees, P. Spellman, V. Iyer, S.S. Jeffrey, M. Van de Rijn, M. Waltham, et al. Systematic variation in gene expression patterns in human cancer cell lines. Nature Genetics, 24(3):227–234, 2000.

G. Shafer and V. Vovk. A tutorial on conformal prediction. Journal of Machine Learning Research, 9:371–421, 2008.

J.E. Staunton, D.K. Slonim, H.A. Coller, P. Tamayo, M.J. Angelo, J. Park, U. Scherf, J.K. Lee, W.O. Reinhold, J.N. Weinstein, et al. Chemosensitivity prediction by transcriptional profiling. Proceedings of the National Academy of Sciences (PNAS), 98(19):10787–10792, 2001.

A.I. Su, J.B. Welsh, L.M. Sapinoso, S.G. Kern, P. Dimitrov, H. Lapp, P.G. Schultz, S.M. Powell, C.A. Moskaluk, H.F. Frierson, and G. M. Hampton. Molecular classification of human carcinomas by use of gene expression signatures. Cancer Research, 61(20):7388–7393, 2001.

P. Tamayo, D. Scanfeld, B.L. Ebert, M.A. Gillette, C.W.M. Roberts, and J.P. Mesirov. Metagene projection for cross-platform, cross-species characterization of global transcriptional states. Proceedings of the National Academy of Sciences (PNAS), 104(14):5959–5964, 2007.

A.C. Tan, D.Q. Naiman, L. Xu, R.L. Winslow, and D. Geman. Simple decision rules for classifying human cancers from gene expression profiles. Bioinformatics, 21(20):3896–3904, 2005.

R. Tibshirani and T. Hastie. Margin trees for high-dimensional classification. Journal of Machine Learning Research, 8:637–652, 2007.

G. Tsoumakas and I. Katakis. Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 3(3):1–13, 2007.

T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975–1005, August 2004.

E.J. Yeoh, M.E. Ross, S.A. Shurtleff, W.K. Williams, D. Patel, R. Mahfouz, F.G. Behm, S.C. Raimondi, M.V. Relling, A. Patel, C. Cheng, D. Campana, D. Wilkins, X. Zhou, J. Li, H. Liu, C.-H. Pui, W. E. Evans, C. Naeve, L. Wong, and J. R. Downing. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell, 1(2):133–143, 2002.

K.Y. Yeung and R.E. Bumgarner. Multiclass classification of microarray data with repeated measurements: application to cancer. Genome Biology, 4(12):R83, 2003.

K.Y. Yeung, R.E. Bumgarner, and A.E. Raftery. Bayesian model averaging: development of an improved multiclass, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.

M. Zaffalon. The Naïve Credal Classifier. Journal of Statistical Planning and Inference, 105(1):5–21, 2002.