
90 jmlr-2010-Permutation Tests for Studying Classifier Performance

Source: pdf

Author: Markus Ojala, Gemma C. Garriga

Abstract: We explore the framework of permutation-based p-values for assessing the performance of classifiers. In this paper we study two simple permutation tests. The first test assesses whether the classifier has found a real class structure in the data; the corresponding null distribution is estimated by permuting the labels in the data. This test has been used extensively in classification problems in computational biology. The second test studies whether the classifier is exploiting the dependency between the features in classification; the corresponding null distribution is estimated by permuting the features within classes, inspired by restricted randomization techniques traditionally used in statistics. This new test can help identify descriptive features, which can be valuable information for improving classifier performance. We study the properties of these tests and present an extensive empirical evaluation on real and synthetic data. Our analysis shows that studying the classifier performance via permutation tests is effective. In particular, the restricted permutation test clearly reveals whether the classifier exploits the interdependency between the features in the data.

Keywords: classification, labeled data, permutation tests, restricted randomization, significance testing
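The two null distributions described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration rather than the authors' implementation: the leave-one-out 1-nearest-neighbour error stands in for "the classifier", the XOR-style toy data and the helper names (`loo_1nn_error`, `permute_labels`, `permute_features_within_class`, `permutation_p_value`, `make_xor_data`) are hypothetical, and the p-value uses the common empirical estimator with a +1 correction, which the abstract itself does not spell out.

```python
import random

def loo_1nn_error(X, y):
    """Leave-one-out error of a 1-nearest-neighbour classifier (illustrative stand-in)."""
    errors = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if j == i:
                continue
            d = sum((a - b) ** 2 for a, b in zip(xi, xj))
            if d < best_d:
                best_j, best_d = j, d
        errors += y[best_j] != y[i]
    return errors / len(X)

def permute_labels(X, y, rng):
    """Test 1 null: shuffle the labels, keep the feature matrix fixed."""
    y_perm = y[:]
    rng.shuffle(y_perm)
    return X, y_perm

def permute_features_within_class(X, y, rng):
    """Test 2 null: shuffle each feature column independently inside each class."""
    X_perm = [row[:] for row in X]
    for label in set(y):
        idx = [i for i, yi in enumerate(y) if yi == label]
        for col in range(len(X[0])):
            values = [X[i][col] for i in idx]
            rng.shuffle(values)
            for i, v in zip(idx, values):
                X_perm[i][col] = v
    return X_perm, y

def permutation_p_value(X, y, randomize, k=100, seed=0):
    """Empirical p-value: share of randomized data sets with error <= the original error.

    The +1 in numerator and denominator is an assumed, commonly used correction.
    """
    rng = random.Random(seed)
    e_orig = loo_1nn_error(X, y)
    hits = sum(loo_1nn_error(*randomize(X, y, rng)) <= e_orig for _ in range(k))
    return (hits + 1) / (k + 1)

def make_xor_data(n_per_corner=10, seed=1):
    """Toy data: class 0 near (0,0) and (1,1), class 1 near (0,1) and (1,0)."""
    rng = random.Random(seed)
    X, y = [], []
    for cx, cy, label in [(0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 1)]:
        for _ in range(n_per_corner):
            X.append([cx + rng.uniform(-0.1, 0.1), cy + rng.uniform(-0.1, 0.1)])
            y.append(label)
    return X, y

X, y = make_xor_data()
p_labels = permutation_p_value(X, y, permute_labels)
p_features = permutation_p_value(X, y, permute_features_within_class)
print("label permutation p-value:", p_labels)
print("within-class feature permutation p-value:", p_features)
```

On this toy data the class is determined only by the combination of the two features, so both p-values should come out small: shuffling labels removes the class structure, and shuffling columns within a class removes the feature dependency the classifier relies on. On data whose features are independent within each class, the second p-value would be expected to be large instead.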

reference text

Arthur Asuncion and David J. Newman. UCI machine learning repository, 2007. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Yoav Benjamini and Yosef Hochberg. Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 57(1):289–300, 1995.
Alain Berlinet, Gérard Biau, and Laurent Rouvière. Functional supervised classification with wavelets. Annales de l’ISUP, 52:61–80, 2008.
Julian Besag and Peter Clifford. Sequential Monte Carlo p-values. Biometrika, 78(2):301–304, 1991.
Howard D. Bondell and Brian J. Reich. Simultaneous regression shrinkage, variable selection, and supervised clustering of predictors with OSCAR. Biometrics, 64:115–123, 2008.
Ulisses Braga-Neto and Edward R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20(3):374–380, 2004.
George Casella and Roger L. Berger. Statistical Inference. Duxbury Resource Center, 2001.
Yuguo Chen, Persi Diaconis, Susan P. Holmes, and Jun S. Liu. Sequential Monte Carlo methods for statistical analysis of tables. Journal of the American Statistical Association, 100(469):109–120, 2005.
Bradley Efron. Bootstrap methods: Another look at the jackknife. The Annals of Statistics, 7(1):1–26, 1979.
Bradley Efron and Robert J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, 1993.
Michael P. Fay, Hyune-Ju Kim, and Mark Hachey. On using truncated sequential probability ratio test boundaries for Monte Carlo implementation of hypothesis tests. Journal of Computational and Graphical Statistics, 16(4):946–967, December 2007.
Eibe Frank. Pruning Decision Trees and Lists. PhD thesis, University of Waikato, 2000.
Eibe Frank and Ian H. Witten. Using a permutation test for attribute selection in decision trees. In International Conference on Machine Learning, pages 152–160, 1998.
Aristides Gionis, Heikki Mannila, Taneli Mielikäinen, and Panayiotis Tsaparas. Assessing data mining results via swap randomization. ACM Trans. Knowl. Discov. Data, 1(3), 2007.
Polina Golland and Bruce Fischl. Permutation tests for classification: Towards statistical significance in image-based studies. In International Conference on Information Processing and Medical Imaging, pages 330–341, 2003.
Polina Golland, Feng Liang, Sayan Mukherjee, and Dmitry Panchenko. Permutation tests for classification. In Annual Conference on Learning Theory, pages 501–515, 2005.
Phillip I. Good. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer Series in Statistics. Springer, 2nd edition, 2000.
Sture Holm. A simple sequentially rejective multiple test procedure. Scandinavian Journal of Statistics, 6:65–70, 1979.
Tailen Hsing, Sanju Attoor, and Edward R. Dougherty. Relation between permutation-test p values and classifier error estimates. Mach. Learn., 52(1-2):11–30, 2003.
Anders Isaksson, Mikael Wallman, Hanna Göransson, and Mats G. Gustafsson. Cross-validation and bootstrapping are unreliable in small sample classification. Pattern Recogn. Lett., 29(14):1960–1965, 2008.
David Jensen. Induction with Randomization Testing: Decision-Oriented Analysis of Large Data Sets. PhD thesis, Washington University, St. Louis, Missouri, USA, 1992.
Rosalia Maglietta, Annarita D’Addabbo, Ada Piepoli, Francesco Perri, Sabino Liuni, Graziano Pesole, and Nicola Ancona. Selection of relevant genes in cancer diagnosis based on their prediction accuracy. Artif. Intell. Med., 40(1):29–44, 2007.
Annette M. Molinaro, Richard Simon, and Ruth M. Pfeiffer. Prediction error estimation: a comparison of resampling methods. Bioinformatics, 21(15):3301–3307, 2005.
Markus Ojala and Gemma C. Garriga. Permutation tests for studying classifier performance. In Proceedings of the 9th IEEE International Conference on Data Mining, pages 908–913, 2009.
Markus Ojala, Niko Vuokko, Aleksi Kallio, Niina Haiminen, and Heikki Mannila. Randomization methods for assessing data analysis results on real-valued matrices. Statistical Analysis and Data Mining, 2(4):209–230, 2009.
Fabrice Rossi and Nathalie Villa. Support vector machine for functional data classification. Neurocomputing, 69(7-9):730–742, 2006.
Abraham Wald. Sequential tests of statistical hypotheses. The Annals of Mathematical Statistics, 16(2):117–186, 1945.
Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.