jmlr jmlr2005 jmlr2005-50 jmlr2005-50-reference knowledge-graph by maker-knowledge-mining

50 jmlr-2005-Learning with Decision Lists of Data-Dependent Features

Source: pdf

Author: Mario Marchand, Marina Sokolova

Abstract: We present a learning algorithm for decision lists which allows features that are constructed from the data and allows a trade-off between accuracy and complexity. We provide bounds on the generalization error of this learning algorithm in terms of the number of errors and the size of the classiﬁer it ﬁnds on the training data. We also compare its performance on some natural data sets with the set covering machine and the support vector machine. Furthermore, we show that the proposed bounds on the generalization error provide effective guides for model selection. Keywords: decision list machines, set covering machines, sparsity, data-dependent features, sample compression, model selection, learning theory

reference text

Martin Anthony. Generalization error bounds for threshold decision lists. Journal of Machine Learning Research, 5:189–217, 2004. Avrim Blum and Mona Singh. Learning functions of k terms. In COLT ’90: Proceedings of the third annual workshop on Computational learning theory, pages 144–153. Morgan Kaufmann Publishers Inc., 1990. ISBN 1-55860-146-5. Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Occam’s razor. Information Processing Letters, 24:377–380, 1987. Aditi Dhagat and Lisa Hellerstein. PAC learning with irrelevant attributes. In Proc. of the 35rd Annual Symposium on Foundations of Computer Science, pages 64–74. IEEE Computer Society Press, Los Alamitos, CA, 1994. Andrzej Ehrenfeucht and David Haussler. Learning decision trees from random examples. Information and Computation, 82:231–246, 1989. Thomas Eiter, Toshihide Ibaraki, and Kazuhiso Makino. Decision lists and related boolean functions. Theoretical Computer Science, 270:493–524, 2002. Sally Floyd and Manfred K. Warmuth. Sample compression, learnability, and the VapnikChervonenkis dimension. Machine Learning, 21(3):269–304, 1995. Thore Graepel, Ralf Herbrich, and John Shawe-Taylor. Generalisation error bounds for sparse linear classiﬁers. In Proceedings of the Thirteenth Annual Conference on Computational Learning Theory, pages 298–303, 2000. Thore Graepel, Ralf Herbrich, and Robert C. Williamson. From margin to sparsity. In Advances in neural information processing systems, volume 13, pages 210–216, 2001. Ralf Herbrich. Learning Kernel Classiﬁers. MIT Press, Cambridge, Massachusetts, 2002. Geoffrey Hinton and Michael Revow. Using pairs of data-points to deﬁne splits for decision trees. Advances in Neural Information Processing Systems 8, pages 507–513, 1996. 450 D ECISION L ISTS OF DATA -D EPENDENT F EATURES Jyrki Kivinen, Heikki Mannila, and Esko Ukkonen. Learning hierarchical rule sets. In Proceedings of the ﬁfth annual ACM workshop on Computational Learning Theory, pages 37–44. Morgan Kaufmann, 1992. Nick Littlestone. Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2(4):285–318, 1988. Nick Littlestone and Manfred K. Warmuth. Relating data compression and learnability. Technical report, University of California Santa Cruz, 1986. Mario Marchand and Mostefa Golea. On learning simple neural concepts: from halfspace intersections to neural decision lists. Network: Computation in Neural Systems, 4:67–85, 1993. Mario Marchand, Mohak Shah, John Shawe-Taylor, and Marina Sokolova. The set covering machine with data-dependent half-spaces. In Proceedings of the Twentieth International Conference on Machine Learning(ICML’2003), pages 520–527. Morgan Kaufmann, 2003. Mario Marchand and John Shawe-Taylor. Learning with the set covering machine. Proceedings of the Eighteenth International Conference on Machine Learning (ICML 2001), pages 345–352, 2001. Mario Marchand and John Shawe-Taylor. The set covering machine. Journal of Machine Learning Reasearch, 3:723–746, 2002. Shahar Mendelson. Rademacher averages and phase transitions in Glivenko-Cantelli class. IEEE Transactions on Information Theory, 48:251–263, 2002. Ziv Nevo and Ran El-Yaniv. On online learning of decision lists. Journal of Machine Learning Reasearch, 3:271–301, 2002. Ronald L. Rivest. Learning decision lists. Machine Learning, 2:229–246, 1987. Marina Sokolova, Mario Marchand, Nathalie Japkowicz, and John Shawe-Taylor. The decision list machine. In Advances in Neural Information Processing Systems(NIPS’2002), volume 15, pages 921–928. The MIT Press, 2003. Vladimir N. Vapnik. Statistical Learning Theory. Wiley, New-York, NY, 1998. 451