Paper: jmlr2008-54 (Journal of Machine Learning Research, 2008)
Authors: Eyal Krupka, Amir Navot, Naftali Tishby
Abstract: Feature selection is the task of choosing a small subset of features that is sufficient to predict the target labels well. Here, instead of trying to directly determine which features are better, we attempt to learn the properties of good features. For this purpose we assume that each feature is represented by a set of properties, referred to as meta-features. This approach enables prediction of the quality of features without measuring their values on the training instances. We use this ability to devise new selection algorithms that can efficiently search for new good features in the presence of a huge number of features, and to dramatically reduce the number of feature measurements needed. We demonstrate our algorithms on a handwritten digit recognition problem and a visual object category recognition problem. In addition, we show how this novel viewpoint enables derivation of better generalization bounds for the joint learning problem of selection and classification, and how it contributes to a better understanding of the problem. Specifically, in the context of object recognition, previous works showed that it is possible to find one set of features that fits most object categories (a.k.a. a universal dictionary). Here we use our framework to analyze one such universal dictionary and find that the quality of features in this dictionary can be predicted accurately from their meta-features. Keywords: feature selection, unobserved features, meta-features
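The abstract describes the approach only at a high level. As a concrete illustration (not the paper's actual algorithm), the sketch below fits a simple ridge regressor from meta-features to a measured quality score on a small set of observed features, then ranks all features, including never-measured ones, by their predicted quality. The quality proxy (absolute correlation with the label), the ridge regressor, and all variable names are assumptions made for this example only.

```python
# Minimal sketch, not the authors' implementation: predict feature quality
# from meta-features, then select features without measuring most of them.
import numpy as np

def feature_quality(x_col, y):
    """Proxy quality of one observed feature: |Pearson correlation| with the label."""
    x = x_col - x_col.mean()
    yc = y - y.mean()
    denom = np.sqrt((x ** 2).sum() * (yc ** 2).sum()) + 1e-12
    return abs(float((x * yc).sum()) / denom)

def fit_quality_predictor(meta_obs, quality_obs, lam=1.0):
    """Closed-form ridge regression from meta-features to observed quality."""
    M = np.hstack([meta_obs, np.ones((meta_obs.shape[0], 1))])  # add bias column
    return np.linalg.solve(M.T @ M + lam * np.eye(M.shape[1]), M.T @ quality_obs)

def predict_quality(meta_all, w):
    M = np.hstack([meta_all, np.ones((meta_all.shape[0], 1))])
    return M @ w

# --- toy usage ---------------------------------------------------------------
rng = np.random.default_rng(0)
n_samples, n_feat, n_meta = 200, 500, 5
meta = rng.normal(size=(n_feat, n_meta))       # meta-features of every candidate feature
X = rng.normal(size=(n_samples, n_feat))       # feature values (only a subset is "measured")
y = (X[:, :10].sum(axis=1) > 0).astype(float)  # synthetic labels

measured = rng.choice(n_feat, size=50, replace=False)          # small measured subset
q_obs = np.array([feature_quality(X[:, j], y) for j in measured])
w = fit_quality_predictor(meta[measured], q_obs)

q_pred = predict_quality(meta, w)              # predicted quality of ALL features,
top_k = np.argsort(q_pred)[::-1][:20]          # including the unmeasured ones
print("selected features:", top_k)
```

In the paper's image experiments the meta-features presumably encode concrete properties of each feature (for example, where a local image patch lies and what it looks like), whereas in this toy setup they are random placeholders, so the predictions carry no real signal; the code only shows the mechanics of learning a quality predictor and ranking unobserved features with it.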
References:
P. L. Bartlett. The size of the weights is more important than the size of the network. IEEE Transactions on Information Theory, 1998.
A. Blum and J. Langford. PAC-MDL bounds. In Learning Theory and Kernel Machines, 2003.
K. Crammer. MCSVM_1.0: C code for multiclass SVM, 2003. http://www.cis.upenn.edu/~crammer.
D. Decoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 2002.
L. Ein-Dor, O. Zuk, and E. Domany. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proceedings of the National Academy of Sciences, 2006.
D. Gabor. Theory of communication. J. IEE, 93:429-459, 1946.
R. Gilad-Bachrach, A. Navot, and N. Tishby. Margin based feature selection - theory and algorithms. In International Conference on Machine Learning (ICML), 2004.
R. Greiner. Using value of information to learn and classify under hard budgets. In NIPS Workshop on Value of Information in Inference, Learning and Decision-Making, 2005.
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003.
I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Machine Learning, 46, 2002.
M. W. Kadous and C. Sammut. Classification of multivariate time series and structured data using constructive induction. Machine Learning, 2005.
M. J. Kearns and U. V. Vazirani. An Introduction to Computational Learning Theory. MIT Press, Cambridge, MA, USA, 1994.
R. Kohavi and G. H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273-324, 1997.
E. Krupka and N. Tishby. Generalization from observed to unobserved features by clustering. Journal of Machine Learning Research, 2008.
E. Krupka and N. Tishby. Incorporating prior knowledge on features into learning. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2007.
E. Kussul, T. Baidyk, L. Kasatkina, and V. Lukovich. Rosenblatt perceptrons for handwritten digit recognition. In International Joint Conference on Neural Networks, pages 1516-1520, 2001.
F. Lauer and G. Bloch. Incorporating prior knowledge in support vector machines for classification: a review. Submitted to Neurocomputing, 2006.
Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, November 1998.
S. Lee, V. Chatalbashev, D. Vickrey, and D. Koller. Learning a meta-level prior for feature relevance from multiple related tasks. In International Conference on Machine Learning (ICML), 2007.
K. Levi, M. Fink, and Y. Weiss. Learning from a small number of training examples by exploiting object categories. In LCVPR04 Workshop on Learning in Computer Vision, 2004.
D. Lizotte, O. Madani, and R. Greiner. Budgeted learning of naive Bayes classifiers. In Conference on Uncertainty in Artificial Intelligence (UAI), 2003.
A. Navot, L. Shpigelman, N. Tishby, and E. Vaadia. Nearest neighbor based feature selection for regression and its application to neural activity. In Advances in Neural Information Processing Systems (NIPS), 2006.
J. R. Quinlan. Induction of decision trees. In Jude W. Shavlik and Thomas G. Dietterich, editors, Readings in Machine Learning. Morgan Kaufmann, 1990.
R. Raina, A. Y. Ng, and D. Koller. Constructing informative priors using transfer learning. In Proceedings of the Twenty-Third International Conference on Machine Learning, 2006.
T. Serre, L. Wolf, and T. Poggio. Object recognition with features inspired by visual cortex. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
T. Serre, L. Wolf, S. Bileschi, M. Riesenhuber, and T. Poggio. Robust object recognition with cortex-like mechanisms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2007.
S. Shalev-Shwartz and Y. Singer. Efficient learning of label ranking by soft projections onto polyhedra. Journal of Machine Learning Research, 2006.
P. Simard, Y. LeCun, J. S. Denker, and B. Victorri. Transformation invariance in pattern recognition: tangent distance and tangent propagation. In Neural Networks: Tricks of the Trade, 1996.
P. Y. Simard, Y. A. Le Cun, and J. S. Denker. Efficient pattern recognition using a new transformation distance. In Advances in Neural Information Processing Systems (NIPS), 1993.
B. Taskar, M. F. Wong, and D. Koller. Learning on the test data: Leveraging unseen features. In International Conference on Machine Learning (ICML), 2003.
S. Ullman, M. Vidal-Naquet, and E. Sali. Visual features of intermediate complexity and their use in classification. Nature Neuroscience, 2002.
V. N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, 1995.
V. N. Vapnik. Statistical Learning Theory. Wiley, 1998.
J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, and V. Vapnik. Feature selection for SVMs. In Advances in Neural Information Processing Systems (NIPS), 2000.