89 nips-2002-Feature Selection by Maximum Marginal Diversity

Source: pdf

Author: Nuno Vasconcelos

Abstract: We address the question of feature selection in the context of visual recognition. It is shown that, besides being efficient from a computational standpoint, the infomax principle is nearly optimal in the minimum Bayes error sense. The concept of marginal diversity is introduced, leading to a generic principle for feature selection (the principle of maximum marginal diversity) of extreme computational simplicity. The relationships between infomax and the maximization of marginal diversity are identified, uncovering the existence of a family of classification procedures for which near-optimal (in the Bayes error sense) feature selection does not require combinatorial search. Examination of this family in light of recent studies on the statistics of natural images suggests that visual recognition problems are a subset of it.
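As an illustration of why such a principle avoids combinatorial search, the following is a minimal sketch, not the author's implementation. The abstract does not define marginal diversity, so the sketch assumes it is the class-prior-weighted average Kullback-Leibler divergence between each class-conditional feature marginal and the overall feature marginal. The histogram estimator, the bin count, and the function names `marginal_diversity` and `select_features` are hypothetical choices made for this example.

```python
import numpy as np


def marginal_diversity(x, y, bins=16):
    """Histogram estimate of sum_i P(i) * KL(P(x|i) || P(x)) for one feature.

    Assumed definition (not stated in the abstract): the average, over
    classes, of the KL divergence between the class-conditional marginal
    and the overall marginal of the feature.
    """
    edges = np.histogram_bin_edges(x, bins=bins)
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()
    p_x = np.histogram(x, bins=edges)[0] / len(x)

    md = 0.0
    for c, prior in zip(classes, priors):
        x_c = x[y == c]
        p_xc = np.histogram(x_c, bins=edges)[0] / len(x_c)
        # Empty bins contribute nothing (0 * log 0 is taken as 0); p_x is
        # positive wherever p_xc is, because both share the same bin edges.
        mask = p_xc > 0
        md += prior * np.sum(p_xc[mask] * np.log(p_xc[mask] / p_x[mask]))
    return md


def select_features(X, y, k, bins=16):
    """Score every feature independently and keep the k highest scores."""
    scores = np.array(
        [marginal_diversity(X[:, j], y, bins) for j in range(X.shape[1])]
    )
    order = np.argsort(scores)[::-1]  # indices sorted by decreasing score
    return order[:k], scores
```

Each feature is scored on its own, so the cost grows linearly with the number of features and no subsets are enumerated. Whether this ranking is near-optimal depends on the conditions the paper identifies, which this sketch does not check.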

reference text

[1] S. Basu, C. Micchelli, and P. Olsen. Maximum Entropy and Maximum Likelihood Criteria for Feature Selection from Multivariate Data. In Proc. IEEE International Symposium on Circuits and Systems, Geneva, Switzerland, 2000.

[2] A. Bell and T. Sejnowski. An Information Maximisation Approach to Blind Separation and Blind Deconvolution. Neural Computation, 7(6):1129–1159, 1995.

[3] B. Bonnlander and A. Weigend. Selecting Input Variables using Mutual Information and Nonparametric Density Estimation. In Proc. IEEE International ICSC Symposium on Artificial Neural Networks, Tainan, Taiwan, 1994.

[4] D. Erdogmus and J. Principe. Information Transfer Through Classifiers and its Relation to Probability of Error. In Proc. of the International Joint Conference on Neural Networks, Washington, 2001.

[5] J. Huang and D. Mumford. Statistics of Natural Images and Models. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, Colorado, 1999.

[6] A. Jain and D. Zongker. Feature Selection: Evaluation, Application, and Small Sample Performance. IEEE Trans. on Pattern Analysis and Machine Intelligence, 19(2):153–158, February 1997.

[7] R. Linsker. Self-Organization in a Perceptual Network. IEEE Computer, 21(3):105–117, March 1988.

[8] J. Portilla and E. Simoncelli. Texture Modeling and Synthesis using Joint Statistics of Complex Wavelet Coefficients. In IEEE Workshop on Statistical and Computational Theories of Vision, Fort Collins, Colorado, 1999.

[9] J. Principe, D. Xu, and J. Fisher. Information-Theoretic Learning. In S. Haykin, editor, Unsupervised Adaptive Filtering, Volume 1: Blind-Source Separation. Wiley, 2000.

[10] G. Saon and M. Padmanabhan. Minimum Bayes Error Feature Selection for Continuous Speech Recognition. In Proc. Neural Information Proc. Systems, Denver, USA, 2000.

[11] K. Torkkola and W. Campbell. Mutual Information in Learning Feature Transforms. In Proc. International Conference on Machine Learning, Stanford, USA, 2000.

[12] G. Trunk. A Problem of Dimensionality: a Simple Example. IEEE Trans. on Pattern Analysis and Machine Intelligence, 1(3):306–307, July 1979.

[13] N. Vasconcelos. Feature Selection by Maximum Marginal Diversity: Optimality and Implications for Visual Recognition. Submitted, 2002.

[14] N. Vasconcelos and G. Carneiro. What is the Role of Independence for Visual Recognition? In Proc. European Conference on Computer Vision, Copenhagen, Denmark, 2002.

[15] H. Yang and J. Moody. Data Visualization and Feature Selection: New Algorithms for Nongaussian Data. In Proc. Neural Information Proc. Systems, Denver, USA, 2000.