
97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model


Source: pdf

Author: Jianqing Fan, Richard Samworth, Yichao Wu

Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) and feature selection using a two-sample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions, and that its revision, called iterative sure independence screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popular two-sample t-test method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology.

Keywords: classification, feature screening, generalized linear models, robust regression, feature selection
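To make the screening idea concrete, below is a minimal NumPy sketch of the correlation-ranking (SIS) step in the least-squares setting. The function name sis_screen and the toy data are illustrative assumptions, not the authors' code, and the iterative ISIS refinement with feature deletion described above is omitted.

```python
import numpy as np

def sis_screen(X, y, d):
    """Rank features by absolute marginal correlation with the response
    and return the indices of the top d (the SIS screening step)."""
    Xc = X - X.mean(axis=0)   # center each feature
    yc = y - y.mean()         # center the response
    # Componentwise Pearson correlation of each feature with y.
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:d]

# Toy illustration: n = 100 samples, p = 1000 features, 3 truly active.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 1000))
y = X[:, 0] - 2.0 * X[:, 1] + 0.5 * X[:, 2] + rng.standard_normal(100)
print(sis_screen(X, y, d=10))  # indices 0, 1 and 2 should rank near the top
```

The full ISIS procedure would alternate such a screening step with a penalized fit on the selected set, recruiting and deleting features over several iterations.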


reference text

Hirotsugu Akaike. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6):716–723, 1974.
Hussein Almuallim and Thomas G. Dietterich. Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1–2):279–305, 1994.
Anestis Antoniadis and Jianqing Fan. Regularized wavelet approximations (with discussion). J. Amer. Statist. Assoc., 96(455):939–967, 2001.
Eric Bair, Trevor Hastie, Debashis Paul and Robert Tibshirani. Prediction by supervised principal components. J. Amer. Statist. Assoc., 101(473):119–137, 2006.
Yoshua Bengio and Nicolas Chapados. Extensions to metric-based model selection. J. Mach. Learn. Res., 3:1209–1227, 2003.
Jinbo Bi, Kristin P. Bennett, Mark J. Embrechts, Curt M. Breneman and Minghu Song. Dimensionality reduction via sparse support vector machines. J. Mach. Learn. Res., 3:1229–1243, 2003.
Peter J. Bickel, Ya’acov Ritov and Alexandre Tsybakov. Simultaneous analysis of Lasso and Dantzig selector. Ann. Statist., 37(4):1705–1732, 2009.
Emmanuel Candès and Terence Tao. The Dantzig selector: statistical estimation when p is much larger than n (with discussion). Ann. Statist., 35(6):2313–2404, 2007.
David L. Donoho and Michael Elad. Maximal sparsity representation via L1 minimization. Proc. Natl. Acad. Sci., 100:2197–2202, 2003.
Sandrine Dudoit, Juliet P. Shaffer and Jennifer C. Boldrick. Multiple hypothesis testing in microarray experiments. Statist. Sci., 18:71–103, 2003.
Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. Least angle regression (with discussion). Ann. Statist., 32(2):407–499, 2004.
Bradley Efron. Microarrays, empirical Bayes and the two-groups model (with discussion). Statist. Sci., 23:1–47, 2008.
Jianqing Fan and Yingying Fan. High dimensional classification using shrunken independence rule. Ann. Statist., 36(6):2605–2637, 2008.
Jianqing Fan and Runze Li. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc., 96:1348–1360, 2001.
Jianqing Fan and Jinchi Lv. Sure independence screening for ultra-high dimensional feature space (with discussion). J. Roy. Statist. Soc., Ser. B, 70:849–911, 2008.
Jianqing Fan and Heng Peng. On non-concave penalized likelihood with diverging number of parameters. Ann. Statist., 32(3):928–961, 2004.
Jianqing Fan and Yi Ren. Statistical analysis of DNA microarray data. Clinical Cancer Res., 12:4469–4473, 2006.
Jianqing Fan and Rui Song. Sure independence screening in generalized linear models with NP-dimensionality. Manuscript, 2009.
Yoav Freund and Robert E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. System Sci., 55(1):119–139, 1997.
Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. J. Mach. Learn. Res., 3:1157–1182, 2003.
Isabelle Guyon, Steve Gunn, Masoud Nikravesh and Lotfi Zadeh, editors. Feature Extraction: Foundations and Applications. Springer, New York, 2006.
Mark A. Hall. Correlation-based feature selection for discrete and numeric class machine learning. In International Conference on Machine Learning, pages 359–366, Stanford, CA, 2000.
Peter Hall, D. M. Titterington and Jing-Hao Xue. Tiling methods for assessing the influence of components in a classifier. J. Roy. Statist. Soc., Ser. B, 71(4):783–803, 2009.
Trevor Hastie, Robert Tibshirani and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, 2001.
Peter J. Huber. Robust estimation of a location parameter. Ann. Math. Statist., 35:73–101, 1964.
Javed Khan, Jun S. Wei, Markus Ringnér, Lao H. Saal, Marc Ladanyi, Frank Westermann, Frank Berthold, Manfred Schwab, Cristina R. Antonescu, Carsten Peterson and Paul S. Meltzer. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine, 7:673–679, 2001.
Igor Kononenko. Estimating attributes: analysis and extension of RELIEF. In Machine Learning: ECML-94, Springer, Berlin/Heidelberg, 1994.
Yoonkyung Lee, Yi Lin and Grace Wahba. Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. J. Amer. Statist. Assoc., 99(465):67–81, 2004.
Huan Liu and Hiroshi Motoda. Feature Selection for Knowledge Discovery and Data Mining. Kluwer Academic Publishers, Boston, MA, 1998.
Yufeng Liu, Xiaotong Shen and Hani Doss. Multicategory ψ-learning and support vector machine: computational tools. J. Comput. Graph. Statist., 14(1):219–236, 2005.
Peter McCullagh and John A. Nelder. Generalized Linear Models. Chapman & Hall, London, 1989.
Nicolai Meinshausen and Peter Bühlmann. High-dimensional graphs and variable selection with the Lasso. Ann. Statist., 34(3):1436–1462, 2006.
André Oberthuer, Frank Berthold, Patrick Warnat, Barbara Hero, Yvonne Kahlert, Rüdiger Spitz, Karen Ernestus, Rainer König, Stefan Haas, Roland Eils, Manfred Schwab, Benedikt Brors, Frank Westermann and Matthias Fischer. Customized oligonucleotide microarray gene expression-based classification of neuroblastoma patients outperforms current clinical risk stratification. Journal of Clinical Oncology, 24:5070–5078, 2006.
Mee Young Park and Trevor Hastie. L1-regularization path algorithm for generalized linear models. J. Roy. Statist. Soc., Ser. B, 69(4):659–677, 2007.
Debashis Paul, Eric Bair, Trevor Hastie and Robert Tibshirani. “Pre-conditioning” for feature selection and regression in high-dimensional problems. Ann. Statist., 36(4):1595–1618, 2008.
Gideon Schwarz. Estimating the dimension of a model. Ann. Statist., 6(2):461–464, 1978.
Robert Tibshirani. Regression shrinkage and selection via the lasso. J. Roy. Statist. Soc., Ser. B, 58(1):267–288, 1996.
Robert Tibshirani, Trevor Hastie, Balasubramanian Narasimhan and Gilbert Chu. Class prediction by nearest shrunken centroids, with applications to DNA microarrays. Statist. Sci., 18(1):104–117, 2003.
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
Lei Yu and Huan Liu. Feature selection for high-dimensional data: a fast correlation-based filter solution. In International Conference on Machine Learning, pages 856–863, Washington, DC, USA, 2003.
Cun-Hui Zhang. Nearly unbiased variable selection under minimax concave penalty. Ann. Statist., forthcoming.
Cun-Hui Zhang and Jian Huang. The sparsity and bias of the LASSO selection in high-dimensional linear regression. Ann. Statist., 36(4):1567–1594, 2008.
Peng Zhao and Bin Yu. On model selection consistency of Lasso. J. Mach. Learn. Res., 7:2541–2563, 2006.
Zheng Zhao and Huan Liu. Searching for interacting features. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1156–1161, Hyderabad, India, 2007.
Hui Zou. The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc., 101(476):1418–1429, 2006.
Hui Zou and Runze Li. One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Ann. Statist., 36(4):1509–1566, 2008.