
Subgroup Analysis via Recursive Partitioning (JMLR, 2009)



Author: Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, Bogong Li

Abstract: Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration.

Keywords: CART, interaction, subgroup analysis, random forests
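The abstract does not state which splitting statistic the interaction tree uses, so the following is only a minimal sketch under an assumed criterion: for each candidate cut on one covariate, compare the treatment effect in the left and right child nodes and score the cut by a squared t-like statistic for the treatment-by-split interaction, using a pooled within-cell variance. The function names `interaction_stat` and `best_split`, the minimum cell size of 2, and the simulated data are all hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def interaction_stat(y, trt, left):
    """Squared t-like statistic for a treatment-by-split interaction.

    Assumed criterion (not taken from the abstract): the difference between
    the treatment effect in the left child and in the right child, scaled by
    a pooled within-cell variance.
    """
    # Four cells keyed by (in_left_child, treated).
    cells = {(s, t): [] for s in (True, False) for t in (True, False)}
    for yi, ti, si in zip(y, trt, left):
        cells[(bool(si), bool(ti))].append(float(yi))
    groups = list(cells.values())
    if min(len(g) for g in groups) < 2:
        return 0.0  # some cell is too small to estimate a mean and variance
    n = sum(len(g) for g in groups)
    means = {k: fmean(g) for k, g in cells.items()}
    sse = sum((v - means[k]) ** 2 for k, g in cells.items() for v in g)
    sigma2 = sse / (n - 4)  # pooled variance over the four cells
    if sigma2 == 0.0:
        return 0.0  # degenerate case: no within-cell variation to scale by
    effect_left = means[(True, True)] - means[(True, False)]
    effect_right = means[(False, True)] - means[(False, False)]
    se2 = sigma2 * sum(1.0 / len(g) for g in groups)
    return (effect_left - effect_right) ** 2 / se2


def best_split(x, y, trt):
    """Search all cuts 'x <= c' on one covariate; return (cut, statistic)."""
    best_cut, best_stat = None, 0.0
    for cut in sorted(set(x))[:-1]:  # drop the max so the right child is non-empty
        stat = interaction_stat(y, trt, [xi <= cut for xi in x])
        if stat > best_stat:
            best_cut, best_stat = cut, stat
    return best_cut, best_stat


if __name__ == "__main__":
    # Simulated data: the treatment raises the response only when x <= 4.
    random.seed(0)
    x = [float(v) for v in range(1, 9) for _ in range(10)]
    trt = [i % 2 for i in range(len(x))]
    y = [2.0 * t * (xi <= 4) + random.gauss(0, 0.1) for xi, t in zip(x, trt)]
    print(best_split(x, y, trt))
```

Applying such a search recursively to each child node, and over all covariates, would yield a tree whose terminal nodes are candidate subgroups; pruning and the random-forest variable importance mentioned in the abstract are outside the scope of this sketch.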


reference text

H. Akaike. Information theory and an extension of the maximum likelihood principle. In B. N. Petrov and F. Csaki, editors, 2nd Int. Symp. Inf. Theory, pages 267–281. Budapest: Akad Kiado, 1973.
S. F. Assmann, S. J. Pocock, L. E. Enos, and L. E. Kasten. Subgroup analysis and other (mis)uses of baseline data in clinical trials. Lancet, 355:1064–1069, 2000.
L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.
L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification And Regression Trees. Wadsworth International Group, Belmont, CA, 1984.
S.-C. Chow and J.-P. Liu. Design and Analysis of Clinical Trials: Concepts and Methodologies. Hoboken, NJ: Wiley-Interscience, 2004.
A. Ciampi, J. Thiffault, J.-P. Nakache, and B. Asselain. Stratification by stepwise regression, correspondence analysis and recursive partition. Computational Statistics and Data Analysis, 4:185–204, 1986.
CPMP Working Party on Efficacy on Medicinal Products. Biostatistical methodology in clinical trials in applications for marketing authorisations for medicinal products: Note for guidance. Statistics in Medicine, 14:1659–1682, 1995.
M. Gail and R. Simon. Testing for qualitative interactions between treatment effects and patient subsets. Biometrics, 41:362–372, 1985.
I. Kononenko and S. J. Hong. Attribute selection for modeling. Future Generation Computer Systems, 13:181–195, 1997.
S. W. Lagakos. The challenge of subgroup analyses - reporting without distorting. The New England Journal of Medicine, 354:1667–1669, 2006.
M. Leblanc and J. Crowley. Survival trees by goodness of split. Journal of the American Statistical Association, 88:457–467, 1993.
A. D. R. McQuarrie and C.-L. Tsai. Regression and Time Series Model Selection. Singapore: World Scientific, 1998.
J. Morgan and J. Sonquist. Problems in the analysis of survey data and a proposal. Journal of the American Statistical Association, 58:415–434, 1963.
A. Negassa, A. Ciampi, M. Abrahamowicz, S. Shapiro, and J.-F. Boivin. Tree-structured subgroup analysis for censored survival data: validation of computationally inexpensive model selection criteria. Statistics and Computing, 15:231–239, 2005.
G. Schwarz. Estimating the dimension of a model. Annals of Statistics, 6:461–464, 1978.
P. Sleight. Debate: Subgroup analyses in clinical trials: Fun to look at, but don’t believe them! Current Controlled Trials on Cardiovascular Medicine, 1:25–27, 2000.
X. G. Su, T. Zhou, X. Yan, J. Fan, and S. Yang. Interaction trees with censored survival data. The International Journal of Biostatistics, vol. 4, iss. 1, article 2, 2008. Available at: http://www.bepress.com/ijb/vol4/iss1/2.
R. Tibshirani and K. Knight. The covariance inflation criterion for adaptive model selection. Journal of the Royal Statistical Society, Series B, 61:529–546, 1999.
L. Torgo. A study on end-cut preference in least squares regression trees. Proceedings of the 10th Portuguese Conference on Artificial Intelligence. Lecture Notes In Computer Science, 2258:104–115. Springer-Verlag: London, UK, 2001.
J. Ye. On measuring and correcting the effects of data mining and model selection. Journal of the American Statistical Association, 93:120–131, 1998.