jmlr jmlr2009 jmlr2009-91 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, Bogong Li
Abstract: Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration. Keywords: CART, interaction, subgroup analysis, random forests
Reference: text
sentIndex sentText sentNum sentScore
1 Bureau of Labor Statistics Washington, DC 20212, USA Editor: Saharon Rosset Abstract Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. [sent-17, score-0.264]
2 Its goal is to determine the heterogeneity of the treatment effect across subpopulations. [sent-18, score-0.369]
3 In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. [sent-19, score-0.858]
4 The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. [sent-20, score-0.562]
5 Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. [sent-23, score-0.757]
6 Keywords: CART, interaction, subgroup analysis, random forests 1. [sent-25, score-0.435]
7 Introduction In comparative studies where two or more treatments are compared, subgroup analysis emerges after the overall assessment of the treatment effect and plays an important role in determining whether and how the treatment effect (i. [sent-26, score-0.858]
8 , comparison among treatments) varies across subgroups induced by covariates. [sent-28, score-0.28]
9 to know if there are subgroups of trial participants who are more (or less) likely to be helped (or harmed) by the intervention under investigation. [sent-31, score-0.28]
10 Subgroup analysis helps explore the heterogeneity of the treatment effect and extract the maximum amount of information from the available data. [sent-32, score-0.401]
11 (2000), 70% of trials published over a three-month period in four leading medical journals involved subgroup analyses. [sent-34, score-0.374]
12 The research questions in subgroup analysis can be either pre-planned or raised in a post-hoc manner. [sent-35, score-0.33]
13 If there is a prior hypothesis of the treatment effect being different in particular subgroups, then this hypothesis and its assessment should be part of the planned study design (CPMP, 1995). [sent-36, score-0.264]
14 At the same time, subgroup analysis is also often used for post-hoc exploratory identification of unusual or unexpected results (Chow and Liu, 2004). [sent-37, score-0.33]
15 Suppose, for example, that it is of interest to investigate whether the treatment effect is consistent among three age groups: young, middle-aged, and older individuals. [sent-38, score-0.326]
16 The evaluation is formally approached by means of an interaction test between treatment and age. [sent-39, score-0.409]
17 If the resulting test is significant, multiple comparisons are then used to find out further details about the magnitude and direction of the treatment effect within each age group. [sent-40, score-0.326]
18 Limitations of traditional subgroup analysis have been extensively noted (see, e. [sent-42, score-0.33]
19 First of all, in the current practice of subgroup analysis the subgroups themselves, as well as the number of subgroups to be examined, are specified beforehand by the investigator, which renders the analysis a highly subjective process. [sent-46, score-1.22]
20 Even for the field expert, it is a daunting task to determine which specific subgroups should be used in subgroup analysis. [sent-47, score-0.61]
21 For example, one may fail to identify a subgroup of great prospective interest or intentionally avoid reporting subgroups where the investigational treatment is found unsuccessful or even potentially harmful. [sent-49, score-0.867]
22 Moreover, significance testing is the main approach in subgroup analysis. [sent-51, score-0.33]
23 However, a large number of subgroups inevitably causes concerns related to multiplicity and lack of power. [sent-53, score-0.315]
24 In this paper, we propose a data-driven tree procedure, labelled as “interaction trees” (IT), to explore the heterogeneity structure of the treatment effect across a number of subgroups that are objectively defined in a post hoc manner. [sent-54, score-0.907]
25 Their pruning idea for tree size selection has become, and remains, the standard approach to constructing tree models. [sent-59, score-0.505]
26 In Section 4, we apply the IT procedure to analyze a wage data set, in which the goal is to determine whether or not women are underpaid or overpaid as compared to their male counterparts and, if so, by what amount. [sent-63, score-0.447]
27 Tree-Structured Subgroup Analysis We consider a study designed to assess a binary treatment effect on a continuous response or output while adjusting or controlling for a number of covariates. [sent-66, score-0.264]
28 , n}, where yi is the continuous response; trti is the binary treatment indicator taking values 1 or 0; and xi = (xi1 , . [sent-73, score-0.349]
29 Our goal in subgroup analysis is to find out whether there exist subgroups of individuals in which the treatment shows heterogeneous effects, and if so, how the treatment effect varies across them. [sent-77, score-1.1]
30 Applying a tree procedure to guide the subgroup analysis is rather intuitive. [sent-78, score-0.59]
31 First, subgroup analysis essentially involves the interaction between the treatment and the covariates. [sent-79, score-0.739]
32 It is often hard to determine whether or not interaction really exists among variables for a given tree structure. [sent-83, score-0.406]
33 We shall seek ways that enable us to explicitly assess the interaction between the treatment and the covariates. [sent-84, score-0.409]
34 By recursively partitioning the data into two subgroups that show the greatest heterogeneity in treatment effect, we are able to optimize the subgroup analysis and make it more efficient in representing the heterogeneity structure of the treatment effect. [sent-86, score-1.357]
35 Thirdly, the tree procedure is objective, data-driven, and automated. [sent-87, score-0.26]
36 The grouping strategy and the number of subgroups are automatically determined by the procedure. [sent-88, score-0.28]
37 The proposed method would result in a set of objectively recognized and mutually exclusive subgroups, ranging from the most effective to the least effective in terms of treatment effect. [sent-89, score-0.261]
38 (2008), who studied tree-structured subgroup analysis in the context of censored survival data, are the only previous works along similar lines to our proposal. [sent-92, score-0.416]
39 , 1984) convention, which consists of three major steps: (1) growing a large initial tree; (2) a pruning algorithm; and (3) a validation method for determining the best tree size. [sent-94, score-0.282]
40 Once a final tree structure is obtained, the subgroups are naturally induced by its terminal nodes. [sent-95, score-0.653]
41 To achieve better efficiency, an amalgamation algorithm is used to merge terminal nodes that show homogeneous treatment effects. [sent-96, score-0.482]
42 In addition, we adopt the variable importance technique in the context of random forest to extract covariates that exhibit important interactions with the treatment. [sent-97, score-0.304]
43 Observations answering “yes” go to the left child node tL and observations answering “no” go to the right child node t R . [sent-102, score-0.346]
44 When X has many distinct categories, one may place them into order according to the treatment effect estimate within each category and then proceed as if X is ordinal. [sent-108, score-0.264]
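Ordering the categories in this way reduces the search over categorical splits to at most |C| − 1 ordered cut points. A minimal sketch of the ordinalization step is given below, assuming numpy arrays; the function name and interface are illustrative, not the authors' code.

```python
import numpy as np

def ordinalize_categories(y, trt, x_cat):
    """Order the categories of a nominal covariate by the within-category
    treatment-effect estimate (treated mean minus control mean), so that the
    covariate can then be split as if it were ordinal."""
    effects = {}
    for c in np.unique(x_cat):
        in_c = (x_cat == c)
        treated = y[in_c & (trt == 1)]
        control = y[in_c & (trt == 0)]
        # How to handle a category observed in only one arm is a design
        # choice not fixed by the text above; 0.0 is a placeholder.
        effects[c] = treated.mean() - control.mean() if len(treated) and len(control) else 0.0
    ordered = sorted(effects, key=effects.get)
    rank = {c: r for r, c in enumerate(ordered)}
    # Candidate splits are then of the form "rank <= r", as for an ordinal X.
    return np.array([rank[c] for c in x_cat]), ordered
```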
45 To evaluate the heterogeneity of the treatment effect between tL and tR, we compare (µL1 − µL0) with (µR1 − µR0), where µL1 and µL0 denote the mean responses of the treated and control observations in tL (and similarly for tR), so that the interaction between split s and treatment is under investigation. [sent-113, score-0.822]
46 Remark 1: It can be easily seen that t(s) given in (1) is equivalent to the t test for testing H0 : γ = 0 in the following threshold model: yi = β0 + β1 · trti + δ · zi^(s) + γ · trti · zi^(s) + εi , (2) where zi^(s) = 1{Xi ≤ c} is the binary variable associated with split s. [sent-122, score-0.29]
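To make the splitting criterion concrete, the sketch below computes, for one candidate split, the difference between the estimated treatment effects in the two child nodes divided by a pooled-variance standard error; this is the t statistic for H0 : γ = 0 in model (2). The pooled-variance denominator is an assumption, since equation (1) itself is not reproduced in this summary.

```python
import numpy as np

def split_statistic(y, trt, z):
    """Interaction statistic for a candidate split: z = 1 sends an observation
    to the left child tL, z = 0 to the right child tR. The statistic contrasts
    the treatment effect (treated mean minus control mean) across children."""
    groups = [y[(z == a) & (trt == k)] for a in (1, 0) for k in (1, 0)]  # L1, L0, R1, R0
    n = [len(g) for g in groups]
    if min(n) == 0:
        return np.nan  # the split leaves one treatment arm empty in a child node
    means = [g.mean() for g in groups]
    diff = (means[0] - means[1]) - (means[2] - means[3])
    ss = sum(((g - g.mean()) ** 2).sum() for g in groups)
    sigma2 = ss / (sum(n) - 4)                      # pooled variance (assumed form)
    se = np.sqrt(sigma2 * sum(1.0 / m for m in n))
    return diff / se
```

A node is then split at the permissible s with the largest squared statistic, that is, at the split showing the greatest treatment-effect heterogeneity between its children.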
47 2 Pruning The final tree could be any subtree of T0 . [sent-131, score-0.309]
48 Repeating this procedure yields a nested sequence of subtrees TM ≺ · · · ≺ Tm ≺ Tm−1 ≺ · · · ≺ T1 ≺ T0 , where TM is the null tree with the root node only and ≺ means “is a subtree of”. [sent-142, score-0.453]
49 3 Selecting the Best-Sized Subtree The final tree will be selected from the nested subtree sequence {Tm : m = 0, 1, . [sent-144, score-0.309]
50 Existence of the overall interaction between the treatment and the covariates can be roughly assessed by inspecting whether a null final tree structure is obtained. [sent-161, score-0.754]
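The extracted text does not spell out the size-selection rule in detail. The sketch below illustrates one plausible rule, in the spirit of the BIC-based selection reported for the CPS tree later on: recompute the interaction strength of each subtree on the validation sample L2 and penalize the number of splits. Both the form of G and the penalty value are assumptions.

```python
import numpy as np

def select_subtree(g_valid, n_splits, penalty):
    """Pick the best-sized subtree from the nested sequence T_M, ..., T_0.

    g_valid[m]  : interaction strength of T_m recomputed on the validation
                  sample L2 (e.g., a sum of squared split statistics).
    n_splits[m] : number of internal nodes (splits) of T_m.
    penalty     : complexity cost per split; a BIC-flavoured choice such as
                  log(n2), with n2 the size of L2, is one option (assumed).
    If the null tree wins, there is no evidence of treatment-covariate
    interaction, matching the remark above."""
    scores = np.asarray(g_valid, dtype=float) - penalty * np.asarray(n_splits, dtype=float)
    return int(np.argmax(scores))
```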
51 Unlike conventional subgroup analysis, the subgroups obtained from the IT procedure are mutually exclusive. [sent-162, score-0.647]
52 Due to the subjectivity in determining the subgroups and the multiplicity arising from significance tests across subgroups, it is generally agreed that subgroup analysis should be regarded as exploratory and hypothesis-generating. [sent-163, score-0.645]
53 To summarize the terminal nodes, one can compute the average responses for both treatments within each terminal node, as well as their relative differences or ratios and the associated standard errors. [sent-165, score-0.3]
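A minimal sketch of such a per-node summary follows; the Welch-type (unequal-variance) standard error is an assumption, since the variance estimator is not fixed by the text above.

```python
import numpy as np

def summarize_node(y, trt):
    """Treated/control means in a terminal node, their difference and ratio,
    and the standard error of the difference."""
    y1, y0 = y[trt == 1], y[trt == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return {"n1": len(y1), "n0": len(y0),
            "mean_trt": y1.mean(), "mean_ctl": y0.mean(),
            "diff": y1.mean() - y0.mean(),
            "ratio": y1.mean() / y0.mean(),
            "se_diff": se}
```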
54 It happens often that the treatment shows homogeneous effects for entirely different causal reasons in terminal nodes stemming from different branches. [sent-169, score-0.436]
55 A smaller number of subgroups is easier to summarize and comprehend, and represents the heterogeneity structure of the treatment effect more efficiently. [sent-172, score-0.611]
56 The pair showing the least heterogeneity in treatment effect is then merged. [sent-175, score-0.369]
57 The same procedure is executed iteratively until all the remaining subgroups display outstanding heterogeneity. [sent-176, score-0.317]
58 In the end, one can sort the final subgroups based on the strength of the treatment effect, from the most effective to the least effective. [sent-177, score-0.506]
59 Thus, we suggest that amalgamation be executed with data (L1 + L2 ) that pool the learning sample L1 and the validation sample L2 together, so that the resultant subgroups can be validated using the test sample L3 . [sent-181, score-0.364]
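A sketch of the amalgamation loop is given below. It reuses split_statistic from the splitting sketch earlier to measure pairwise heterogeneity on the pooled sample L1 + L2; the stopping threshold and the dictionary-based interface are illustrative assumptions.

```python
from itertools import combinations
import numpy as np

def amalgamate(node_data, threshold=1.96):
    """Iteratively merge the pair of terminal nodes whose treatment effects
    differ least, until every remaining pair shows marked heterogeneity.

    node_data : dict mapping node id -> (y, trt) arrays from L1 + L2.
    threshold : critical value for the pairwise statistic (placeholder)."""
    groups = {k: (np.asarray(v[0]), np.asarray(v[1])) for k, v in node_data.items()}
    while len(groups) > 1:
        best_pair, best_stat = None, np.inf
        for a, b in combinations(groups, 2):
            ya, ta = groups[a]
            yb, tb = groups[b]
            y = np.concatenate([ya, yb])
            trt = np.concatenate([ta, tb])
            z = np.concatenate([np.ones(len(ya)), np.zeros(len(yb))])
            stat = abs(split_statistic(y, trt, z))   # from the splitting sketch
            if stat < best_stat:
                best_pair, best_stat = (a, b), stat
        if best_stat >= threshold:
            break                                    # all remaining pairs differ markedly
        a, b = best_pair
        groups[a] = tuple(np.concatenate([groups[a][i], groups[b][i]]) for i in (0, 1))
        del groups[b]
    # Sort the surviving subgroups by estimated treatment effect, most to least effective.
    effect = lambda g: g[0][g[1] == 1].mean() - g[0][g[1] == 0].mean()
    return sorted(groups, key=lambda k: effect(groups[k]), reverse=True)
```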
60 In the context of subgroup analysis, a covariate is called an effect-modifier of the treatment if it strongly interacts with the treatment. [sent-185, score-0.632]
61 Variable importance measure helps answer questions such as which features or predictors are important in modifying the treatment effect. [sent-186, score-0.332]
62 While there are many methods available for extracting variable importance information, we propose an algorithm analogous to the procedure used in random forests (Breiman, 2001), which is among the newest and most promising developments in this regard. [sent-188, score-0.248]
63 • Based on Lb , grow a large IT tree Tb by searching over m0 randomly selected covariates at each split. [sent-195, score-0.345]
64 For each IT tree Tb , the b-th out-of-bag sample (denoted as L − Lb ), which contains all observations that are not in Lb , is sent down Tb to compute the interaction measure G(Tb ). [sent-215, score-0.406]
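A sketch of the forest-based importance follows, mirroring Breiman's (2001) permutation recipe: each covariate is permuted in the out-of-bag sample and the resulting drop in the interaction measure G(Tb) is averaged over trees. The callables grow_it_tree and g_measure are hypothetical stand-ins for the IT growing routine and the interaction measure, neither of which is reproduced in this summary.

```python
import numpy as np

def it_variable_importance(X, y, trt, grow_it_tree, g_measure, B=500, m0=None, seed=0):
    """Permutation-style importance from a forest of interaction trees."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m0 = m0 or max(1, int(np.sqrt(p)))          # covariates sampled per split (assumed default)
    importance = np.zeros(p)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)        # bootstrap sample L_b
        oob = np.setdiff1d(np.arange(n), idx)   # out-of-bag sample L - L_b
        tree = grow_it_tree(X[idx], y[idx], trt[idx], m0)
        g_full = g_measure(tree, X[oob], y[oob], trt[oob])
        for j in range(p):
            Xp = X[oob].copy()
            Xp[:, j] = rng.permutation(Xp[:, j])   # break the X_j / treatment interaction
            importance[j] += g_full - g_measure(tree, Xp, y[oob], trt[oob])
    return importance / B
```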
65-66 Table 1: Models used for assessing the performance of the IT procedure. [sent-236, score-1.722] [sent-238, score-1.476]
      Model A: Y = 2 + 2·trt + 2·Z1 + 2·Z2 + ε                              Error: N(0, 1)
      Model B: Y = 2 + 2·trt + 2·Z1 + 2·Z2 + 2·trt·Z1·Z2 + ε                Error: N(0, 1)
      Model C: Y = 2 + 2·trt + 2·Z1 + 2·Z2 + 2·trt·Z1 + 2·trt·Z2 + ε        Error: N(0, 1)
      Model D: Y = 10 + 10·trt·exp{(X1 − 0.5)²} + ε                         Error: N(0, 1)
      Model E: Y = 2 + 2·trt + 2·Z1 + 2·Z2 + 2·trt·Z1 + 2·trt·Z2 + ε        Error: Unif(−√3, √3)
      Model F: Y = 2 + 2·trt + 2·Z1 + 2·Z2 + 2·trt·Z1 + 2·trt·Z2 + ε        Error: exp(1)
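For concreteness, the snippet below simulates data from Model C under one plausible reading of Table 1; the balanced 0/1 treatment assignment, the Uniform(0, 1) covariates, and the threshold terms Z1 = 1{X1 ≤ 0.5} and Z2 = 1{X2 ≤ 0.5} are assumptions where the extracted table does not state them.

```python
import numpy as np

def simulate_model_c(n=600, seed=0):
    """Data from Model C of Table 1 (one plausible reading; see assumptions above)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, 4))                 # X1..X4 ~ Uniform(0, 1) (assumed)
    trt = rng.integers(0, 2, size=n)             # balanced binary treatment (assumed)
    Z1 = (X[:, 0] <= 0.5).astype(float)
    Z2 = (X[:, 1] <= 0.5).astype(float)
    eps = rng.normal(size=n)                     # N(0, 1) errors, as for Models A-D
    y = 2 + 2*trt + 2*Z1 + 2*Z2 + 2*trt*Z1 + 2*trt*Z2 + eps
    return X, trt, y
```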
67 Model B involves a second-order interaction between the treatment and two terms of thresholds, both at 0. [sent-244, score-0.409]
68 If the IT procedure works, a tree structure with three terminal nodes should be selected. [sent-246, score-0.47]
69 A large tree is expected in order to represent the interaction structure in this case. [sent-249, score-0.406]
70 The relative frequencies of the final tree sizes selected by the IT procedure are presented in Table 2. [sent-254, score-0.26]
71 , the final tree selected by the IT procedure is split by X1 and X2 and only by them). [sent-259, score-0.304]
72 The IT procedure correctly selects the null tree structure at least 83. [sent-262, score-0.26]
73 For models B-F, the IT procedure also successfully identifies the true final tree structure and selects the desired splitting variables a majority of the time. [sent-271, score-0.26]
74 Table 2: Relative frequencies (in percentages) of the final tree size identified by the interaction tree (IT) procedure in 200 runs. [sent-475, score-0.26]
75 selection approach, in which the importance of a covariate, say, X, is determined by the p-value for testing H0 : β3 = 0 in the interaction model y = β0 + β1 x + β2 trt + β3 trt · x + ε. [sent-476, score-0.781]
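This comparator is easy to reproduce: for each covariate, fit the interaction model above by ordinary least squares and report the logworth, −log10 of the p-value for H0 : β3 = 0. The numpy/scipy implementation below is a sketch of the idea, not the authors' code.

```python
import numpy as np
from scipy import stats

def interaction_logworth(x, trt, y):
    """logworth = -log10 p-value for H0: beta3 = 0 in
    y = b0 + b1*x + b2*trt + b3*trt*x + eps, fitted by OLS."""
    n = len(y)
    D = np.column_stack([np.ones(n), x, trt, trt * x])
    beta, _, _, _ = np.linalg.lstsq(D, y, rcond=None)
    resid = y - D @ beta
    dof = n - D.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(D.T @ D)
    t_stat = beta[3] / np.sqrt(cov[3, 3])
    p_value = 2 * stats.t.sf(abs(t_stat), dof)
    return -np.log10(p_value)
```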
76 [Figure residue: variable-importance and logworth panels labelled Model A-I/II, Model E-I/II, and Model F-I/II, with covariates X1–X4 on the horizontal axis; see the caption in the following sentence.] [sent-505, score-0.275]
77 With a data set consisting of 1200 observations generated from Model A, Graph A-I plots the variable importance score computed from 500 random interaction trees while Graph A-II plots the logworth of the p-value from the simple feature selection method. [sent-512, score-0.517]
78 A public policy advocate would be very interested in specific subgroups of the working population where the pay gap between sexes is still dominant. [sent-538, score-0.311]
79 A large initial tree with 33 terminal nodes is constructed and pruned using data in L 1 . [sent-560, score-0.433]
80 For the sake of illustration, the best tree structure with 9 terminal nodes as well as some related summary statistics are given in Figure 3. [sent-563, score-0.433]
81 To merge those terminal nodes among which the wage discrepancies due to gender are not significantly different, we run the amalgamation algorithm with the pooled data (L1 + L2 ). [sent-564, score-0.387]
82 It results in four final subgroups, which are then ranked as I-IV according to the ratio of women versus men in terms of average wage. [sent-565, score-0.277]
83 Figure 3: The best-sized interaction tree for the CPS data selected by BIC. [sent-583, score-0.406]
84 This specification was also made available to each terminal node in Figure 3. [sent-593, score-0.257]
85 The two-sample t test for comparing the average wages between men and women is also presented. [sent-596, score-0.339]
86 Figure 3 indicates that the wage disparity between men and women varies with their occupation (X5 ), industry of the job (X6 ), household compositional situation (X11 ), and age (X1 ). [sent-603, score-0.656]
87 The sixteen covariates are age (X1 ), workclass (X2 ), education (X3 ), marital (X4 ), industry (X5 ), occupation (X6 ), race (X7 ), union (X8 ), fulltime (X9 ), tax. [sent-613, score-0.475]
88 hour while the average wage of women is only $8. [sent-620, score-0.316]
89 Interestingly, the IT tree has also identified a subgroup, Group IV, in which men are underpaid compared to women. [sent-622, score-0.377]
90 Table 4: Summary statistics for the four final subgroups of the CPS data. [sent-646, score-0.28]
91 For example, the IT structure can help identify the most and least effective subgroups for the investigational medicine. [sent-663, score-0.311]
92 If the new medicine shows an overall effect that is significant and, even in the least effective subgroup under examination, does not present any harmful side effects, then its release may be endorsed without reservation. [sent-664, score-0.402]
93 In trials where the proposed compound is not found to be significantly effective, the tree-structured subgroup analysis may identify sub-populations that contribute to the failure of the compound. [sent-665, score-0.374]
94 We believe that efforts along these lines make subgroup analysis an efficient and valuable tool for research in many application fields. [sent-667, score-0.33]
95 Categorical Splits The following theorem provides motivation and justification for the computationally efficient strategy of “ordinalizing” categorical covariates in interaction trees as discussed in Section 2. [sent-671, score-0.487]
96 Then any subset A ⊂ C, together with its complement Ā = C − A, induces a partition of the data into tL = {(yi , trti , xi ) : xil ∈ A} and tR = {(yi , trti , xi ) : xil ∈ Ā}. [sent-681, score-0.324]
97 Let µck = E(y | X = c, trt = k) for any element c ∈ C and k = 0, 1. [sent-685, score-0.246]
98 Testing for qualitative interactions between treatment effects and patient subsets. [sent-760, score-0.27]
99 The challenge of subgroup analyses - reporting without distorting. [sent-771, score-0.33]
100 Tree-structured subgroup analysis for censored survival data: validation of computationally inexpensive model selection criteria. [sent-797, score-0.416]
wordName wordTfidf (topN-words)
[('subgroup', 0.33), ('subgroups', 0.28), ('trt', 0.246), ('treatment', 0.226), ('tree', 0.223), ('women', 0.185), ('interaction', 0.183), ('logworth', 0.169), ('tb', 0.161), ('terminal', 0.15), ('cps', 0.139), ('ickerson', 0.139), ('sai', 0.139), ('wage', 0.131), ('categorical', 0.123), ('trti', 0.123), ('covariates', 0.122), ('tm', 0.11), ('node', 0.107), ('importance', 0.106), ('forests', 0.105), ('heterogeneity', 0.105), ('ecursive', 0.105), ('ubgroup', 0.105), ('lb', 0.097), ('men', 0.092), ('subtree', 0.086), ('partitioning', 0.085), ('services', 0.085), ('household', 0.077), ('covariate', 0.076), ('breiman', 0.071), ('nalysis', 0.068), ('wang', 0.067), ('child', 0.066), ('occupation', 0.065), ('labor', 0.065), ('marital', 0.065), ('cart', 0.065), ('age', 0.062), ('hansheng', 0.062), ('underpaid', 0.062), ('wages', 0.062), ('nodes', 0.06), ('trees', 0.059), ('pruning', 0.059), ('qi', 0.055), ('yr', 0.052), ('su', 0.052), ('yl', 0.05), ('tl', 0.05), ('ln', 0.05), ('bootstrap', 0.048), ('amalgamation', 0.046), ('assmann', 0.046), ('censored', 0.046), ('ciampi', 0.046), ('forestry', 0.046), ('householder', 0.046), ('leblanc', 0.046), ('preset', 0.046), ('race', 0.046), ('ii', 0.046), ('split', 0.044), ('clinical', 0.044), ('industry', 0.044), ('trials', 0.044), ('interactions', 0.044), ('bureau', 0.043), ('di', 0.043), ('education', 0.04), ('survival', 0.04), ('tsai', 0.039), ('xil', 0.039), ('th', 0.038), ('resultant', 0.038), ('effect', 0.038), ('procedure', 0.037), ('objectively', 0.035), ('multiplicity', 0.035), ('nal', 0.035), ('medicine', 0.034), ('male', 0.032), ('census', 0.032), ('extract', 0.032), ('pay', 0.031), ('actuarial', 0.031), ('bogong', 0.031), ('citizenship', 0.031), ('cpmp', 0.031), ('crowley', 0.031), ('entertainment', 0.031), ('fulltime', 0.031), ('gail', 0.031), ('investigational', 0.031), ('lagakos', 0.031), ('mcquarrie', 0.031), ('negassa', 0.031), ('nickerson', 0.031), ('permissible', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
Author: Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, Bogong Li
Abstract: Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration. Keywords: CART, interaction, subgroup analysis, random forests
Author: Petra Kralj Novak, Nada Lavrač, Geoffrey I. Webb
Abstract: This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches. It also shows that various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of best known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods. Keywords: descriptive rules, rule learning, contrast set mining, emerging patterns, subgroup discovery
Author: Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola
Abstract: Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach. Keywords: trees, resampling, importance, masking, residuals
4 0.084615238 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
Author: Babak Shahbaba, Radford Neal
Abstract: We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data. Keywords: mixture models, Dirichlet process, classification
5 0.043913774 23 jmlr-2009-Discriminative Learning Under Covariate Shift
Author: Steffen Bickel, Michael Brückner, Tobias Scheffer
Abstract: We address classification problems for which the training instances are governed by an input distribution that is allowed to differ arbitrarily from the test distribution—problems also referred to as classification under covariate shift. We derive a solution that is purely discriminative: neither training nor test distribution are modeled explicitly. The problem of learning under covariate shift can be written as an integrated optimization problem. Instantiating the general optimization problem leads to a kernel logistic regression and an exponential model classifier for covariate shift. The optimization problem is convex under certain conditions; our findings also clarify the relationship to the known kernel mean matching procedure. We report on experiments on problems of spam filtering, text classification, and landmine detection. Keywords: covariate shift, discriminative learning, transfer learning
6 0.037334871 89 jmlr-2009-Strong Limit Theorems for the Bayesian Scoring Criterion in Bayesian Networks
7 0.034758501 36 jmlr-2009-Fourier Theoretic Probabilistic Inference over Permutations
8 0.034717198 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
9 0.033899497 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes
10 0.033502217 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
11 0.032752629 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
12 0.030700928 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory
13 0.030419307 43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
14 0.029612914 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
15 0.029412104 96 jmlr-2009-Transfer Learning for Reinforcement Learning Domains: A Survey
16 0.028496664 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
17 0.028073754 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
18 0.028022164 67 jmlr-2009-Online Learning with Sample Path Constraints
19 0.027771695 15 jmlr-2009-Cautious Collective Classification
20 0.026672909 50 jmlr-2009-Learning When Concepts Abound
topicId topicWeight
[(0, 0.149), (1, -0.114), (2, 0.087), (3, 0.044), (4, 0.044), (5, -0.117), (6, -0.083), (7, 0.116), (8, 0.12), (9, 0.055), (10, 0.125), (11, -0.313), (12, 0.328), (13, 0.281), (14, 0.161), (15, 0.208), (16, 0.099), (17, 0.147), (18, 0.112), (19, -0.005), (20, 0.108), (21, -0.082), (22, -0.049), (23, 0.049), (24, 0.136), (25, -0.042), (26, -0.015), (27, 0.088), (28, -0.074), (29, -0.011), (30, 0.018), (31, -0.015), (32, 0.08), (33, -0.04), (34, 0.076), (35, -0.062), (36, -0.099), (37, -0.066), (38, 0.044), (39, 0.042), (40, 0.042), (41, -0.062), (42, -0.004), (43, -0.042), (44, -0.045), (45, -0.041), (46, 0.06), (47, 0.014), (48, 0.079), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.96544868 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
Author: Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, Bogong Li
Abstract: Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration. Keywords: CART, interaction, subgroup analysis, random forests
Author: Petra Kralj Novak, Nada Lavrač, Geoffrey I. Webb
Abstract: This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different rule learning heuristics, and use different means for selecting subsets of induced patterns. This paper contributes a novel understanding of these subareas of data mining by presenting a unified terminology, by explaining the apparent differences between the learning tasks as variants of a unique supervised descriptive rule discovery task and by exploring the apparent differences between the approaches. It also shows that various rule learning heuristics used in CSM, EPM and SD algorithms all aim at optimizing a trade off between rule coverage and precision. The commonalities (and differences) between the approaches are showcased on a selection of best known variants of CSM, EPM and SD algorithms. The paper also provides a critical survey of existing supervised descriptive rule discovery visualization methods. Keywords: descriptive rules, rule learning, contrast set mining, emerging patterns, subgroup discovery
Author: Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola
Abstract: Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach. Keywords: trees, resampling, importance, masking, residuals
4 0.3091844 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
Author: Babak Shahbaba, Radford Neal
Abstract: We introduce a new nonlinear model for classification, in which we model the joint distribution of response variable, y, and covariates, x, non-parametrically using Dirichlet process mixtures. We keep the relationship between y and x linear within each component of the mixture. The overall relationship becomes nonlinear if the mixture contains more than one component, with different regression coefficients. We use simulated data to compare the performance of this new approach to alternative methods such as multinomial logit (MNL) models, decision trees, and support vector machines. We also evaluate our approach on two classification problems: identifying the folding class of protein sequences and detecting Parkinson’s disease. Our model can sometimes improve predictive accuracy. Moreover, by grouping observations into sub-populations (i.e., mixture components), our model can sometimes provide insight into hidden structure in the data. Keywords: mixture models, Dirichlet process, classification
5 0.16106912 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
Author: Marc Boullé
Abstract: With the rapid growth of computer storage capacities, available data and demand for scoring models both follow an increasing trend, sharper than that of the processing power. However, the main limitation to a wide spread of data mining solutions is the non-increasing availability of skilled data analysts, which play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and averaging of models using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state of the art performance within practicable computation time. Keywords: large scale learning, naive Bayes, Bayesianism, model selection, model averaging
6 0.16001973 15 jmlr-2009-Cautious Collective Classification
7 0.15501304 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
8 0.13757423 47 jmlr-2009-Learning Linear Ranking Functions for Beam Search with Application to Planning
9 0.13730307 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
10 0.13422613 67 jmlr-2009-Online Learning with Sample Path Constraints
11 0.13343774 23 jmlr-2009-Discriminative Learning Under Covariate Shift
12 0.13057235 11 jmlr-2009-Bayesian Network Structure Learning by Recursive Autonomy Identification
13 0.12968926 96 jmlr-2009-Transfer Learning for Reinforcement Learning Domains: A Survey
14 0.12872681 72 jmlr-2009-Polynomial-Delay Enumeration of Monotonic Graph Classes
15 0.128438 31 jmlr-2009-Evolutionary Model Type Selection for Global Surrogate Modeling
16 0.12519157 43 jmlr-2009-Java-ML: A Machine Learning Library (Machine Learning Open Source Software Paper)
17 0.12158549 74 jmlr-2009-Properties of Monotonic Effects on Directed Acyclic Graphs
18 0.12050835 63 jmlr-2009-On Efficient Large Margin Semisupervised Learning: Method and Theory
20 0.11607575 57 jmlr-2009-Multi-task Reinforcement Learning in Partially Observable Stochastic Environments
topicId topicWeight
[(8, 0.029), (11, 0.034), (19, 0.011), (26, 0.059), (38, 0.032), (47, 0.018), (52, 0.042), (55, 0.032), (58, 0.029), (65, 0.482), (66, 0.076), (68, 0.011), (90, 0.038), (96, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.80953747 91 jmlr-2009-Subgroup Analysis via Recursive Partitioning
Author: Xiaogang Su, Chih-Ling Tsai, Hansheng Wang, David M. Nickerson, Bogong Li
Abstract: Subgroup analysis is an integral part of comparative analysis where assessing the treatment effect on a response is of central interest. Its goal is to determine the heterogeneity of the treatment effect across subpopulations. In this paper, we adapt the idea of recursive partitioning and introduce an interaction tree (IT) procedure to conduct subgroup analysis. The IT procedure automatically facilitates a number of objectively defined subgroups, in some of which the treatment effect is found prominent while in others the treatment has a negligible or even negative effect. The standard CART (Breiman et al., 1984) methodology is inherited to construct the tree structure. Also, in order to extract factors that contribute to the heterogeneity of the treatment effect, variable importance measure is made available via random forests of the interaction trees. Both simulated experiments and analysis of census wage data are presented for illustration. Keywords: CART, interaction, subgroup analysis, random forests
Author: Junning Li, Z. Jane Wang
Abstract: In real world applications, graphical statistical models are not only a tool for operations such as classification or prediction, but usually the network structures of the models themselves are also of great interest (e.g., in modeling brain connectivity). The false discovery rate (FDR), the expected ratio of falsely claimed connections to all those claimed, is often a reasonable error-rate criterion in these applications. However, current learning algorithms for graphical models have not been adequately adapted to the concerns of the FDR. The traditional practice of controlling the type I error rate and the type II error rate under a conventional level does not necessarily keep the FDR low, especially in the case of sparse networks. In this paper, we propose embedding an FDR-control procedure into the PC algorithm to curb the FDR of the skeleton of the learned graph. We prove that the proposed method can control the FDR under a user-specified level at the limit of large sample sizes. In the cases of moderate sample size (about several hundred), empirical experiments show that the method is still able to control the FDR under the user-specified level, and a heuristic modification of the method is able to control the FDR more accurately around the user-specified level. The proposed method is applicable to any models for which statistical tests of conditional independence are available, such as discrete models and Gaussian models. Keywords: Bayesian networks, false discovery rate, PC algorithm, directed acyclic graph, skeleton
Author: Eugene Tuv, Alexander Borisov, George Runger, Kari Torkkola
Abstract: Predictive models benefit from a compact, non-redundant subset of features that improves interpretability and generalization. Modern data sets are wide, dirty, mixed with both numerical and categorical predictors, and may contain interactive effects that require complex models. This is a challenge for filters, wrappers, and embedded feature selection methods. We describe details of an algorithm using tree-based ensembles to generate a compact subset of non-redundant features. Parallel and serial ensembles of trees are combined into a mixed method that can uncover masking and detect features of secondary effect. Simulated and actual examples illustrate the effectiveness of the approach. Keywords: trees, resampling, importance, masking, residuals
4 0.23196933 85 jmlr-2009-Settable Systems: An Extension of Pearl's Causal Model with Optimization, Equilibrium, and Learning
Author: Halbert White, Karim Chalak
Abstract: Judea Pearl’s Causal Model is a rich framework that provides deep insight into the nature of causal relations. As yet, however, the Pearl Causal Model (PCM) has had a lesser impact on economics or econometrics than on other disciplines. This may be due in part to the fact that the PCM is not as well suited to analyzing structures that exhibit features of central interest to economists and econometricians: optimization, equilibrium, and learning. We offer the settable systems framework as an extension of the PCM that permits causal discourse in systems embodying optimization, equilibrium, and learning. Because these are common features of physical, natural, or social systems, our framework may prove generally useful for machine learning. Important features distinguishing the settable system framework from the PCM are its countable dimensionality and the use of partitioning and partition-specific response functions to accommodate the behavior of optimizing and interacting agents and to eliminate the requirement of a unique fixed point for the system. Refinements of the PCM include the settable systems treatment of attributes, the causal role of exogenous variables, and the dual role of variables as causes and responses. A series of closely related machine learning examples and examples from game theory and machine learning with feedback demonstrates some limitations of the PCM and motivates the distinguishing features of settable systems. Keywords: equations causal models, game theory, machine learning, recursive estimation, simultaneous
5 0.23160489 97 jmlr-2009-Ultrahigh Dimensional Feature Selection: Beyond The Linear Model
Author: Jianqing Fan, Richard Samworth, Yichao Wu
Abstract: Variable selection in high-dimensional space characterizes many contemporary problems in scientific discovery and decision making. Many frequently-used techniques are based on independence screening; examples include correlation ranking (Fan & Lv, 2008) or feature selection using a twosample t-test in high-dimensional classification (Tibshirani et al., 2003). Within the context of the linear model, Fan & Lv (2008) showed that this simple correlation ranking possesses a sure independence screening property under certain conditions and that its revision, called iteratively sure independent screening (ISIS), is needed when the features are marginally unrelated but jointly related to the response variable. In this paper, we extend ISIS, without explicit definition of residuals, to a general pseudo-likelihood framework, which includes generalized linear models as a special case. Even in the least-squares setting, the new method improves ISIS by allowing feature deletion in the iterative process. Our technique allows us to select important features in high-dimensional classification where the popularly used two-sample t-method fails. A new technique is introduced to reduce the false selection rate in the feature screening stage. Several simulated and two real data examples are presented to illustrate the methodology. Keywords: classification, feature screening, generalized linear models, robust regression, feature selection
6 0.23113394 32 jmlr-2009-Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions
7 0.22804999 70 jmlr-2009-Particle Swarm Model Selection (Special Topic on Model Selection)
8 0.22776125 3 jmlr-2009-A Parameter-Free Classification Method for Large Scale Learning
9 0.22722776 75 jmlr-2009-Provably Efficient Learning with Typed Parametric Models
10 0.22582024 62 jmlr-2009-Nonlinear Models Using Dirichlet Process Mixtures
11 0.22503674 82 jmlr-2009-Robustness and Regularization of Support Vector Machines
12 0.22426298 93 jmlr-2009-The Hidden Life of Latent Variables: Bayesian Learning with Mixed Graph Models
13 0.22419825 87 jmlr-2009-Sparse Online Learning via Truncated Gradient
14 0.22414887 92 jmlr-2009-Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining
15 0.22208063 94 jmlr-2009-The Nonparanormal: Semiparametric Estimation of High Dimensional Undirected Graphs
16 0.22202323 38 jmlr-2009-Hash Kernels for Structured Data
17 0.22117363 48 jmlr-2009-Learning Nondeterministic Classifiers
18 0.22062673 13 jmlr-2009-Bounded Kernel-Based Online Learning
19 0.22048025 100 jmlr-2009-When Is There a Representer Theorem? Vector Versus Matrix Regularizers
20 0.21909973 71 jmlr-2009-Perturbation Corrections in Approximate Inference: Mixture Modelling Applications