jmlr jmlr2010 jmlr2010-56 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small sample sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Reference: text
sentIndex sentText sentNum sentScore
1 The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. [sent-4, score-1.524]
2 The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. [sent-6, score-0.722]
3 This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. [sent-7, score-1.462]
4 Finding answers to questions about the mechanisms by which variables come to take on values, or predicting the value of a variable after some other variable has been manipulated, is characteristic of causal inference. [sent-13, score-0.744]
5 Even when experimental interventions are possible, performing the many thousands of experiments that would be required to discover causal relationships between thousands or tens of thousands of variables is often not practical. [sent-19, score-0.723]
6 The past 30 years has also seen a number of conceptual developments that are partial solutions to these causal inference problems, particularly in the area of graphical causal modeling. [sent-21, score-1.434]
7 The articles in the Special Topic on Causality (containing articles from 2007 to 2009) address these and other problems in making causal inferences. [sent-24, score-0.751]
8 Suppose an insurance company at time t wants to determine what rate to charge for health insurance an individual who has rw_t = 1, bmi_t = 25, and sex = 0 (where rw records average red wine consumption, bmi body mass index, and ha heart attack occurrence, each indexed by time), and that this rate is partly based on the probability of the individual having a heart attack in the next 5 years. [sent-34, score-0.71]
9 This can be estimated by using the rate of heart attacks among the subpopulation matching the subject, that is, rw_t = 1, bmi_t = 25, sex = 0. [sent-35, score-0.794]
10 It is possible that people who drink an average of between 1 and 2 glasses of red wine per day for 5 years have lowered rates of heart attacks because of socio-economic factors that cause both average daily consumption of red wine and other life-style factors that prevent heart attacks. [sent-43, score-1.015]
11 2 Manipulated Probabilities. In contrast to the previous case, suppose that an epidemiologist is deciding whether or not to recommend providing very strong incentives for adults to drink an average of 1 to 2 glasses of red wine per day in order to prevent heart attacks. [sent-51, score-0.582]
12 The density of heart attacks observationally conditional on drinking an average of 1 to 2 glasses of red wine per day is not the density relevant to answering this question, and the question of whether drinking red wine prevents heart attacks is crucial. [sent-53, score-1.33]
13 Suppose drinking red wine does not prevent heart attacks, but the heart attack rate is lower among moderate red wine drinkers because some socio-economic variable causes both moderate red wine drinking and other healthy life-style choices that prevent heart attacks. [sent-54, score-1.363]
14 If the incentives are very effective, the density of heart attacks among people who would drink red wine after the incentives are in place is approximately equal to the density of heart attacks among people who are assigned to drink moderate amounts of red wine in an experimental study. [sent-58, score-1.573]
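To make the contrast concrete, the following is a minimal simulation sketch (in Python) of the red-wine story: a socio-economic confounder drives both moderate drinking and heart-attack risk, while red wine itself has no effect. The generative model and all numeric values are illustrative assumptions introduced here, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical generative model: socio-economic status (ses) causes both
# moderate red-wine drinking (rw) and, via other life-style factors, a lower
# heart-attack rate (ha); red wine itself has no effect in this model.
ses = rng.binomial(1, 0.3, size=n)                    # 1 = high socio-economic status
rw = rng.binomial(1, np.where(ses == 1, 0.6, 0.1))    # drinking depends on ses
ha = rng.binomial(1, np.where(ses == 1, 0.05, 0.15))  # heart attacks depend only on ses

# Observational conditional P(ha = 1 | rw = 1): lowered by the confounder.
p_obs = ha[rw == 1].mean()

# Manipulated probability P(ha = 1 || P'(rw = 1) = 1): assigning everyone
# rw = 1 leaves ha unchanged, because rw is not a cause of ha here.
p_man = ha.mean()

print(f"P(ha=1 | rw=1)      = {p_obs:.3f}")  # about 0.08: looks protective
print(f"P(ha=1 || do(rw=1)) = {p_man:.3f}")  # about 0.12: no real effect
```

The two numbers differ only because of the confounder; in an experimental study that assigned rw, the observational and manipulated quantities would coincide.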
15 The manipulated density is to be distinguished from P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t | rw_{t-5} = 1), which is the density of the variables in the subpopulation where rw_{t-5} = 1 because people have been observed to drink that amount of red wine, as in the unmanipulated population. [sent-72, score-0.843]
16 P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || P'(rw_{t-5} = 1) = 1) is a density, so it is possible to form marginal and conditional probability densities from it. [sent-73, score-0.627]
17 For example, P(ha_t | bmi_{t-5} = 25 || P'(rw_{t-5} = 1) = 1) is the probability of having had a heart attack between t–5 and t among people who have a bmi of 25 at t–5, everyone having been assigned to drink an average of 1 glass of red wine daily between t–10 and t–5. [sent-74, score-0.607]
18 A manipulation can also assign rw_{t-5} = 1 with probability 0.5, in which case the resulting density is P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || {P'(rw_{t-5} = 1) = 0.5}). [sent-79, score-0.564]
19 The resulting density is P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || {P'(rw_{t-5} = 0 | sex = 0) = …}), a manipulation whose assignment probability depends on the value of sex. [sent-88, score-0.693]
20 With causal inference, as with statistical inference, it is generally the case that, in order to make inference tractable both computationally and statistically, simplifying assumptions are made. [sent-106, score-0.748]
21 One kind of simplifying assumption common to both statistical and causal inference is the assumption that the population distribution lies in some parametric family (for example, Gaussian) or that relationships between variables are exactly linear. [sent-107, score-0.877]
22 An example of a simplifying assumption unique to causal inference is that multiple causal mechanisms relating variables do not exactly cancel (Section 3). [sent-108, score-1.464]
23 Problem 2 is usually broken into two parts: finding a set of causal models from sample data, some manipulations (experiments) and background assumptions (Sections 3 and 4), and predicting the effects of a manipulation from a set of causal models (Section 3). [sent-110, score-1.747]
24 In some cases, the inferred causal models may contain unmeasured variables as well as measured variables. [sent-112, score-0.8]
25 Output: A set of causal models that is as small as possible, and contains a true causal model that contains at least the variables in O. [sent-117, score-1.44]
26 Problem 4: Predicting the Effects of Manipulations from Causal Models. Input: An unmanipulated density P(O), a set C of causal models that contain at least the variables in O, a manipulation M, and sets X, Y ⊆ O. [sent-118, score-1.074]
27 The reason that the stated goal for the output of Problem 3 is a set of causal models is that it is generally not possible to reliably find a true causal model given the inputs. [sent-122, score-1.386]
28 This has been a serious impediment to the improvement of algorithms for constructing causal models, because it makes evaluating the performance of such algorithms difficult. [sent-124, score-0.693]
29 For example, epidemiologists sometimes want to know what the effect on heart attacks would have been if a manipulation, such as assigning moderate drinking of red wine from t–10 to t–5, had been applied to the subpopulation that had not moderately drunk red wine from t–10 to t–5. [sent-139, score-1.012]
30 If the subpopulation that did not moderately drink red wine between t–10 and t–5 differs systematically from the rest of the population with respect to causes of heart attacks, the subpopulation's response to being assigned to drink red wine would be different from that of the rest of the population. [sent-141, score-1.092]
31 One general approach is to assume that the value of red wine drinking between t–10 and t–5 contains information about hidden causes of red wine drinking that are also causes of heart attacks. [sent-146, score-0.843]
32 Problem 5: Counterfactual predictive modeling. Input: An unmanipulated density P(O), a set C of causal models that contain at least the variables in O, a counterfactual manipulation M, and sets X, Y ⊆ O. [sent-147, score-1.15]
33 In the Special Topic on Causality in this journal, Shpitser and Pearl (2008) describe a solution to Problem 5 in the case where the causal graph is known, but may contain unmeasured common causes. [sent-149, score-0.787]
34 Causal Models. This section describes several different kinds of commonly used causal models, and how to use them to calculate the effects of manipulations. [sent-151, score-0.793]
35 The next section describes search algorithms for discovering causal models. [sent-152, score-0.758]
36 A causal model with free parameters also specifies a set of probability densities over a given set of variables; in addition, for each manipulation that can be performed on the population, it specifies a set of post-manipulation probability densities over those variables. [sent-155, score-1.161]
37 A causal model with free parameters together with the values of the free parameters is a causal model with fixed parameters; a causal model with fixed parameters is mapped to a single density given a specification of a manipulation. [sent-156, score-2.168]
38 Often, a causal model is specified in two parts: a statistical model, and a causal graph that describes the causal relations between variables. [sent-157, score-2.154]
39 The most frequently used causal models belong to two broad families: (1) causal Bayesian networks, and (2) structural equation models. [sent-158, score-1.41]
40 The statistical setup for both causal Bayesian networks and structural equation models is a standard one. [sent-163, score-0.717]
41 The following sections describe the causal part of the model. [sent-167, score-0.693]
42 Two conditions that are equivalent to the local directed Markov condition described below are useful in causal inference: the global directed Markov condition and factorization according to G, both of which are described next. [sent-171, score-0.801]
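As a small illustration of factorization according to G, the sketch below writes the joint density of a hypothetical three-variable chain A → B → C as the product of each variable's density conditional on its parents in the DAG; the conditional probability tables are made-up numbers.

```python
import itertools

# Hypothetical CPTs for the DAG A -> B -> C over binary variables.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key: (c, b)

def joint(a, b, c):
    # Factorization according to G: P(a, b, c) = P(a) P(b | a) P(c | b),
    # one factor per variable, each conditioned on its parents.
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print(f"sum over all joint states = {total:.6f}")  # 1.000000, as a density must
```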
43 A DAG can also be used to represent causal relations between variables. [sent-187, score-0.727]
44 A is a direct cause of B relative to a set of variables V in a population when there exist two manipulations of V\{B} (that is, all the variables in V, except B, are manipulated to specific values) that differ only in the values assigned to A and that produce different probability densities of B. [sent-188, score-0.573]
45 A causal DAG G for a population contains an edge A → B iff A is a direct cause of B in the specified population. [sent-189, score-0.82]
46 In order to use samples from probability densities to make causal inferences, some assumptions relating causal relations to probability densities need to be made. [sent-190, score-1.658]
47 Causal Markov Assumption: For a causally sufficient set of variables V in a population N with density P(V), P(V) satisfies the local directed Markov condition for the causal DAG of N. [sent-193, score-1.075]
48 The importance of the manipulation rule is that if the causal DAG is known, and the unmanipulated density can be estimated from a sample, it allows the prediction of the effect of an unobserved manipulation. [sent-196, score-1.02]
49 Hence the manipulation rule is the solution to Problem 4, in the special case where the observed variables are causally sufficient, and the unique correct causal DAG is known. [sent-197, score-0.936]
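A minimal sketch of the manipulation rule for a hypothetical causally sufficient network Z → X, Z → Y, X → Y: the factor for the manipulated variable is replaced by the manipulated density (here a point mass on X = 1) and all other factors are kept. All conditional probabilities are illustrative numbers, and the resulting post-manipulation probability is compared with the ordinary observational conditional.

```python
# Hypothetical CPTs for the binary DAG Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.6, 1: 0.4}
p_x1_given_z = {0: 0.2, 1: 0.7}                     # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.3,          # P(Y=1 | X=x, Z=z), key (x, z)
                 (1, 0): 0.2, (1, 1): 0.5}

def px(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def py(y, x, z):
    return p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]

def manipulated_joint(z, x, y):
    # Manipulation rule: replace P(x | z) by the manipulated density P'(x),
    # here a point mass on x = 1; leave every other factor unchanged.
    return p_z[z] * (1.0 if x == 1 else 0.0) * py(y, x, z)

p_y1_do = sum(manipulated_joint(z, 1, 1) for z in (0, 1))

# Observational conditional P(Y=1 | X=1) for comparison.
num = sum(p_z[z] * px(1, z) * py(1, 1, z) for z in (0, 1))
den = sum(p_z[z] * px(1, z) for z in (0, 1))

print(f"P(Y=1 || do(X=1)) = {p_y1_do:.3f}   P(Y=1 | X=1) = {num / den:.3f}")
```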
50 In the Special Topic on Causality of this journal, Shpitser and Pearl (2008) describe an algorithm that has recently been developed and show that it is a complete solution to Problem 4 in the special case where a unique causal DAG is known (Shpitser and Pearl, 2006a,b; Huang and Valtorta, 2006). [sent-205, score-0.693]
51 In the Special Topic on Causality, Shpitser and Pearl (2008) describe for the first time an algorithm that is a complete solution to Problem 5 in the special case where a unique causal DAG is known, even if the set of observed variables is not causally sufficient. [sent-207, score-0.805]
52 Model Search. Traditionally, there have been a number of different approaches to causal discovery. [sent-230, score-0.693]
53 The gold standard of causal discovery has typically been to perform planned or randomized experiments (Fisher, 1971). [sent-231, score-0.736]
54 Moreover, recent data collection techniques and causal inference problems raise several practical difficulties regarding the number of experiments that need to be performed in order to answer all of the outstanding questions. [sent-233, score-0.72]
55 In the absence of experiments, in practice (particularly in the social sciences) search for causal models is often informal, and based on a combination of background assumptions about causal relations together with statistical tests of the causal models. [sent-234, score-2.252]
56 This is further complicated by the fact that, as explained below, for reliable causal inference it is not sufficient to find one model that passes a statistical test; instead it is necessary to find all such models. [sent-241, score-0.72]
57 1 Underdetermination of Causal Models by Data. Causal model (with fixed parameters) search is often broken into two parts: search for a causal graph, and estimation of the free parameters from sample data and the causal graph. [sent-245, score-1.476]
58 This section concentrates on the search for causal graphs, because the search for causal graphs is significantly different from the search for graphs that are to be used only as statistical models. [sent-249, score-1.521]
59 If the assumption of causal sufficiency of the observed variables is not made, all three kinds of equivalence classes have corresponding equivalence classes over the set of observed variables, and the problem of causal underdetermination becomes much more severe. [sent-276, score-1.606]
60 In a linear SEM it is assumed that each variable is a linear function of its causal parents and a unique error term; in a Gaussian SEM it is assumed in addition that the error terms are Gaussian. [sent-282, score-0.733]
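A linear Gaussian SEM can be written directly as a set of structural assignments. The sketch below simulates a hypothetical chain X → Y → Z (coefficients and error variances are made up) and checks, via a partial correlation, the conditional independence of X and Z given Y that the model entails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical linear Gaussian SEM over the DAG X -> Y -> Z.
x = rng.normal(0.0, 1.0, n)              # X := e_X
y = 1.5 * x + rng.normal(0.0, 0.5, n)    # Y := 1.5 X + e_Y
z = -0.8 * y + rng.normal(0.0, 0.5, n)   # Z := -0.8 Y + e_Z

# For jointly Gaussian variables, X independent of Z given Y corresponds to a
# zero partial correlation, read off the inverse covariance (precision) matrix.
prec = np.linalg.inv(np.cov(np.vstack([x, y, z])))
partial_xz_given_y = -prec[0, 2] / np.sqrt(prec[0, 0] * prec[2, 2])
print(f"partial corr(X, Z | Y) = {partial_xz_given_y:.3f}")  # close to 0
```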
61 If the oracle always gives correct answers, and the Causal Markov and Causal Faithfulness Assumptions hold, then the PC algorithm always outputs a Markov equivalence class that contains the true causal model, even though the algorithm does not check each directed acyclic graph. [sent-285, score-0.824]
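For concreteness, here is a minimal sketch of the skeleton (adjacency) phase of a PC-style search driven by a conditional-independence oracle. The three-variable example and the hard-coded oracle, which encodes the independencies of a chain A → B → C, are illustrative assumptions; edge orientation and all statistical-testing issues are omitted.

```python
from itertools import combinations

def ci_oracle(x, y, cond):
    # Oracle for the hypothetical chain A -> B -> C: A and C are independent
    # given any conditioning set that contains B; no other independencies hold.
    return {x, y} == {"A", "C"} and "B" in cond

def pc_skeleton(variables, independent):
    adj = {v: set(variables) - {v} for v in variables}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in variables):
        for x in variables:
            for y in list(adj[x]):
                # Try to separate x and y with subsets of x's other neighbours.
                for cond in combinations(adj[x] - {y}, depth):
                    if independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return {frozenset((x, y)) for x in variables for y in adj[x]}

print(pc_skeleton(["A", "B", "C"], ci_oracle))
# A-C is removed once B can be conditioned on; the edges A-B and B-C remain.
```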
62 The issue of multiple testing appears in Bayesian approaches to causal discovery as multiple causal model scoring. [sent-296, score-1.408]
63 Experimental evaluation shows significant improvements in the accuracy of argumentative over purely statistical tests, and improvements in the accuracy of causal [sent-304, score-0.693]
64 structure discovery from data sampled from randomly generated causal models and on real-world data sets. [sent-308, score-0.739]
65 Aliferis et al. (2010b) show that a general framework for localized causal membership structure learning is very accurate even in small sample situations and can thus be used as a first step for efficient global structure learning, as well as accurate prediction and feature selection. [sent-311, score-0.693]
66 It also provides extensive empirical comparisons of state-of-the-art causal learning methods with non-causal methods for the above tasks. [sent-312, score-0.693]
67 In the Special Topic on Causality, Pellet and Elisseeff (2008) link the causal model search problem to a classic machine learning prediction problem. [sent-315, score-0.738]
68 They show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal model search algorithm. [sent-316, score-0.768]
69 Ideally, the variables returned by a feature-selection algorithm identify those features of the causal graph. [sent-318, score-0.723]
70 3 Dealing with Underdetermination. One possibility for dealing with the underdetermination of causal models by observational data is to strengthen the available information by sampling from manipulated densities, or in other words, performing experiments. [sent-322, score-0.982]
71 (This is Problem 4 in the case where the predictions are made from a set of causal models C rather than a single causal model, and the set of observed variables may not be causally sufficient.) [sent-326, score-1.522]
72 That is, a prior probability is placed over each causal DAG G, and a posterior probability for each G is calculated. [sent-332, score-0.693]
73 For example, if the Markov equivalence class contains A → B ← C → D and A → B ← C ← D, then the two causal DAGs disagree about the effect of manipulating D on C, but agree about the effect of manipulating A on B. [sent-337, score-0.795]
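Under made-up parameter values, the sketch below parameterizes these two Markov-equivalent DAGs so that they represent the same observational density and then applies the manipulation rule under each, showing that they agree about manipulating A on B but disagree about manipulating D on C.

```python
# Hypothetical CPTs; DAG 1 is A -> B <- C -> D, DAG 2 is A -> B <- C <- D.
p_a = {0: 0.5, 1: 0.5}
p_c = {0: 0.4, 1: 0.6}
p_d1_given_c = {0: 0.2, 1: 0.8}                       # P(D=1 | C=c) for DAG 1
p_b1_given_ac = {(0, 0): 0.1, (0, 1): 0.7,
                 (1, 0): 0.6, (1, 1): 0.9}            # P(B=1 | A=a, C=c), key (a, c)

def p_d_given_c(d, c):
    return p_d1_given_c[c] if d == 1 else 1 - p_d1_given_c[c]

# DAG 2's CPTs P(D) and P(C | D) are derived from the same joint over (C, D),
# so both DAGs represent exactly the same observational density.
p_d = {d: sum(p_c[c] * p_d_given_c(d, c) for c in (0, 1)) for d in (0, 1)}
p_c_given_d = {(c, d): p_c[c] * p_d_given_c(d, c) / p_d[d]
               for c in (0, 1) for d in (0, 1)}

# Effect of manipulating A on B: identical under both DAGs.
def p_b1_do_a(a):
    return sum(p_c[c] * p_b1_given_ac[(a, c)] for c in (0, 1))

# Effect of manipulating D on C: DAG 1 says D is not a cause of C, DAG 2 says it is.
p_c1_do_d1_dag1 = p_c[1]
p_c1_do_d1_dag2 = p_c_given_d[(1, 1)]

print(f"P(B=1 || do(A=1)), either DAG: {p_b1_do_a(1):.3f}")
print(f"P(C=1 || do(D=1)): DAG 1 = {p_c1_do_d1_dag1:.3f}, DAG 2 = {p_c1_do_d1_dag2:.3f}")
```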
74 Open Questions. The following is an overview of important problems that remain in the domain of causal modeling. [sent-343, score-0.693]
75 Matching causal models and search algorithms to causal problems. [sent-345, score-1.455]
76 There are a wide variety of causal models that have been employed in different disciplines. [sent-346, score-0.717]
77 What kind of scores can be used to best evaluate causal models from various kinds of data? [sent-353, score-0.761]
78 How can search algorithms be improved to incorporate different kinds of background knowledge, search over different classes of causal models, run faster, handle more variables and larger sample sizes, be more reliable at small sample sizes, and produce output that is as informative as possible? [sent-357, score-0.899]
79 For causal search algorithms, what are their semantic and syntactic properties (for example, soundness, consistency, maximum informativeness)? [sent-360, score-0.738]
80 For precise definitions in the causal context, see Robins et al. [sent-370, score-0.693]
81 Are there stronger assumptions that are plausible for some domains that might allow for stronger causal inferences? [sent-372, score-0.743]
82 Derivation of consequences from causal graph and unmanipulated densities. [sent-380, score-0.821]
83 Shpitser and Pearl have given complete algorithms for deriving the consequences of various causal models with hidden common causes in terms of the unmanipulated density and the given manipulation (Shpitser and Pearl, 2008). [sent-381, score-1.076]
84 Partial extensions of these results to deriving consequences from sets of causal models have been given (Zhang, 2008); are there further extensions to derivations from sets of causal models? [sent-382, score-1.41]
85 For example, in a linear SEM, if an unobserved variable T causes observed variables X1, X2, X3, X4, and there are no other causal relations among these variables, then there are no entailed conditional independence relations among just the observed variables X1, X2, X3, X4. [sent-386, score-0.973]
86 This information is useful in finding causal structure with unmeasured variables. [sent-388, score-0.746]
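The sketch below simulates such a hypothetical one-factor linear SEM (the factor loadings are made up): no conditional independence holds among the observed X1–X4 alone, but the vanishing tetrad constraints entailed by the single unmeasured cause hold up to sampling error in the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical linear SEM: one unobserved common cause T of X1, X2, X3, X4.
t = rng.normal(size=n)
x = np.vstack([lam * t + rng.normal(scale=0.5, size=n)
               for lam in (1.0, 0.8, 1.2, 0.6)])
cov = np.cov(x)

# No conditional independencies among the observed variables alone: e.g.,
# X1 and X2 stay correlated given X3 (nonzero partial correlation).
prec = np.linalg.inv(cov[:3, :3])
pcorr = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
print(f"partial corr(X1, X2 | X3) = {pcorr:.3f}")   # clearly nonzero

# But the single latent cause entails vanishing tetrad differences, e.g.
# cov(X1,X2) cov(X3,X4) - cov(X1,X3) cov(X2,X4) = 0.
tetrad = cov[0, 1] * cov[2, 3] - cov[0, 2] * cov[1, 3]
print(f"tetrad difference = {tetrad:.4f}")          # approximately 0
```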
87 How is it possible to define new variables that are functions of the measured variables, but more useful for causal inference and more meaningful? [sent-395, score-0.75]
88 Applications of causal inference algorithms in many domains (Cooper and Glymour, 1999) help test and improve causal inference algorithms, suggest new problems, and produce domain knowledge. [sent-398, score-1.462]
89 What are the most appropriate performance measures for causal inference algorithms? [sent-401, score-0.72]
90 What is the best research design for testing causal inference algorithms? [sent-403, score-0.72]
91 Many different fields have studied causal discovery including Artificial Intelligence, Econometrics, Operations Research, Control Theory, and Statistics. [sent-406, score-0.715]
92 Local causal and Markov blanket induction for causal discovery and feature selection for classification, Part I: Algorithms and empirical evaluation. [sent-411, score-1.408]
93 Local causal and Markov blanket induction for causal discovery and feature selection for classification, Part II: Analysis and extensions. [sent-414, score-1.408]
94 Improving the reliability of causal discovery from small data sets using argumentation. [sent-421, score-0.715]
95 On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. [sent-437, score-0.727]
96 Active learning of causal networks with intervention experiments and optimal designs. [sent-444, score-0.716]
97 Identifiability in causal Bayesian networks: A sound and complete algorithm. [sent-458, score-0.693]
98 Markov properties for linear causal models with correlated errors. [sent-465, score-0.717]
99 Identification of joint interventional distributions in recursive semiMarkovian causal models. [sent-527, score-0.693]
100 An algorithm for fast recovery of sparse causal graphs. [sent-545, score-0.693]
wordName wordTfidf (topN-words)
[('causal', 0.693), ('rwt', 0.312), ('manipulated', 0.182), ('wine', 0.181), ('manipulation', 0.131), ('sex', 0.129), ('population', 0.127), ('drink', 0.122), ('dag', 0.122), ('unmanipulated', 0.107), ('densities', 0.105), ('dags', 0.1), ('bmit', 0.099), ('shpitser', 0.099), ('sem', 0.098), ('attacks', 0.094), ('drinking', 0.091), ('heart', 0.091), ('density', 0.089), ('causally', 0.082), ('counterfactual', 0.076), ('pirtes', 0.076), ('manipulations', 0.076), ('markov', 0.072), ('red', 0.072), ('sems', 0.072), ('causality', 0.071), ('spirtes', 0.071), ('incentives', 0.069), ('ntroduction', 0.069), ('subpopulation', 0.069), ('hat', 0.064), ('equivalence', 0.054), ('directed', 0.054), ('substantive', 0.053), ('ausal', 0.053), ('unmeasured', 0.053), ('faithfulness', 0.05), ('pearl', 0.049), ('conditional', 0.047), ('search', 0.045), ('observational', 0.045), ('kinds', 0.044), ('nference', 0.043), ('background', 0.042), ('people', 0.042), ('parents', 0.04), ('judea', 0.039), ('entailed', 0.039), ('underdetermination', 0.038), ('glymour', 0.038), ('topic', 0.038), ('effects', 0.036), ('clark', 0.035), ('relations', 0.034), ('independence', 0.034), ('cooper', 0.033), ('richardson', 0.032), ('causes', 0.032), ('aliferis', 0.032), ('peter', 0.031), ('drinkers', 0.03), ('causation', 0.03), ('variables', 0.03), ('moderate', 0.03), ('articles', 0.029), ('assumptions', 0.028), ('attack', 0.027), ('everyone', 0.027), ('inference', 0.027), ('glasses', 0.026), ('insurance', 0.026), ('bayesian', 0.025), ('pc', 0.025), ('models', 0.024), ('manipulating', 0.024), ('intervention', 0.023), ('ilya', 0.023), ('assigned', 0.023), ('acyclic', 0.023), ('besnard', 0.023), ('bromberg', 0.023), ('ethical', 0.023), ('hanks', 0.023), ('kalisch', 0.023), ('scheines', 0.023), ('tillman', 0.023), ('cov', 0.023), ('domains', 0.022), ('discovery', 0.022), ('daily', 0.022), ('day', 0.021), ('developments', 0.021), ('mechanisms', 0.021), ('randomized', 0.021), ('graph', 0.021), ('inferences', 0.02), ('describes', 0.02), ('kang', 0.02), ('greg', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
3 0.27913588 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
5 0.054009311 63 jmlr-2010-Learning Instance-Specific Predictive Models
Author: Shyam Visweswaran, Gregory F. Cooper
Abstract: This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. Keywords: instance-specific, Bayesian network, Markov blanket, Bayesian model averaging
6 0.047914911 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
7 0.043902837 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
8 0.038795646 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
9 0.034066916 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
10 0.033046257 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
11 0.030300664 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
12 0.025185833 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
13 0.02405061 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
14 0.023915932 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
15 0.02349033 6 jmlr-2010-A Rotation Test to Verify Latent Structure
16 0.023013849 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
17 0.021847278 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
18 0.021083852 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
19 0.0210825 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
20 0.019464204 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
topicId topicWeight
[(0, -0.156), (1, 0.525), (2, 0.294), (3, -0.062), (4, 0.015), (5, -0.112), (6, 0.001), (7, -0.028), (8, 0.037), (9, 0.006), (10, -0.044), (11, -0.057), (12, 0.013), (13, -0.017), (14, 0.009), (15, 0.044), (16, 0.007), (17, -0.003), (18, 0.031), (19, 0.013), (20, 0.012), (21, -0.036), (22, -0.035), (23, -0.039), (24, 0.064), (25, 0.047), (26, -0.042), (27, 0.22), (28, 0.008), (29, 0.074), (30, 0.02), (31, 0.012), (32, 0.074), (33, -0.047), (34, 0.088), (35, -0.053), (36, 0.058), (37, -0.023), (38, 0.02), (39, 0.045), (40, -0.134), (41, -0.004), (42, -0.042), (43, -0.056), (44, 0.003), (45, -0.024), (46, -0.034), (47, -0.071), (48, -0.005), (49, -0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.96788752 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
2 0.8029148 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
5 0.19326124 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
Author: Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studený
Abstract: The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by an algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by a computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method, based on linear programming (LP). The new method overcomes the limitation of former methods to the number of involved variables. We recall/describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest one. As an example of possible application of such algorithms we show that testing inclusion of Bayesian network structures or whether a CI statement is encoded in an acyclic directed graph can be done by the algebraic method. Keywords: conditional independence inference, linear programming approach
6 0.17136775 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
7 0.15454642 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
8 0.14425144 69 jmlr-2010-Lp-Nested Symmetric Distributions
9 0.13831414 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
10 0.13792025 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
11 0.13021544 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
12 0.13006467 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
13 0.1268038 10 jmlr-2010-An Exponential Model for Infinite Rankings
14 0.11672875 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
15 0.11610416 37 jmlr-2010-Evolving Static Representations for Task Transfer
16 0.11485625 6 jmlr-2010-A Rotation Test to Verify Latent Structure
17 0.11402004 63 jmlr-2010-Learning Instance-Specific Predictive Models
18 0.11288119 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
19 0.10929693 77 jmlr-2010-Model-based Boosting 2.0
20 0.10802867 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
topicId topicWeight
[(4, 0.012), (8, 0.013), (21, 0.011), (30, 0.389), (32, 0.037), (33, 0.081), (36, 0.073), (37, 0.057), (75, 0.112), (81, 0.015), (85, 0.052), (96, 0.024), (97, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.66556919 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
2 0.39418098 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project
Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten
Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software
3 0.38843143 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
6 0.34979773 63 jmlr-2010-Learning Instance-Specific Predictive Models
7 0.34838352 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
8 0.34794793 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
9 0.34580439 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
10 0.34535798 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
11 0.34236106 69 jmlr-2010-Lp-Nested Symmetric Distributions
12 0.34111971 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
13 0.34111542 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
14 0.34067589 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
15 0.34058547 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
16 0.33964166 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
17 0.3382723 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
18 0.33785191 104 jmlr-2010-Sparse Spectrum Gaussian Process Regression
19 0.33730918 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
20 0.33499315 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels