jmlr jmlr2010 jmlr2010-56 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small sample sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Reference: text
sentIndex sentText sentNum sentScore
1 The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. [sent-4, score-1.524]
2 The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. [sent-6, score-0.722]
3 This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. [sent-7, score-1.462]
4 Finding answers to questions about the mechanisms by which variables come to take on values, or predicting the value of a variable after some other variable has been manipulated, is characteristic of causal inference. [sent-13, score-0.744]
5 Even when experimental interventions are possible, performing the many thousands of experiments that would be required to discover causal relationships between thousands or tens of thousands of variables is often not practical. [sent-19, score-0.723]
6 The past 30 years has also seen a number of conceptual developments that are partial solutions to these causal inference problems, particularly in the area of graphical causal modeling. [sent-21, score-1.434]
7 The articles in the Special Topic on Causality (containing articles from 2007 to 2009) address these and other problems in making causal inferences. [sent-24, score-0.751]
8 Suppose an insurance company at time t wants to determine what rate to charge for health insurance an individual who has rw_t = 1, bmi_t = 25, and sex = 0 (where rw records average red wine consumption, bmi body mass index, and ha heart attack occurrence, each indexed by time), and that this rate is partly based on the probability of the individual having a heart attack in the next 5 years. [sent-34, score-0.71]
9 This can be estimated by using the rate of heart attacks among the subpopulation matching the subject, that is, rw_t = 1, bmi_t = 25, sex = 0. [sent-35, score-0.794]
10 It is possible that people who drink an average of between 1 and 2 glasses of red wine per day for 5 years have lowered rates of heart attacks because of socio-economic factors that cause both average daily consumption of red wine and other life-style factors that prevent heart attacks. [sent-43, score-1.015]
11 2 Manipulated Probabilities. In contrast to the previous case, suppose that an epidemiologist is deciding whether or not to recommend providing very strong incentives for adults to drink an average of 1 to 2 glasses of red wine per day in order to prevent heart attacks. [sent-51, score-0.582]
12 The density of heart attacks observationally conditional on drinking an average of 1 to 2 glasses of red wine per day is not the density relevant to answering this question, and the question of whether drinking red wine prevents heart attacks is crucial. [sent-53, score-1.33]
13 Suppose drinking red wine does not prevent heart attacks, but the heart attack rate is lower among moderate red wine drinkers because some socio-economic variable causes both moderate red wine drinking and other healthy life-style choices that prevent heart attacks. [sent-54, score-1.363]
14 If the incentives are very effective, the density of heart attacks among people who would drink red wine after the incentives are in place is approximately equal to the density of heart attacks among people who are assigned to drink moderate amounts of red wine in an experimental study. [sent-58, score-1.573]
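To make the contrast concrete, the following is a minimal simulation sketch (in Python) of the red-wine story: a socio-economic confounder drives both moderate drinking and heart-attack risk, while red wine itself has no effect. The generative model and all numeric values are illustrative assumptions introduced here, not quantities from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical generative model: socio-economic status (ses) causes both
# moderate red-wine drinking (rw) and, via other life-style factors, a lower
# heart-attack rate (ha); red wine itself has no effect in this model.
ses = rng.binomial(1, 0.3, size=n)                    # 1 = high socio-economic status
rw = rng.binomial(1, np.where(ses == 1, 0.6, 0.1))    # drinking depends on ses
ha = rng.binomial(1, np.where(ses == 1, 0.05, 0.15))  # heart attacks depend only on ses

# Observational conditional P(ha = 1 | rw = 1): lowered by the confounder.
p_obs = ha[rw == 1].mean()

# Manipulated probability P(ha = 1 || P'(rw = 1) = 1): assigning everyone
# rw = 1 leaves ha unchanged, because rw is not a cause of ha here.
p_man = ha.mean()

print(f"P(ha=1 | rw=1)      = {p_obs:.3f}")  # about 0.08: looks protective
print(f"P(ha=1 || do(rw=1)) = {p_man:.3f}")  # about 0.12: no real effect
```

The two numbers differ only because of the confounder; in an experimental study that assigned rw, the observational and manipulated quantities would coincide.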
15 The manipulated density is to be distinguished from P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t | rw_{t-5} = 1), which is the density of the variables in the subpopulation where rw_{t-5} = 1 because people have been observed to drink that amount of red wine, as in the unmanipulated population. [sent-72, score-0.843]
16 P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || P'(rw_{t-5} = 1) = 1) is a density, so it is possible to form marginal and conditional probability densities from it. [sent-73, score-0.627]
17 For example, P(ha_t | bmi_{t-5} = 25 || P'(rw_{t-5} = 1) = 1) is the probability of having had a heart attack between t–5 and t among people who have a bmi of 25 at t–5, everyone having been assigned to drink an average of 1 glass of red wine daily between t–10 and t–5. [sent-74, score-0.607]
18 A manipulation can also assign rw_{t-5} = 1 with probability 0.5, in which case the resulting density is P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || {P'(rw_{t-5} = 1) = 0.5}). [sent-79, score-0.564]
19 The resulting density is P(sex, bmi_{t-5}, ha_{t-5}, rw_{t-5}, bmi_t, ha_t, rw_t || {P'(rw_{t-5} = 0 | sex = 0) = …}), a manipulation whose assignment probability depends on the value of sex. [sent-88, score-0.693]
20 With causal inference, as with statistical inference, it is generally the case that, in order to make inference tractable both computationally and statistically, simplifying assumptions are made. [sent-106, score-0.748]
21 One kind of simplifying assumption common to both statistical and causal inference is the assumption that the population distribution lies in some parametric family (for example, Gaussian) or that relationships between variables are exactly linear. [sent-107, score-0.877]
22 An example of a simplifying assumption unique to causal inference is that multiple causal mechanisms relating variables do not exactly cancel (Section 3). [sent-108, score-1.464]
23 Problem 2 is usually broken into two parts: finding a set of causal models from sample data, some manipulations (experiments) and background assumptions (Sections 3 and 4), and predicting the effects of a manipulation from a set of causal models (Section 3). [sent-110, score-1.747]
24 In some cases, the inferred causal models may contain unmeasured variables as well as measured variables. [sent-112, score-0.8]
25 Output: A set of causal models that is as small as possible, and contains a true causal model that contains at least the variables in O. [sent-117, score-1.44]
26 Problem 4: Predicting the Effects of Manipulations from Causal Models. Input: An unmanipulated density P(O), a set C of causal models that contain at least the variables in O, a manipulation M, and sets X, Y ⊆ O. [sent-118, score-1.074]
27 The reason that the stated goal for the output of Problem 3 is a set of causal models is that it is generally not possible to reliably find a true causal model given the inputs. [sent-122, score-1.386]
28 This has been a serious impediment to the improvement of algorithms for constructing causal models, because it makes evaluating the performance of such algorithms difficult. [sent-124, score-0.693]
29 For example, epidemiologists sometimes want to know what the effect on heart attacks would have been if a manipulation, such as assigning moderate drinking of red wine from t–10 to t–5, had been applied to the subpopulation that had not moderately drunk red wine from t–10 to t–5. [sent-139, score-1.012]
30 If the subpopulation that did not moderately drink red wine between t–10 and t–5 differs systematically from the rest of the population with respect to causes of heart attacks, the subpopulation's response to being assigned to drink red wine would be different from that of the rest of the population. [sent-141, score-1.092]
31 One general approach is to assume that the value of red wine drinking between t–10 and t–5 contains information about hidden causes of red wine drinking that are also causes of heart attacks. [sent-146, score-0.843]
32 Problem 5: Counterfactual predictive modeling. Input: An unmanipulated density P(O), a set C of causal models that contain at least the variables in O, a counterfactual manipulation M, and sets X, Y ⊆ O. [sent-147, score-1.15]
33 In the Special Topic on Causality in this journal, Shpitser and Pearl (2008) describe a solution to Problem 5 in the case where the causal graph is known, but may contain unmeasured common causes. [sent-149, score-0.787]
34 Causal Models. This section describes several different kinds of commonly used causal models, and how to use them to calculate the effects of manipulations. [sent-151, score-0.793]
35 The next section describes search algorithms for discovering causal models. [sent-152, score-0.758]
36 A causal model with free parameters also specifies a set of probability densities over a given set of variables; in addition, for each manipulation that can be performed on the population, it specifies a set of post-manipulation probability densities over those variables. [sent-155, score-1.161]
37 A causal model with free parameters together with the values of the free parameters is a causal model with fixed parameters; a causal model with fixed parameters is mapped to a single density given a specification of a manipulation. [sent-156, score-2.168]
38 Often, a causal model is specified in two parts: a statistical model, and a causal graph that describes the causal relations between variables. [sent-157, score-2.154]
39 The most frequently used causal models belong to two broad families: (1) causal Bayesian networks, and (2) structural equation models. [sent-158, score-1.41]
40 The statistical setup for both causal Bayesian networks and structural equation models is a standard one. [sent-163, score-0.717]
41 The following sections describe the causal part of the model. [sent-167, score-0.693]
42 Two conditions that are equivalent to the local directed Markov condition described below are useful in causal inference: the global directed Markov condition and factorization according to G, both of which are described next. [sent-171, score-0.801]
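As a small illustration of factorization according to G, the sketch below writes the joint density of a hypothetical three-variable chain A → B → C as the product of each variable's density conditional on its parents in the DAG; the conditional probability tables are made-up numbers.

```python
import itertools

# Hypothetical CPTs for the DAG A -> B -> C over binary variables.
p_a = {0: 0.7, 1: 0.3}
p_b_given_a = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.4, (1, 1): 0.6}  # key: (b, a)
p_c_given_b = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key: (c, b)

def joint(a, b, c):
    # Factorization according to G: P(a, b, c) = P(a) P(b | a) P(c | b),
    # one factor per variable, each conditioned on its parents.
    return p_a[a] * p_b_given_a[(b, a)] * p_c_given_b[(c, b)]

total = sum(joint(a, b, c) for a, b, c in itertools.product((0, 1), repeat=3))
print(f"sum over all joint states = {total:.6f}")  # 1.000000, as a density must
```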
43 A DAG can also be used to represent causal relations between variables. [sent-187, score-0.727]
44 A is a direct cause of B relative to a set of variables V in a population when there exist two manipulations of V\{B} (that is, all the variables in V, except B, are manipulated to specific values) that differ only in the values assigned to A and that produce different probability densities of B. [sent-188, score-0.573]
45 A causal DAG G for a population contains an edge A → B iff A is a direct cause of B in the specified population. [sent-189, score-0.82]
46 In order to use samples from probability densities to make causal inferences, some assumptions relating causal relations to probability densities need to be made. [sent-190, score-1.658]
47 Causal Markov Assumption: For a causally sufficient set of variables V in a population N with density P(V), P(V) satisfies the local directed Markov condition for the causal DAG of N. [sent-193, score-1.075]
48 The importance of the manipulation rule is that if the causal DAG is known, and the unmanipulated density can be estimated from a sample, it allows the prediction of the effect of an unobserved manipulation. [sent-196, score-1.02]
49 Hence the manipulation rule is the solution to Problem 4, in the special case where the observed variables are causally sufficient, and the unique correct causal DAG is known. [sent-197, score-0.936]
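A minimal sketch of the manipulation rule for a hypothetical causally sufficient network Z → X, Z → Y, X → Y: the factor for the manipulated variable is replaced by the manipulated density (here a point mass on X = 1) and all other factors are kept. All conditional probabilities are illustrative numbers, and the resulting post-manipulation probability is compared with the ordinary observational conditional.

```python
# Hypothetical CPTs for the binary DAG Z -> X, Z -> Y, X -> Y.
p_z = {0: 0.6, 1: 0.4}
p_x1_given_z = {0: 0.2, 1: 0.7}                     # P(X=1 | Z=z)
p_y1_given_xz = {(0, 0): 0.1, (0, 1): 0.3,          # P(Y=1 | X=x, Z=z), key (x, z)
                 (1, 0): 0.2, (1, 1): 0.5}

def px(x, z):
    return p_x1_given_z[z] if x == 1 else 1 - p_x1_given_z[z]

def py(y, x, z):
    return p_y1_given_xz[(x, z)] if y == 1 else 1 - p_y1_given_xz[(x, z)]

def manipulated_joint(z, x, y):
    # Manipulation rule: replace P(x | z) by the manipulated density P'(x),
    # here a point mass on x = 1; leave every other factor unchanged.
    return p_z[z] * (1.0 if x == 1 else 0.0) * py(y, x, z)

p_y1_do = sum(manipulated_joint(z, 1, 1) for z in (0, 1))

# Observational conditional P(Y=1 | X=1) for comparison.
num = sum(p_z[z] * px(1, z) * py(1, 1, z) for z in (0, 1))
den = sum(p_z[z] * px(1, z) for z in (0, 1))

print(f"P(Y=1 || do(X=1)) = {p_y1_do:.3f}   P(Y=1 | X=1) = {num / den:.3f}")
```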
50 In the Special Topic on Causality of this journal, Shpitser and Pearl (2008) describe an algorithm that has recently been developed and show that it is a complete solution to Problem 4 in the special case where a unique causal DAG is known (Shpitser and Pearl, 2006a,b; Huang and Valtorta, 2006). [sent-205, score-0.693]
51 In the Special Topic on Causality, Shpitser and Pearl (2008) describe for the first time an algorithm that is a complete solution to Problem 5 in the special case where a unique causal DAG is known, even if the set of observed variables is not causally sufficient. [sent-207, score-0.805]
52 Model Search. Traditionally, there have been a number of different approaches to causal discovery. [sent-230, score-0.693]
53 The gold standard of causal discovery has typically been to perform planned or randomized experiments (Fisher, 1971). [sent-231, score-0.736]
54 Moreover, recent data collection techniques and causal inference problems raise several practical difficulties regarding the number of experiments that need to be performed in order to answer all of the outstanding questions. [sent-233, score-0.72]
55 In the absence of experiments, in practice (particularly in the social sciences) search for causal models is often informal, and based on a combination of background assumptions about causal relations together with statistical tests of the causal models. [sent-234, score-2.252]
56 This is further complicated by the fact that, as explained below, for reliable causal inference it is not sufficient to find one model that passes a statistical test; instead it is necessary to find all such models. [sent-241, score-0.72]
57 1 Underdetermination of Causal Models by Data. Causal model (with fixed parameters) search is often broken into two parts: search for a causal graph, and estimation of the free parameters from sample data and the causal graph. [sent-245, score-1.476]
58 This section concentrates on the search for causal graphs, because the search for causal graphs is significantly different from the search for graphs that are to be used only as statistical models. [sent-249, score-1.521]
59 If the assumption of causal sufficiency of the observed variables is not made, all three kinds of equivalence classes have corresponding equivalence classes over the set of observed variables, and the problem of causal underdetermination becomes much more severe. [sent-276, score-1.606]
60 In a linear SEM it is assumed that each variable is a linear function of its causal parents and a unique error term; in a Gaussian SEM it is assumed in addition that the error terms are Gaussian. [sent-282, score-0.733]
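A linear Gaussian SEM can be written directly as a set of structural assignments. The sketch below simulates a hypothetical chain X → Y → Z (coefficients and error variances are made up) and checks, via a partial correlation, the conditional independence of X and Z given Y that the model entails.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Hypothetical linear Gaussian SEM over the DAG X -> Y -> Z.
x = rng.normal(0.0, 1.0, n)              # X := e_X
y = 1.5 * x + rng.normal(0.0, 0.5, n)    # Y := 1.5 X + e_Y
z = -0.8 * y + rng.normal(0.0, 0.5, n)   # Z := -0.8 Y + e_Z

# For jointly Gaussian variables, X independent of Z given Y corresponds to a
# zero partial correlation, read off the inverse covariance (precision) matrix.
prec = np.linalg.inv(np.cov(np.vstack([x, y, z])))
partial_xz_given_y = -prec[0, 2] / np.sqrt(prec[0, 0] * prec[2, 2])
print(f"partial corr(X, Z | Y) = {partial_xz_given_y:.3f}")  # close to 0
```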
61 If the oracle always gives correct answers, and the Causal Markov and Causal Faithfulness Assumptions hold, then the PC algorithm always outputs a Markov equivalence class that contains the true causal model, even though the algorithm does not check each directed acyclic graph. [sent-285, score-0.824]
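For concreteness, here is a minimal sketch of the skeleton (adjacency) phase of a PC-style search driven by a conditional-independence oracle. The three-variable example and the hard-coded oracle, which encodes the independencies of a chain A → B → C, are illustrative assumptions; edge orientation and all statistical-testing issues are omitted.

```python
from itertools import combinations

def ci_oracle(x, y, cond):
    # Oracle for the hypothetical chain A -> B -> C: A and C are independent
    # given any conditioning set that contains B; no other independencies hold.
    return {x, y} == {"A", "C"} and "B" in cond

def pc_skeleton(variables, independent):
    adj = {v: set(variables) - {v} for v in variables}
    depth = 0
    while any(len(adj[v]) - 1 >= depth for v in variables):
        for x in variables:
            for y in list(adj[x]):
                # Try to separate x and y with subsets of x's other neighbours.
                for cond in combinations(adj[x] - {y}, depth):
                    if independent(x, y, set(cond)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        break
        depth += 1
    return {frozenset((x, y)) for x in variables for y in adj[x]}

print(pc_skeleton(["A", "B", "C"], ci_oracle))
# A-C is removed once B can be conditioned on; the edges A-B and B-C remain.
```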
62 The issue of multiple testing appears in Bayesian approaches to causal discovery as multiple causal model scoring. [sent-296, score-1.408]
63 Experimental evaluation shows significant improvements in the accuracy of argumentative over purely statistical tests, and improvements in the accuracy of causal [sent-304, score-0.693]
64 structure discovery from data sampled from randomly generated causal models and on real-world data sets. [sent-308, score-0.739]
65 Aliferis et al. (2010b) show that a general framework for localized causal membership structure learning is very accurate even in small sample situations and can thus be used as a first step for efficient global structure learning, as well as accurate prediction and feature selection. [sent-311, score-0.693]
66 It also provides extensive empirical comparisons of state-of-the-art causal learning methods with non-causal methods for the above tasks. [sent-312, score-0.693]
67 In the Special Topic on Causality, Pellet and Elisseeff (2008) link the causal model search problem to a classic machine learning prediction problem. [sent-315, score-0.738]
68 They show how a generic feature-selection algorithm returning strongly relevant variables can be turned into a causal model search algorithm. [sent-316, score-0.768]
69 Ideally, the variables returned by a feature-selection algorithm identify those features of the causal graph. [sent-318, score-0.723]
70 3 Dealing with Underdetermination. One possibility for dealing with the underdetermination of causal models by observational data is to strengthen the available information by sampling from manipulated densities, or in other words, performing experiments. [sent-322, score-0.982]
71 (This is Problem 4 in the case where the predictions are made from a set of causal models C rather than a single causal model, and the set of observed variables may not be causally sufficient.) [sent-326, score-1.522]
72 That is, a prior probability is placed over each causal DAG G, and a posterior probability for each G is calculated. [sent-332, score-0.693]
73 For example, if the Markov equivalence class contains A → B ← C → D and A → B ← C ← D, then the two causal DAGs disagree about the effect of manipulating D on C, but agree about the effect of manipulating A on B. [sent-337, score-0.795]
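Under made-up parameter values, the sketch below parameterizes these two Markov-equivalent DAGs so that they represent the same observational density and then applies the manipulation rule under each, showing that they agree about manipulating A on B but disagree about manipulating D on C.

```python
# Hypothetical CPTs; DAG 1 is A -> B <- C -> D, DAG 2 is A -> B <- C <- D.
p_a = {0: 0.5, 1: 0.5}
p_c = {0: 0.4, 1: 0.6}
p_d1_given_c = {0: 0.2, 1: 0.8}                       # P(D=1 | C=c) for DAG 1
p_b1_given_ac = {(0, 0): 0.1, (0, 1): 0.7,
                 (1, 0): 0.6, (1, 1): 0.9}            # P(B=1 | A=a, C=c), key (a, c)

def p_d_given_c(d, c):
    return p_d1_given_c[c] if d == 1 else 1 - p_d1_given_c[c]

# DAG 2's CPTs P(D) and P(C | D) are derived from the same joint over (C, D),
# so both DAGs represent exactly the same observational density.
p_d = {d: sum(p_c[c] * p_d_given_c(d, c) for c in (0, 1)) for d in (0, 1)}
p_c_given_d = {(c, d): p_c[c] * p_d_given_c(d, c) / p_d[d]
               for c in (0, 1) for d in (0, 1)}

# Effect of manipulating A on B: identical under both DAGs.
def p_b1_do_a(a):
    return sum(p_c[c] * p_b1_given_ac[(a, c)] for c in (0, 1))

# Effect of manipulating D on C: DAG 1 says D is not a cause of C, DAG 2 says it is.
p_c1_do_d1_dag1 = p_c[1]
p_c1_do_d1_dag2 = p_c_given_d[(1, 1)]

print(f"P(B=1 || do(A=1)), either DAG: {p_b1_do_a(1):.3f}")
print(f"P(C=1 || do(D=1)): DAG 1 = {p_c1_do_d1_dag1:.3f}, DAG 2 = {p_c1_do_d1_dag2:.3f}")
```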
74 Open Questions. The following is an overview of important problems that remain in the domain of causal modeling. [sent-343, score-0.693]
75 Matching causal models and search algorithms to causal problems. [sent-345, score-1.455]
76 There are a wide variety of causal models that have been employed in different disciplines. [sent-346, score-0.717]
77 What kind of scores can be used to best evaluate causal models from various kinds of data? [sent-353, score-0.761]
78 How can search algorithms be improved to incorporate different kinds of background knowledge, search over different classes of causal models, run faster, handle more variables and larger sample sizes, be more reliable at small sample sizes, and produce output that is as informative as possible? [sent-357, score-0.899]
79 For causal search algorithms, what are their semantic and syntactic properties (for example, soundness, consistency, maximum informativeness)? [sent-360, score-0.738]
80 For precise definitions in the causal context, see Robins et al. [sent-370, score-0.693]
81 Are there stronger assumptions that are plausible for some domains that might allow for stronger causal inferences? [sent-372, score-0.743]
82 Derivation of consequences from causal graph and unmanipulated densities. [sent-380, score-0.821]
83 Shpitser and Pearl have given complete algorithms for deriving the consequences of various causal models with hidden common causes in terms of the unmanipulated density and the given manipulation (Shpitser and Pearl, 2008). [sent-381, score-1.076]
84 Partial extensions of these results to deriving consequences from sets of causal models have been given (Zhang, 2008); are there further extensions to derivations from sets of causal models? [sent-382, score-1.41]
85 For example, in a linear SEM, if an unobserved variable T causes observed variables X1, X2, X3, X4, and there are no other causal relations among these variables, then there are no entailed conditional independence relations among just the observed variables X1, X2, X3, X4. [sent-386, score-0.973]
86 This information is useful in finding causal structure with unmeasured variables. [sent-388, score-0.746]
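The sketch below simulates such a hypothetical one-factor linear SEM (the factor loadings are made up): no conditional independence holds among the observed X1–X4 alone, but the vanishing tetrad constraints entailed by the single unmeasured cause hold up to sampling error in the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical linear SEM: one unobserved common cause T of X1, X2, X3, X4.
t = rng.normal(size=n)
x = np.vstack([lam * t + rng.normal(scale=0.5, size=n)
               for lam in (1.0, 0.8, 1.2, 0.6)])
cov = np.cov(x)

# No conditional independencies among the observed variables alone: e.g.,
# X1 and X2 stay correlated given X3 (nonzero partial correlation).
prec = np.linalg.inv(cov[:3, :3])
pcorr = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])
print(f"partial corr(X1, X2 | X3) = {pcorr:.3f}")   # clearly nonzero

# But the single latent cause entails vanishing tetrad differences, e.g.
# cov(X1,X2) cov(X3,X4) - cov(X1,X3) cov(X2,X4) = 0.
tetrad = cov[0, 1] * cov[2, 3] - cov[0, 2] * cov[1, 3]
print(f"tetrad difference = {tetrad:.4f}")          # approximately 0
```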
87 How is it possible to define new variables that are functions of the measured variables, but more useful for causal inference and more meaningful? [sent-395, score-0.75]
88 Applications of causal inference algorithms in many domains (Cooper and Glymour, 1999) help test and improve causal inference algorithms, suggest new problems, and produce domain knowledge. [sent-398, score-1.462]
89 What are the most appropriate performance measures for causal inference algorithms? [sent-401, score-0.72]
90 What is the best research design for testing causal inference algorithms? [sent-403, score-0.72]
91 Many different fields have studied causal discovery including Artificial Intelligence, Econometrics, Operations Research, Control Theory, and Statistics. [sent-406, score-0.715]
92 Local causal and Markov blanket induction for causal discovery and feature selection for classification, Part I: Algorithms and empirical evaluation. [sent-411, score-1.408]
93 Local causal and Markov blanket induction for causal discovery and feature selection for classification, Part II: Analysis and extensions. [sent-414, score-1.408]
94 Improving the reliability of causal discovery from small data sets using argumentation. [sent-421, score-0.715]
95 On the number of experiments sufficient and in the worst case necessary to identify all causal relations among n variables. [sent-437, score-0.727]
96 Active learning of causal networks with intervention experiments and optimal designs. [sent-444, score-0.716]
97 Identifiability in causal Bayesian networks: A sound and complete algorithm. [sent-458, score-0.693]
98 Markov properties for linear causal models with correlated errors. [sent-465, score-0.717]
99 Identification of joint interventional distributions in recursive semiMarkovian causal models. [sent-527, score-0.693]
100 An algorithm for fast recovery of sparse causal graphs. [sent-545, score-0.693]
wordName wordTfidf (topN-words)
[('causal', 0.693), ('rwt', 0.312), ('manipulated', 0.182), ('wine', 0.181), ('manipulation', 0.131), ('sex', 0.129), ('population', 0.127), ('drink', 0.122), ('dag', 0.122), ('unmanipulated', 0.107), ('densities', 0.105), ('dags', 0.1), ('bmit', 0.099), ('shpitser', 0.099), ('sem', 0.098), ('attacks', 0.094), ('drinking', 0.091), ('heart', 0.091), ('density', 0.089), ('causally', 0.082), ('counterfactual', 0.076), ('pirtes', 0.076), ('manipulations', 0.076), ('markov', 0.072), ('red', 0.072), ('sems', 0.072), ('causality', 0.071), ('spirtes', 0.071), ('incentives', 0.069), ('ntroduction', 0.069), ('subpopulation', 0.069), ('hat', 0.064), ('equivalence', 0.054), ('directed', 0.054), ('substantive', 0.053), ('ausal', 0.053), ('unmeasured', 0.053), ('faithfulness', 0.05), ('pearl', 0.049), ('conditional', 0.047), ('search', 0.045), ('observational', 0.045), ('kinds', 0.044), ('nference', 0.043), ('background', 0.042), ('people', 0.042), ('parents', 0.04), ('judea', 0.039), ('entailed', 0.039), ('underdetermination', 0.038), ('glymour', 0.038), ('topic', 0.038), ('effects', 0.036), ('clark', 0.035), ('relations', 0.034), ('independence', 0.034), ('cooper', 0.033), ('richardson', 0.032), ('causes', 0.032), ('aliferis', 0.032), ('peter', 0.031), ('drinkers', 0.03), ('causation', 0.03), ('variables', 0.03), ('moderate', 0.03), ('articles', 0.029), ('assumptions', 0.028), ('attack', 0.027), ('everyone', 0.027), ('inference', 0.027), ('glasses', 0.026), ('insurance', 0.026), ('bayesian', 0.025), ('pc', 0.025), ('models', 0.024), ('manipulating', 0.024), ('intervention', 0.023), ('ilya', 0.023), ('assigned', 0.023), ('acyclic', 0.023), ('besnard', 0.023), ('bromberg', 0.023), ('ethical', 0.023), ('hanks', 0.023), ('kalisch', 0.023), ('scheines', 0.023), ('tillman', 0.023), ('cov', 0.023), ('domains', 0.022), ('discovery', 0.022), ('daily', 0.022), ('day', 0.021), ('developments', 0.021), ('mechanisms', 0.021), ('randomized', 0.021), ('graph', 0.021), ('inferences', 0.02), ('describes', 0.02), ('kang', 0.02), ('greg', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999952 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
3 0.27913588 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
5 0.054009311 63 jmlr-2010-Learning Instance-Specific Predictive Models
Author: Shyam Visweswaran, Gregory F. Cooper
Abstract: This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict a target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures and its performance was compared to that of several commonly used predictive algorithms, including naïve Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Over all the data sets, the ISMB algorithm performed better on average on all performance measures against all the comparison algorithms. Keywords: instance-specific, Bayesian network, Markov blanket, Bayesian model averaging
6 0.047914911 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
7 0.043902837 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
8 0.038795646 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
9 0.034066916 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
10 0.033046257 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
11 0.030300664 33 jmlr-2010-Efficient Heuristics for Discriminative Structure Learning of Bayesian Network Classifiers
12 0.025185833 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
13 0.02405061 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
14 0.023915932 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
15 0.02349033 6 jmlr-2010-A Rotation Test to Verify Latent Structure
16 0.023013849 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
17 0.021847278 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
18 0.021083852 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
19 0.0210825 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
20 0.019464204 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
topicId topicWeight
[(0, -0.156), (1, 0.525), (2, 0.294), (3, -0.062), (4, 0.015), (5, -0.112), (6, 0.001), (7, -0.028), (8, 0.037), (9, 0.006), (10, -0.044), (11, -0.057), (12, 0.013), (13, -0.017), (14, 0.009), (15, 0.044), (16, 0.007), (17, -0.003), (18, 0.031), (19, 0.013), (20, 0.012), (21, -0.036), (22, -0.035), (23, -0.039), (24, 0.064), (25, 0.047), (26, -0.042), (27, 0.22), (28, 0.008), (29, 0.074), (30, 0.02), (31, 0.012), (32, 0.074), (33, -0.047), (34, 0.088), (35, -0.053), (36, 0.058), (37, -0.023), (38, 0.02), (39, 0.045), (40, -0.134), (41, -0.004), (42, -0.042), (43, -0.056), (44, 0.003), (45, -0.024), (46, -0.034), (47, -0.071), (48, -0.005), (49, -0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.96788752 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
2 0.8029148 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
5 0.19326124 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
Author: Remco Bouckaert, Raymond Hemmecke, Silvia Lindner, Milan Studený
Abstract: The topic of the paper is computer testing of (probabilistic) conditional independence (CI) implications by an algebraic method of structural imsets. The basic idea is to transform (sets of) CI statements into certain integral vectors and to verify by a computer the corresponding algebraic relation between the vectors, called the independence implication. We interpret the previous methods for computer testing of this implication from the point of view of polyhedral geometry. However, the main contribution of the paper is a new method, based on linear programming (LP). The new method overcomes the limitation of former methods to the number of involved variables. We recall/describe the theoretical basis for all four methods involved in our computational experiments, whose aim was to compare the efficiency of the algorithms. The experiments show that the LP method is clearly the fastest one. As an example of possible application of such algorithms we show that testing inclusion of Bayesian network structures or whether a CI statement is encoded in an acyclic directed graph can be done by the algebraic method. Keywords: conditional independence inference, linear programming approach
6 0.17136775 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
7 0.15454642 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
8 0.14425144 69 jmlr-2010-Lp-Nested Symmetric Distributions
9 0.13831414 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
10 0.13792025 28 jmlr-2010-Continuous Time Bayesian Network Reasoning and Learning Engine
11 0.13021544 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
12 0.13006467 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
13 0.1268038 10 jmlr-2010-An Exponential Model for Infinite Rankings
14 0.11672875 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
15 0.11610416 37 jmlr-2010-Evolving Static Representations for Task Transfer
16 0.11485625 6 jmlr-2010-A Rotation Test to Verify Latent Structure
17 0.11402004 63 jmlr-2010-Learning Instance-Specific Predictive Models
18 0.11288119 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
19 0.10929693 77 jmlr-2010-Model-based Boosting 2.0
20 0.10802867 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
topicId topicWeight
[(4, 0.012), (8, 0.013), (21, 0.011), (30, 0.389), (32, 0.037), (33, 0.081), (36, 0.073), (37, 0.057), (75, 0.112), (81, 0.015), (85, 0.052), (96, 0.024), (97, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.66556919 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years has seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small samples sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
2 0.39418098 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project
Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten
Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software
3 0.38843143 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
6 0.34979773 63 jmlr-2010-Learning Instance-Specific Predictive Models
7 0.34838352 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
8 0.34794793 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
9 0.34580439 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
10 0.34535798 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
11 0.34236106 69 jmlr-2010-Lp-Nested Symmetric Distributions
12 0.34111971 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
13 0.34111542 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
14 0.34067589 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
15 0.34058547 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
16 0.33964166 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
17 0.3382723 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
18 0.33785191 104 jmlr-2010-Sparse Spectrum Gaussian Process Regression
19 0.33730918 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
20 0.33499315 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels