
24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models



Author: Roland R. Ramsahai

Abstract: Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involves a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph



Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. [sent-6, score-0.484]

2 The literature on computing such constraints often involves a deterministic underlying data generating process in a counterfactual framework. [sent-7, score-0.419]

3 A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. [sent-9, score-0.275]

4 Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. [sent-11, score-1.123]

5 Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph [sent-12, score-0.803]

6 Collections of latent conditional independencies may imply inequality constraints on parameters of the observable distribution. [sent-15, score-0.254]

7 To compute the constraints, Pearl (1995) defines the IV model as a deterministic counterfactual model (Rubin, 1974). [sent-24, score-0.292]

8 Without intervention data and further assumptions, the causal effect of B on C cannot be point identified (Durbin, 1954; Angrist et al. [sent-26, score-0.384]

9 , 1996), but, using the deterministic counterfactual model, it can be bounded with the joint distribution of A, B and C (Pearl, 1995; Robins, 1989; Manski, 1990). [sent-27, score-0.292]
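
The idea of bounding the causal effect from the joint distribution of A, B and C can be sketched with the simple "natural" bounds usually attributed to Robins (1989) and Manski (1990). This is an illustrative sketch, not the paper's polytope computation: the function name, the dictionary layout `p[a][(c, b)]` and the numbers are assumptions, and these bounds are in general wider than the Balke and Pearl (1997) bounds.

```python
def natural_ace_bounds(p):
    """Simple bounds on ACE(B -> C) for binary B and C.

    p[a][(c, b)] = P(C=c, B=b | A=a) for each level a of the instrument.
    Assumes randomization and the exclusion restriction hold.
    """
    levels = list(p)

    def bounds_under_intervention(b):
        # P(C=1 || B=b) is at least P(C=1, B=b | A=a) and at most
        # 1 - P(C=0, B=b | A=a), for every level a of the instrument.
        lower = max(p[a][(1, b)] for a in levels)
        upper = min(1.0 - p[a][(0, b)] for a in levels)
        return lower, upper

    lo1, hi1 = bounds_under_intervention(1)
    lo0, hi0 = bounds_under_intervention(0)
    return lo1 - hi0, hi1 - lo0


# Hypothetical distribution with perfect compliance (illustrative numbers only).
perfect = {
    1: {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.0, (1, 1): 0.0},
    2: {(0, 0): 0.0, (1, 0): 0.0, (0, 1): 0.3, (1, 1): 0.7},
}
ace_lower, ace_upper = natural_ace_bounds(perfect)
```

With perfect compliance the two bounds coincide, which is the expected behaviour: the effect is then identified by the comparison of the two assignment arms.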

10 Using the deterministic counterfactual approach and linear programming software developed by Balke (1995), the constraints on the causal effect of B on C were improved by Balke and Pearl (1997) and extended to other models by Kaufman et al. [sent-29, score-0.736]

11 This linear programming approach within a deterministic counterfactual model has become the standard tool for computing such constraints, with some exceptions (Geiger and Meek, 1998; Kang and Tian, 2006). [sent-31, score-0.292]

12 As a technical construct for computations, deterministic counterfactual models are widely accepted as valuable. [sent-32, score-0.292]

13 Applications of deterministic counterfactual models assume there are underlying deterministic relations (Angrist et al. [sent-33, score-0.436]

14 Even if an analyst is unaware of the type of mechanisms involved in their study, it would be desirable to avoid deterministic counterfactuals if alternative computations are no more difficult. [sent-36, score-0.265]

15 The method in §2 provides such an alternative for deriving falsifiable constraints and causal bounds of the type previously described, and it is agnostic to whether the underlying mechanisms are probabilistic or deterministic. [sent-37, score-0.599]

16 In this discussion, causal inference is formalized within standard decision theory (Spirtes et al. [sent-39, score-0.317]

17 Graphical models for representing causal assumptions are described in §4. [sent-45, score-0.365]

18 Nontrivial modifications of the computation technique are considered in §6, §7 and §8 to derive novel constraints and causal bounds when various assumptions in the IV model are weakened. [sent-46, score-0.586]

19 Example 1 Consider an IV model of partial compliance, where A ∈ {1, 2} is treatment assigned, B ∈ {0, 1} is treatment taken and C is an outcome of interest. [sent-47, score-0.267]

20 The counterfactual IV model involves counterfactual variables (B1 , B2 ) which represent a unit’s deterministic compliance behaviour when A is set to 1 (no treatment) or 2 (treatment) respectively. [sent-48, score-0.675]

21 Analyses of this model often make the monotonicity assumption B2 ≥ B1, meaning that a unit which does not take treatment when assigned it will never take it. [sent-49, score-0.368]

22 The deterministic counterfactual framework only allows the compliance behaviour in Example 1 to be modelled as deterministic. [sent-50, score-0.494]

23 In the context of Example 1, monotonicity in the weaker model is equivalent to assuming that units are more likely to take treatment if assigned it than if not assigned it. [sent-53, score-0.428]

24 The counterfactual IV model in Example 1 uses the exclusion restriction assumption (Imbens and Angrist, 1994). [sent-54, score-0.617]

25 This assumption restricts C to be a deterministic function of compliance behaviour and treatment taken only. [sent-55, score-0.429]

26 Stochastic exclusion restrictions are considered within the deterministic counterfactual framework in Hirano et al. [sent-56, score-0.642]

27 In a weaker fully probabilistic model, the exclusion restriction assumption C ⊥⊥ A | (B, U) is used in §2 to replicate results which were derived under the stronger model (Balke and Pearl, 1993; Pearl, 1995; Balke and Pearl, 1997). [sent-58, score-0.494]

28 Whilst varying the strength of the exclusion restriction, novel constraints are derived in §7 with the probabilistic approach. [sent-59, score-0.503]

29 This allows a sensitivity analysis to the non-deterministic exclusion restriction, which is important when assumptions involve unobservable variables (Shepherd et al. [sent-60, score-0.43]

30 Another assumption in the IV model in Example 1 is that treatment assignment is independent of compliance behaviour. [sent-62, score-0.323]

31 Let $V = \Xi(\hat{T})$ and $H$ be the convex hull of $V$, where $\hat{T}$ is the collection of extreme vertices of $T$. [sent-97, score-0.228]
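
The construction of transforming the extreme vertices of the latent polytope can be sketched for the binary IV model of Example 1. This is a minimal sketch under assumptions: the coordinate order of `tau`, the key layout `(c, b, a)` and the function names `xi` and `observable` are hypothetical choices for illustration, not the paper's notation, and the exclusion restriction is built into the mapping.

```python
from itertools import product


def xi(tau):
    """Map tau = (d1, d2, e0, e1) to v[(c, b, a)] = P(C=c, B=b | A=a, U).

    d_a = P(B=1 | A=a, U) and e_b = P(C=1 | B=b, U); this coordinate
    order is an assumption made for the sketch.
    """
    d = {1: tau[0], 2: tau[1]}
    e = {0: tau[2], 1: tau[3]}
    v = {}
    for a, b, c in product((1, 2), (0, 1), (0, 1)):
        p_b = d[a] if b == 1 else 1 - d[a]
        p_c = e[b] if c == 1 else 1 - e[b]
        v[(c, b, a)] = p_c * p_b
    return v


def observable(latent):
    """Average xi(tau) over a finite latent distribution [(weight, tau), ...]."""
    out = {}
    for weight, tau in latent:
        for key, val in xi(tau).items():
            out[key] = out.get(key, 0.0) + weight * val
    return out


# The 16 extreme vertices of the unit hypercube and their distinct images.
extreme_vertices = list(product((0, 1), repeat=4))
images = {tuple(sorted(xi(t).items())) for t in extreme_vertices}

# A hypothetical two-point latent distribution, for illustration only.
obs = observable([(0.5, (0, 1, 0, 1)), (0.5, (0, 0, 0, 0))])
```

Some vertices share an image (for example, the outcome under treatment is irrelevant for a unit that never takes treatment), so the transformed set has fewer points than the hypercube has vertices. A polyhedral computation package would then be needed to turn these points into the inequalities describing their convex hull.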

32 Figure 1: Transformation of extreme vertices (top) of polytope (bottom). [sent-143, score-0.254]

33 The proof of Theorem 1 does not use the specific form of Ξ(·), only its monotonicity in each coordinate. [sent-145, score-0.224]

34 It is possible for the randomization or exclusion restriction assumption to fail without violation of any of the constraints in (1). [sent-149, score-0.686]
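
A falsification check of this kind can be sketched with the instrumental inequality in the form given by Pearl (1995), max over b of the sum over c of the max over a of P(C=c, B=b | A=a) ≤ 1. Whether this is exactly the set of constraints labelled (1) in the paper is an assumption here; the function name and the example numbers are hypothetical.

```python
def instrumental_inequality_holds(p, tol=1e-12):
    """Check Pearl's (1995) instrumental inequality for binary B and C.

    p[a][(c, b)] = P(C=c, B=b | A=a) for each level a of the instrument.
    Returns False when the observed distribution falsifies the IV model.
    """
    levels = list(p)
    for b in (0, 1):
        total = sum(max(p[a][(c, b)] for a in levels) for c in (0, 1))
        if total > 1.0 + tol:
            return False
    return True


# Hypothetical distributions, for illustration only.
compatible = {
    1: {(0, 0): 0.5, (1, 0): 0.3, (0, 1): 0.1, (1, 1): 0.1},
    2: {(0, 0): 0.2, (1, 0): 0.1, (0, 1): 0.3, (1, 1): 0.4},
}
violating = {
    1: {(0, 0): 0.1, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 0.9},
    2: {(0, 0): 0.1, (1, 0): 0.0, (0, 1): 0.9, (1, 1): 0.0},
}
ok = instrumental_inequality_holds(compatible)
bad = instrumental_inequality_holds(violating)
```

Passing the check does not establish that the assumptions hold, which is the point made in the sentence above: the inequality can only falsify the model.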

35 For example, if all v ∗ lie in H \V and randomization holds then the exclusion restriction in Equation (3) is not satisfied but all v ∈ H , which means that the inequalities in (1) are satisfied. [sent-151, score-0.559]

36 In the partial compliance model of Example 1, the vertex τ ∗ = (0, 0, 0, 0) corresponds to a unit which is classified as never recover (response is 0 regardless of treatment taken) and a never taker (treatment taken is 0 regardless of treatment assigned). [sent-160, score-0.442]

37 The probabilistic model is parameterised by τ ∗ over the entire polytope T whereas the counterfactual model is parameterised by τ ∗ only at the extreme vertices of the polytope T . [sent-161, score-0.65]

38 In special cases where latent determinism is realistic, such a parameterisation is meaningful and assumptions about the non-existence of certain vertices of the polytope, or that $\tau^* \in \hat{T}$, can potentially be justified. [sent-162, score-0.396]

39 If latent determinism is known to be unrealistic (Aalen and Frigessi, 2007) and the reparameterisation is a technical construct then it may be wise to steer clear of any interpretation beyond simply saying that they are the vertices of the polytope defining the model. [sent-163, score-0.348]

40 The concepts are demonstrated in the reformulation of the monotonicity assumption in §8. [sent-164, score-0.224]

41 The deterministic counterfactual approach assumes latent determinism and interprets the vertices as having real meaning. [sent-165, score-0.515]

42 Under the deterministic interpretation, the monotonicity assumption implies that certain vertices are not valid for the model. [sent-166, score-0.435]

43 The probabilistic approach defines monotonicity as a constraint on the latent conditional distributions to lie in a particular half-space, still allowing probabilistic behaviour. [sent-167, score-0.363]
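
The contrast between the two readings of monotonicity can be sketched as follows. The parameterisation `tau = (d1, d2, e0, e1)` with `d_a = P(B=1 | A=a, U)` is an assumption made for this sketch, as are the function name and the example point; the vertex counts it produces refer to this assumed four-coordinate hypercube only, not to the figures in the paper.

```python
from itertools import product


def is_monotone(tau):
    """Half-space form of monotonicity: P(B=1 | A=2, U) >= P(B=1 | A=1, U)."""
    d1, d2 = tau[0], tau[1]
    return d2 >= d1


# Deterministic reading: some extreme vertices are ruled out entirely.
vertices = list(product((0, 1), repeat=4))
kept = [t for t in vertices if is_monotone(t)]
removed = [t for t in vertices if not is_monotone(t)]

# Probabilistic reading: an interior point can satisfy the same constraint,
# so compliance behaviour is still allowed to be random.
mixed_unit = (0.2, 0.6, 0.5, 0.5)
mixed_ok = is_monotone(mixed_unit)
```

In the deterministic reading the removed vertices are exactly those with d1 = 1 and d2 = 0, that is, units which take treatment only when it is not assigned.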

44 Causal Graphical Models The IV model considered so far, that is, without causal assumptions, is relatively simple. [sent-169, score-0.317]

45 Graphs that are useful for representing conditional independence and causal assumptions are described in §4. [sent-171, score-0.439]

46 1 Directed Acyclic Graph A purely probabilistic directed acyclic graph (DAG) (Lauritzen, 1996) consists of a set of vertices or nodes, N , and a set of directed edges, E . [sent-175, score-0.234]

47 , 1993; Pearl, 1993; Lauritzen, 2001; Dawid, 2002) in Figure 2 (right) are considered, where ACE(B → C) = α = P(C = 1 || B = 1) − P(C = 1 || B = 0) is the causal effect of interest. [sent-189, score-0.317]
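
The definition of α can be made concrete with a small sketch that evaluates it in a fully specified latent model. It assumes that the interventional probability is obtained by averaging P(C=1 | B=b, U) over P(U), which relies on U being sufficient in the sense of the IV model; the function name, the labels of U and the numbers are hypothetical.

```python
def ace_b_on_c(p_u, p_c1_given_b_u):
    """ACE(B -> C) = P(C=1 || B=1) - P(C=1 || B=0).

    p_u[u] = P(U=u); p_c1_given_b_u[(b, u)] = P(C=1 | B=b, U=u).
    """
    def p_c1_under_intervention(b):
        return sum(p_u[u] * p_c1_given_b_u[(b, u)] for u in p_u)

    return p_c1_under_intervention(1) - p_c1_under_intervention(0)


# Hypothetical latent model with a two-valued U, for illustration only.
p_u = {"u0": 0.4, "u1": 0.6}
p_c1 = {(0, "u0"): 0.2, (1, "u0"): 0.5, (0, "u1"): 0.3, (1, "u1"): 0.9}
alpha = ace_b_on_c(p_u, p_c1)
```

In practice U is unobserved, so α cannot be evaluated this way from data; the sketch only shows the quantity that the bounds are meant to bracket.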

48 The augmented DAGs which represent the IV model without randomization and the exclusion restriction are given in Figure 3. [sent-197, score-0.608]

49 Figure 3: Augmented DAGs which represent the causal IV model without randomization (left) and without exclusion restriction (right). [sent-198, score-0.876]

50 The augmented DAG in Figure 2 (right) still applies under monotonicity since no extra conditional independences are assumed. [sent-200, score-0.312]

51 Key requirements are the monotonicity of the mapping and that the space of valid parameters is the convex hull of the transformed polytope. [sent-205, score-0.357]

52 1 Falsifiable Constraints Some applications, such as studies with partial compliance, require constraints involving the distribution P(C, B | A), whereas others can only identify the pairwise conditional distributions P(C | A) and P(B | A). [sent-211, score-0.229]

53 1, it may be necessary to obtain causal bounds in terms of the pairwise conditional distributions P(C | A) and P(B | A). [sent-219, score-0.45]

54 Similarly, constraints and causal bounds in terms of the identifiable ζcb. [sent-225, score-0.538]

55 This is because the model in Figure 3 (left) only makes assumptions about distributions in the observational regime and the regime with intervention on B, since it includes the regime indicator FB . [sent-236, score-0.259]

56 If there is data on P(C | A) and P(B | A) but not P(C | B, A) then constraints and bounds involving γ and θ are useful. [sent-238, score-0.249]

57 Assuming the exclusion restriction in Equation (3) still holds, τ ∗ still fully parameterises P(C, B | A,U). [sent-240, score-0.436]

58 The constraints are tight since the vertices of the convex hull are a subset of the vertices of the transformed polytope and any vertex is achievable if the value of U, corresponding to the vertex, occurs with probability one. [sent-246, score-0.551]

59 (6) Although the expression in (6) bounds the unobservable causal effect, there are no falsifiable constraints to invalidate the model. [sent-250, score-0.57]

60 If a sample from P(C, B | A) is available, the mapping τ ∗ −→ vi∗ can be used to compute observable constraints and causal bounds. [sent-252, score-0.518]

61 Relaxing the Exclusion Restriction in the Instrumental Variable Model The exclusion restriction assumption may often be inapplicable, for example, if patients in a study with partial compliance become aware of their treatment assignment and this affects their outcome. [sent-272, score-0.794]

62 The probabilistic nature of the exclusion restriction within the decision framework allows the strength of the direct relation to be varied and the sensitivity of inference to this assumption to be assessed. [sent-274, score-0.462]

63 A weaker alternative to the exclusion restriction assumption, C ⊥⊥ A | (B, U), in the binary IV model is $0 \le |\eta^*_{b1} - \eta^*_{b2}| \le \varepsilon$ for b = 0, 1, where $\eta^*_{ba} = P(C = 1 \mid B = b, A = a, U)$ and 0 ≤ ε ≤ 1. [sent-275, score-0.468]
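
The relaxed exclusion restriction can be sketched as a check on the latent outcome probabilities for a single value of U. The dictionary layout `eta[(b, a)]`, the function names and the numbers are assumptions made for illustration.

```python
def exclusion_gap(eta):
    """Largest direct effect of A on C given B, for one value of U.

    eta[(b, a)] = P(C=1 | B=b, A=a, U) with b in {0, 1} and a in {1, 2}.
    """
    return max(abs(eta[(b, 1)] - eta[(b, 2)]) for b in (0, 1))


def satisfies_relaxed_exclusion(eta, eps, tol=1e-12):
    """True when |eta[(b, 1)] - eta[(b, 2)]| <= eps for b = 0, 1."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    return exclusion_gap(eta) <= eps + tol


# Hypothetical latent parameters, for illustration only.
eta = {(0, 1): 0.30, (0, 2): 0.45, (1, 1): 0.80, (1, 2): 0.70}
gap = exclusion_gap(eta)
strict = satisfies_relaxed_exclusion(eta, 0.0)   # exact exclusion restriction
relaxed = satisfies_relaxed_exclusion(eta, 0.5)  # partial exclusion restriction
```

Setting `eps` to 0 recovers the usual exclusion restriction, while `eps` equal to 1 imposes nothing beyond the axioms of probability.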

64 Figure 5: Transformation to the extreme vertices corresponding to the polytope which represents the IV model with the weaker exclusion restriction, for ε = 0. [sent-336, score-0.636]

65 The condition ε = 0 is equivalent to the exclusion restriction. [sent-338, score-0.35]

66 For ε = 1, there are no constraints on $(\eta^*_{b1}, \eta^*_{b2})$ other than the axioms of probability and there are no falsifiable constraints or causal bounds for the IV model without the exclusion restriction. [sent-339, score-1.052]

67 The augmented DAG in Figure 3 (right) does not represent any assumptions about ε but assumptions about ε are required to obtain non-trivial constraints and bounds. [sent-340, score-0.272]

68 Use of the technique produces the causal bounds in Appendix D and the constraints ζ00. [sent-351, score-0.538]

69 2 ≤ 1, which is a weaker version of the ‘instrumental inequality’ of Equation (1) and can be violated if the IV model with the weak exclusion restriction, ε = 0. [sent-367, score-0.382]

70 By adding the component P(C | B, A,U) to v ∗ , causal bounds on P(C | A, FB = B) = ∑U P(C | B, A,U)P(U) can be derived for each A and used to compute bounds on ACE(B → C) since P(C | FB = B) = ∑A P(C | A, FB = B)P(A). [sent-369, score-0.505]
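
The last step, going from bounds for each value of A to bounds on ACE(B → C), can be sketched as a weighted combination. This is only an illustration of the averaging identity above: the function names and numbers are hypothetical, and combining per-A bounds in this way gives valid but not necessarily tight bounds.

```python
def combine_over_instrument(p_a, per_a_bounds):
    """Bounds on P(C=1 | F_B=b) from bounds on P(C=1 | A=a, F_B=b).

    p_a[a] = P(A=a); per_a_bounds[a] = (lower, upper) for that level of A.
    """
    lower = sum(p_a[a] * per_a_bounds[a][0] for a in p_a)
    upper = sum(p_a[a] * per_a_bounds[a][1] for a in p_a)
    return lower, upper


def ace_bounds(bounds_b1, bounds_b0):
    """Bounds on P(C=1 | F_B=1) - P(C=1 | F_B=0) from the two intervals."""
    return bounds_b1[0] - bounds_b0[1], bounds_b1[1] - bounds_b0[0]


# Hypothetical per-A bounds, for illustration only.
p_a = {1: 0.5, 2: 0.5}
b1 = combine_over_instrument(p_a, {1: (0.4, 0.8), 2: (0.6, 0.9)})
b0 = combine_over_instrument(p_a, {1: (0.1, 0.3), 2: (0.2, 0.5)})
ace_lower, ace_upper = ace_bounds(b1, b0)
```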

71 Monotonicity Assumption in the Instrumental Variable Model The monotonicity assumption in the literature (Imbens and Angrist, 1994; Angrist et al. [sent-386, score-0.224]

72 In a partial compliance study, a patient may be more likely to take treatment under assignment but it may not be reasonable to assume that their behaviour is deterministically related to treatment assignment. [sent-388, score-0.501]

73 A monotonicity assumption in a weaker probabilistic model is considered here and can be expressed mathematically for the binary IV model by δ∗ ≥ δ∗ , from Equation (2). [sent-389, score-0.282]

74 The IV model considered in this section includes the exclusion restriction and the randomization assumption, as in the augmented DAG in Figure 2 (right). [sent-391, score-0.608]

75 As an illustrative example, consider the computation of falsifiable constraints and causal bounds on ϕ given γ, without monotonicity, where ϕ = ACE(A → B) = θ01 − θ02 . [sent-392, score-0.538]

76 This makes sense since it is assumed that ϕ∗ ≥ 0 and B lies on the causal pathway from A to C. [sent-398, score-0.317]

77 The 6 •’s are the vertices which are not removed after assuming monotonicity and the 6 ◦’s, which correspond to dashes in Figure 6, are the vertices which are removed. [sent-402, score-0.424]

78 Figure 7 clearly demonstrates that the constraints without monotonicity are trivial whereas those with it are not. [sent-403, score-0.351]

79 To determine the effect of the monotonicity assumption on the constraints and bounds with (γ, θ) in §5, the same mapping is used as in the derivation of the bivariate bounds on α but applied to the restricted $T$ and $\hat{T}$ formed by removing the appropriate vertices. [sent-404, score-0.573]

80 The convex hull of the transformed polytope for the IV model with the monotonicity assumption is the region above the shaded surface and without the monotonicity assumption is the entire cuboid. [sent-407, score-0.672]

81 2 Vitamin A Supplementation Another example of partial compliance is the study of Vitamin A supplementation in northern Sumatra, described by Sommer and Zeger (1991). [sent-419, score-0.26]

82 Assumptions: IV model; IV, no randomization; IV, partial exclusion restriction (ε = 0.5); IV, monotonicity. [sent-442, score-0.594]

83 IV model; IV, no randomization; IV, partial exclusion restriction (ε = 0. [sent-443, score-0.818]

84 From Table 2, the imposition of the monotonicity assumption has no effect and is unnecessary for these data sets. [sent-461, score-0.224]

85 However the randomized treatment assignment is important since the bounds computed without randomization are very wide and not much can be inferred about the causal effect. [sent-462, score-0.682]

86 Even though the bounds are much wider for the Lipid Research Clinic Program (1984) data, under the partial exclusion restriction with ε = 0.5, it can still be deduced that there is a positive causal effect. [sent-463, score-0.565] [sent-464, score-0.317]

88 Discussion The methods given here are applied while relaxing various assumptions that are often used in the deterministic counterfactual IV model. [sent-466, score-0.34]

89 By removing the assumption that there are latent deterministic mechanisms, it is shown that the same bounds and constraints are obtained and that the models are empirically equivalent (§3). [sent-467, score-0.38]

90 The results for models which relax the randomization and exclusion restriction assumptions are valuable for sensitivity analyses. [sent-468, score-0.607]

91 In §7, the constraints and bounds were computed for the IV model with a partial exclusion restriction for ε = 0. [sent-470, score-0.692]

92 The ideas discussed can be extended to other models involving conditional independence since it is the factorization of the probability distribution which determines the algebraic structure of the polytope representing the model. [sent-474, score-0.227]

93 Causal Bounds for Instrumental Variable Model Without Exclusion Restriction For the binary IV model without the exclusion restriction in §7, for ε = 0. [sent-557, score-0.436]

94 Non-parametric bounds on causal effects from partial compliance data. [sent-670, score-0.656]

95 Mendelian randomisation as an instrumental variable approach to causal inference. [sent-734, score-0.485]

96 Analytic bounds on causal risk difference in directed acyclic graphs involving three observed binary variables. [sent-806, score-0.509]

97 The analysis of randomized and non-randomized AIDS treatment trials using a new approach to causal inference in longitudinal studies. [sent-885, score-0.433]

98 Alternative graphical causal models and the identification of direct effects. [sent-905, score-0.344]

99 Estimating causal effects of treatments in randomized and nonrandomized studies. [sent-911, score-0.352]

100 Recursive vs non-recursive systems: an attempt at synthesis (part I of a triptych on causal chain systems). [sent-944, score-0.317]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('exclusion', 0.35), ('fb', 0.327), ('causal', 0.317), ('iv', 0.314), ('monotonicity', 0.224), ('counterfactual', 0.181), ('compliance', 0.175), ('instrumental', 0.168), ('ace', 0.144), ('pearl', 0.142), ('constraints', 0.127), ('polytope', 0.125), ('amsahai', 0.125), ('falsi', 0.125), ('lipid', 0.125), ('randomization', 0.123), ('balke', 0.117), ('treatment', 0.116), ('deterministic', 0.111), ('vertices', 0.1), ('dawid', 0.097), ('angrist', 0.096), ('onstraints', 0.096), ('bounds', 0.094), ('counterfactuals', 0.087), ('didelez', 0.087), ('restriction', 0.086), ('dag', 0.078), ('determinism', 0.075), ('ramsahai', 0.075), ('clinic', 0.075), ('lauritzen', 0.075), ('eu', 0.07), ('hull', 0.067), ('intervention', 0.067), ('imbens', 0.064), ('nde', 0.062), ('vitamin', 0.062), ('robins', 0.058), ('odels', 0.056), ('cde', 0.05), ('rrde', 0.05), ('supplementation', 0.05), ('augmented', 0.049), ('latent', 0.048), ('regime', 0.048), ('assumptions', 0.048), ('indirect', 0.041), ('observable', 0.04), ('ba', 0.039), ('obey', 0.039), ('conditional', 0.039), ('directed', 0.038), ('aalen', 0.037), ('axioms', 0.037), ('christof', 0.037), ('coronary', 0.037), ('gawrilow', 0.037), ('kaufman', 0.037), ('mendelian', 0.037), ('polymake', 0.037), ('sommer', 0.037), ('spirtes', 0.037), ('fa', 0.037), ('independence', 0.035), ('effects', 0.035), ('partial', 0.035), ('mechanisms', 0.035), ('mapping', 0.034), ('relations', 0.033), ('ea', 0.033), ('dags', 0.033), ('assignment', 0.032), ('weaker', 0.032), ('acyclic', 0.032), ('parameterised', 0.032), ('analyst', 0.032), ('unobservable', 0.032), ('manski', 0.032), ('villages', 0.032), ('convex', 0.032), ('da', 0.031), ('transformation', 0.03), ('extreme', 0.029), ('involving', 0.028), ('assigned', 0.028), ('behaviour', 0.027), ('graphical', 0.027), ('polytopes', 0.027), ('probabilistic', 0.026), ('clinics', 0.025), ('durbin', 0.025), ('frigessi', 0.025), ('idle', 0.025), ('joswig', 0.025), ('loebel', 0.025), ('porta', 0.025), ('shepherd', 0.025), ('strotz', 0.025), ('vaccine', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models

2 0.2122111 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies

Author: Xiaogang Su, Joseph Kang, Juanjuan Fan, Richard A. Levine, Xin Yan

Abstract: Assessing treatment effects in observational studies is a multifaceted problem that not only involves heterogeneous mechanisms of how the treatment or cause is exposed to subjects, known as propensity, but also differential causal effects across sub-populations. We introduce a concept termed the facilitating score to account for both the confounding and interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called causal inference tree (CIT), to provide a piecewise constant approximation of the facilitating score. With interpretable rules, CIT splits data in such a way that both the propensity and the treatment effect become more homogeneous within each resultant partition. Causal inference at different levels can be made on the basis of CIT. Together with an aggregated grouping procedure, CIT stratifies data into strata where causal effects can be conveniently assessed within each. Besides, a feasible way of predicting individual causal effects (ICE) is made available by aggregating ensemble CIT models. Both the stratified results and the estimated ICE provide an assessment of heterogeneity of causal effects and can be integrated for estimating the average causal effect (ACE). Mean square consistency of CIT is also established. We evaluate the performance of proposed methods with simulations and illustrate their use with the NSW data in Dehejia and Wahba (1999) where the objective is to assess the impact of a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings. Keywords: CART, causal inference, confounding, interaction, observational study, personalized medicine, recursive partitioning

3 0.1320928 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

4 0.11236013 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

Author: Ioannis Tsamardinos, Sofia Triantafillou, Vincenzo Lagani

Abstract: We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org. Keywords: integrative causal analysis, causal discovery, Bayesian networks, maximal ancestral graphs, structural equation models, causality, statistical matching, data fusion

5 0.084680289 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

6 0.037038632 54 jmlr-2012-Large-scale Linear Support Vector Regression

7 0.03521391 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

8 0.035020664 115 jmlr-2012-Trading Regret for Efficiency: Online Convex Optimization with Long Term Constraints

9 0.033150889 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity

10 0.032747239 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing

11 0.02960518 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets

12 0.029164588 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

13 0.026885148 58 jmlr-2012-Linear Fitted-Q Iteration with Multiple Reward Functions

14 0.025516262 82 jmlr-2012-On the Necessity of Irrelevant Variables

15 0.02422039 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

16 0.024187073 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

17 0.023724509 13 jmlr-2012-Active Learning via Perfect Selective Classification

18 0.023174332 97 jmlr-2012-Regularization Techniques for Learning with Matrices

19 0.023143226 40 jmlr-2012-Exact Covariance Thresholding into Connected Components for Large-Scale Graphical Lasso

20 0.02277115 3 jmlr-2012-A Geometric Approach to Sample Compression


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.128), (1, 0.055), (2, 0.079), (3, -0.168), (4, 0.094), (5, 0.167), (6, -0.091), (7, 0.332), (8, 0.343), (9, 0.049), (10, -0.061), (11, -0.024), (12, 0.026), (13, 0.183), (14, 0.123), (15, -0.129), (16, -0.054), (17, 0.017), (18, -0.077), (19, 0.009), (20, -0.062), (21, -0.011), (22, 0.053), (23, -0.017), (24, -0.025), (25, -0.005), (26, 0.031), (27, -0.025), (28, -0.026), (29, 0.04), (30, 0.069), (31, 0.071), (32, -0.035), (33, 0.038), (34, -0.085), (35, -0.014), (36, 0.061), (37, -0.018), (38, 0.083), (39, 0.02), (40, -0.065), (41, -0.052), (42, 0.062), (43, -0.013), (44, -0.009), (45, -0.021), (46, -0.027), (47, 0.029), (48, -0.004), (49, 0.047)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96624041 24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models

Author: Roland R. Ramsahai

Abstract: Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involves a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph

2 0.83555585 42 jmlr-2012-Facilitating Score and Causal Inference Trees for Large Observational Studies

Author: Xiaogang Su, Joseph Kang, Juanjuan Fan, Richard A. Levine, Xin Yan

Abstract: Assessing treatment effects in observational studies is a multifaceted problem that not only involves heterogeneous mechanisms of how the treatment or cause is exposed to subjects, known as propensity, but also differential causal effects across sub-populations. We introduce a concept termed the facilitating score to account for both the confounding and interacting impacts of covariates on the treatment effect. Several approaches for estimating the facilitating score are discussed. In particular, we put forward a machine learning method, called causal inference tree (CIT), to provide a piecewise constant approximation of the facilitating score. With interpretable rules, CIT splits data in such a way that both the propensity and the treatment effect become more homogeneous within each resultant partition. Causal inference at different levels can be made on the basis of CIT. Together with an aggregated grouping procedure, CIT stratifies data into strata where causal effects can be conveniently assessed within each. Besides, a feasible way of predicting individual causal effects (ICE) is made available by aggregating ensemble CIT models. Both the stratified results and the estimated ICE provide an assessment of heterogeneity of causal effects and can be integrated for estimating the average causal effect (ACE). Mean square consistency of CIT is also established. We evaluate the performance of proposed methods with simulations and illustrate their use with the NSW data in Dehejia and Wahba (1999) where the objective is to assess the impact of a labor training program, the National Supported Work (NSW) demonstration, on post-intervention earnings. Keywords: CART, causal inference, confounding, interaction, observational study, personalized medicine, recursive partitioning

3 0.71928203 114 jmlr-2012-Towards Integrative Causal Analysis of Heterogeneous Data Sets and Studies

Author: Ioannis Tsamardinos, Sofia Triantafillou, Vincenzo Lagani

Abstract: We present methods able to predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z never jointly measured on the same samples, based on multiple data sets measuring a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. This problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge and reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented demonstrating the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org. Keywords: integrative causal analysis, causal discovery, Bayesian networks, maximal ancestral graphs, structural equation models, causality, statistical matching, data fusion

4 0.66744572 25 jmlr-2012-Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs

Author: Alain Hauser, Peter Bühlmann

Abstract: The investigation of directed acyclic graphs (DAGs) encoding the same Markov property, that is the same conditional independence relations of multivariate observational distributions, has a long tradition; many algorithms exist for model selection and structure learning in Markov equivalence classes. In this paper, we extend the notion of Markov equivalence of DAGs to the case of interventional distributions arising from multiple intervention experiments. We show that under reasonable assumptions on the intervention experiments, interventional Markov equivalence defines a finer partitioning of DAGs than observational Markov equivalence and hence improves the identifiability of causal models. We give a graph theoretic criterion for two DAGs being Markov equivalent under interventions and show that each interventional Markov equivalence class can, analogously to the observational case, be uniquely represented by a chain graph called interventional essential graph (also known as CPDAG in the observational case). These are key insights for deriving a generalization of the Greedy Equivalence Search algorithm aimed at structure learning from interventional data. This new algorithm is evaluated in a simulation study. Keywords: causal inference, interventions, graphical model, Markov equivalence, greedy equivalence search

5 0.42797679 56 jmlr-2012-Learning Linear Cyclic Causal Models with Latent Variables

Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer

Abstract: Identifying cause-effect relationships between variables of interest is a central problem in science. Given a set of experiments we describe a procedure that identifies linear models that may contain cycles and latent variables. We provide a detailed description of the model family, full proofs of the necessary and sufficient conditions for identifiability, a search algorithm that is complete, and a discussion of what can be done when the identifiability conditions are not satisfied. The algorithm is comprehensively tested in simulations, comparing it to competing algorithms in the literature. Furthermore, we adapt the procedure to the problem of cellular network inference, applying it to the biologically realistic data of the DREAM challenges. The paper provides a full theoretical foundation for the causal discovery procedure first presented by Eberhardt et al. (2010) and Hyttinen et al. (2010). Keywords: causality, graphical models, randomized experiments, structural equation models, latent variables, latent confounders, cycles

6 0.18997197 54 jmlr-2012-Large-scale Linear Support Vector Regression

7 0.17728527 48 jmlr-2012-High-Dimensional Gaussian Graphical Model Selection: Walk Summability and Local Separation Criterion

8 0.14639418 86 jmlr-2012-Optimistic Bayesian Sampling in Contextual-Bandit Problems

9 0.13913107 82 jmlr-2012-On the Necessity of Irrelevant Variables

10 0.13042553 20 jmlr-2012-Analysis of a Random Forests Model

11 0.12961093 78 jmlr-2012-Nonparametric Guidance of Autoencoder Representations using Label Information

12 0.12657173 11 jmlr-2012-A Unifying Probabilistic Perspective for Spectral Dimensionality Reduction: Insights and New Models

13 0.12513179 21 jmlr-2012-Bayesian Mixed-Effects Inference on Classification Performance in Hierarchical Data Sets

14 0.12368633 14 jmlr-2012-Activized Learning: Transforming Passive to Active with Improved Label Complexity

15 0.12279891 32 jmlr-2012-Discriminative Hierarchical Part-based Models for Human Parsing and Action Recognition

16 0.12198079 108 jmlr-2012-Sparse and Unique Nonnegative Matrix Factorization Through Data Preprocessing

17 0.11964964 3 jmlr-2012-A Geometric Approach to Sample Compression

18 0.11605822 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

19 0.11140987 36 jmlr-2012-Efficient Methods for Robust Classification Under Uncertainty in Kernel Matrices

20 0.11002271 97 jmlr-2012-Regularization Techniques for Learning with Matrices


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.031), (26, 0.036), (29, 0.033), (35, 0.012), (49, 0.018), (56, 0.015), (57, 0.013), (69, 0.011), (75, 0.037), (77, 0.016), (79, 0.028), (91, 0.485), (92, 0.103), (96, 0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74278039 24 jmlr-2012-Causal Bounds and Observable Constraints for Non-deterministic Models

Author: Roland R. Ramsahai

Abstract: Conditional independence relations involving latent variables do not necessarily imply observable independences. They may imply inequality constraints on observable parameters and causal bounds, which can be used for falsification and identification. The literature on computing such constraints often involves a deterministic underlying data generating process in a counterfactual framework. If an analyst is ignorant of the nature of the underlying mechanisms then they may wish to use a model which allows the underlying mechanisms to be probabilistic. A method of computation for a weaker model without any determinism is given here and demonstrated for the instrumental variable model, though applicable to other models. The approach is based on the analysis of mappings with convex polytopes in a decision theoretic framework and can be implemented in readily available polyhedral computation software. Well known constraints and bounds are replicated in a probabilistic model and novel ones are computed for instrumental variable models without non-deterministic versions of the randomization, exclusion restriction and monotonicity assumptions respectively. Keywords: instrumental variables, instrumental inequality, causal bounds, convex polytope, latent variables, directed acyclic graph

2 0.27274334 67 jmlr-2012-Minimax-Optimal Rates For Sparse Additive Models Over Kernel Classes Via Convex Programming

Author: Garvesh Raskutti, Martin J. Wainwright, Bin Yu

Abstract: Sparse additive models are families of $d$-variate functions with the additive decomposition $f^* = \sum_{j \in S} f_j^*$, where $S$ is an unknown subset of cardinality $s \ll d$. In this paper, we consider the case where each univariate component function $f_j^*$ lies in a reproducing kernel Hilbert space (RKHS), and analyze a method for estimating the unknown function $f^*$ based on kernels combined with $\ell_1$-type convex regularization. Working within a high-dimensional framework that allows both the dimension $d$ and sparsity $s$ to increase with $n$, we derive convergence rates in the $L^2(P)$ and $L^2(P_n)$ norms over the class $\mathcal{F}_{d,s,\mathcal{H}}$ of sparse additive models with each univariate function $f_j^*$ in the unit ball of a univariate RKHS with bounded kernel function. We complement our upper bounds by deriving minimax lower bounds on the $L^2(P)$ error, thereby showing the optimality of our method. Thus, we obtain optimal minimax rates for many interesting classes of sparse additive models, including polynomials, splines, and Sobolev classes. We also show that if, in contrast to our univariate conditions, the $d$-variate function class is assumed to be globally bounded, then much faster estimation rates are possible for any sparsity $s = \Omega(\sqrt{n})$, showing that global boundedness is a significant restriction in the high-dimensional setting. Keywords: sparsity, kernel, non-parametric, convex, minimax

3 0.27203318 34 jmlr-2012-Dynamic Policy Programming

Author: Mohammad Gheshlaghi Azar, Vicenç Gómez, Hilbert J. Kappen

Abstract: In this paper, we propose a novel policy iteration method, called dynamic policy programming (DPP), to estimate the optimal policy in infinite-horizon Markov decision processes. DPP is an incremental algorithm that forces a gradual change in policy update. This allows us to prove finite-iteration and asymptotic $\ell_\infty$-norm performance-loss bounds in the presence of approximation/estimation error which depend on the average accumulated error as opposed to the standard bounds which are expressed in terms of the supremum of the errors. The dependency on the average error is important in problems with a limited number of samples per iteration, for which the average of the errors can be significantly smaller in size than the supremum of the errors. Based on these theoretical results, we prove that a sampling-based variant of DPP (DPP-RL) asymptotically converges to the optimal policy. Finally, we illustrate numerically the applicability of these results on some benchmark problems and compare the performance of the approximate variants of DPP with some existing reinforcement learning (RL) methods. Keywords: approximate dynamic programming, reinforcement learning, Markov decision processes, Monte-Carlo methods, function approximation

4 0.27194214 111 jmlr-2012-Structured Sparsity and Generalization

Author: Andreas Maurer, Massimiliano Pontil

Abstract: We present a data dependent generalization bound for a large class of regularized algorithms which implement structured sparsity constraints. The bound can be applied to standard squared-norm regularization, the Lasso, the group Lasso, some versions of the group Lasso with overlapping groups, multiple kernel learning and other regularization schemes. In all these cases competitive results are obtained. A novel feature of our bound is that it can be applied in an infinite dimensional setting such as the Lasso in a separable Hilbert space or multiple kernel learning with a countable number of kernels. Keywords: empirical processes, Rademacher average, sparse estimation.

5 0.27167147 82 jmlr-2012-On the Necessity of Irrelevant Variables

Author: David P. Helmbold, Philip M. Long

Abstract: This work explores the effects of relevant and irrelevant boolean variables on the accuracy of classifiers. The analysis uses the assumption that the variables are conditionally independent given the class, and focuses on a natural family of learning algorithms for such sources when the relevant variables have a small advantage over random guessing. The main result is that algorithms relying predominately on irrelevant variables have error probabilities that quickly go to 0 in situations where algorithms that limit the use of irrelevant variables have errors bounded below by a positive constant. We also show that accurate learning is possible even when there are so few examples that one cannot determine with high confidence whether or not any individual variable is relevant. Keywords: feature selection, generalization, learning theory

6 0.27136588 64 jmlr-2012-Manifold Identification in Dual Averaging for Regularized Stochastic Online Learning

7 0.27099004 8 jmlr-2012-A Primal-Dual Convergence Analysis of Boosting

8 0.27041349 85 jmlr-2012-Optimal Distributed Online Prediction Using Mini-Batches

9 0.26985601 73 jmlr-2012-Multi-task Regression using Minimal Penalties

10 0.26899818 80 jmlr-2012-On Ranking and Generalization Bounds

11 0.26725519 13 jmlr-2012-Active Learning via Perfect Selective Classification

12 0.26717666 117 jmlr-2012-Variable Selection in High-dimensional Varying-coefficient Models with Global Optimality

13 0.26704448 7 jmlr-2012-A Multi-Stage Framework for Dantzig Selector and LASSO

14 0.26696485 71 jmlr-2012-Multi-Instance Learning with Any Hypothesis Class

15 0.26673752 46 jmlr-2012-Finite-Sample Analysis of Least-Squares Policy Iteration

16 0.26632279 26 jmlr-2012-Coherence Functions with Applications in Large-Margin Classification Methods

17 0.26429144 29 jmlr-2012-Consistent Model Selection Criteria on High Dimensions

18 0.26361078 68 jmlr-2012-Minimax Manifold Estimation

19 0.26356193 103 jmlr-2012-Sampling Methods for the Nyström Method

20 0.26306087 15 jmlr-2012-Algebraic Geometric Comparison of Probability Distributions