jmlr jmlr2013 jmlr2013-41 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
Reference: text
sentIndex sentText sentNum sentScore
1 Causal Models, Experiments, and Identifiability: We consider causal models which represent the relevant causal structure by a directed graph G = (V, D), where V is the set of variables under consideration, and D ⊆ V × V is a set of directed edges among the variables. [sent-38, score-1.048]
2 In addition to the graph G, a fully specified causal model also needs to describe the causal processes that determine the value of each variable given its direct causes. [sent-41, score-0.75]
3 Assuming causal sufficiency, acyclicity and faithfulness, fewer experiments are needed: If one had started with an experiment only intervening on x, then a second single intervention experiment on y would be sufficient for unique identifiability. [sent-98, score-1.24]
4 If one had been lucky enough to start with an intervention on z, then it turns out that one could have identified the true causal graph in this single experiment. [sent-103, score-1.083]
5 A set of experiments {E1, . . . , EK} satisfies the unordered pair condition for an unordered pair of variables {xi, xj} ⊆ V whenever there is an experiment in which exactly one of xi and xj is intervened on. [sent-110, score-1.081]
6 A set of experiments {E1, . . . , EK} satisfies the ordered pair condition for an ordered pair of variables (xi, xj) ∈ V × V (with xi ≠ xj) whenever there is an experiment Ek = (Jk, Uk) in {E1, . . . , EK} with xi ∈ Jk and xj ∈ Uk. [sent-150, score-0.727]
7 Since a single passive observation—a so-called null-experiment, as J = ∅—satisfies the covariance condition for all pairs of variables, under the stated assumptions the main challenge is to find experiments that satisfy the unordered pair condition for every pair of variables. [sent-163, score-0.944]
8 A set of experiments {E1, . . . , EK} that satisfies the unordered pair condition for all pairs over n variables in V directly corresponds to a separating system over the variable set. [sent-189, score-0.745]
9 Likewise, a set of experiments that satisfies the ordered pair condition for all ordered variable pairs corresponds to a completely separating system over the variables. [sent-191, score-0.78]
10 However, the edges representing direct causes in a causal graph G do not correspond to the edges representing the satisfaction of an ordered pair condition in the directed graph F. [sent-210, score-1.114]
11 Moreover, there is a difference in how the causal graph G is changed in light of an experiment E = (J, U), and how the ordered pair graph F is changed in light of the (corresponding) cut E. [sent-212, score-0.92]
12 The experiment E results in the manipulated causal graph G′ , which is the same as the original causal graph G except that the edges into the variables in J are removed. [sent-213, score-0.975]
13 Similarly in the unordered case, the causal graph G (or its skeleton) must not be confused with the undirected graph H representing the unordered pairs that are not yet satisfied. [sent-216, score-1.31]
14 Bottom: Satisfaction of the ordered pair condition: Directed double-lined edges indicate ordered pairs for which the ordered pair condition needs to be satisfied (graph (iv)). [sent-222, score-0.872]
15 Graph (i) gives the complete undirected graph H over {x, y, z} illustrating the three unordered pairs for which the unordered pair condition needs to be satisfied. [sent-226, score-1.065]
16 Graphs (ii) and (iii) show for which pairs the unordered pair condition is satisfied by a single intervention experiment on x (or y, respectively), that is, which pairs are in the cut (|). [sent-227, score-1.405]
17 Similarly, graph (iv) gives the complete directed graph F over the three variables, illustrating the six ordered pairs of variables for which the ordered pair condition needs to be satisfied. [sent-230, score-0.915]
18 Graphs (v-vii) show for which pairs the ordered pair condition is satisfied by a single intervention experiment on x (or y or z, respectively), that is, which pairs are in the directed cut (|), while the others are again shown in gray. [sent-231, score-1.33]
19 The correspondence between the problem of finding experiments that satisfy the pair conditions on the one hand, and that of finding separating systems or cut-coverings on the other, allows us to tap into results from combinatorics to inform the selection of experiments in the causal discovery problem. [sent-233, score-0.813]
20 Many of the constructions of intervention sets J1, . . . , JK are more conveniently described in terms of the corresponding index sets I1, . . . , In. [sent-259, score-0.723]
21 That is, the i-th index set simply lists the indices of the intervention sets that include the variable xi. [sent-266, score-0.826]
22 Clearly, K experiments (or separating sets) over n variables can be defined either in terms of the intervention sets J1, . . . , JK or, equivalently, in terms of the index sets I1, . . . , In. [sent-267, score-0.925]
23 Satisfying the Unordered Pair Condition: The earliest results (that we are aware of) relevant to finding minimal sets of experiments satisfying the unordered pair condition are given in Rényi (1961) in the terminology of separating systems. [sent-276, score-0.83]
24 The intervention sets {J1, . . . , JK} satisfy the unordered pair condition for all unordered pairs of variables if and only if the corresponding index sets I1, . . . , In are all distinct. [sent-285, score-1.123]
25 Since the unordered pair condition is satisfied for the pair {xi, xj}, there is an experiment Ek where xi is intervened on and xj is not, or xj is intervened on and xi is not. [sent-297, score-0.903]
26 The connection between antichains and completely separating systems (that is, satisfying the ordered pair condition) is then the following: Lemma 10 (Index sets form an antichain) The intervention sets {J1, . . . , JK} satisfy the ordered pair condition for all ordered pairs of variables if and only if the corresponding index sets I1, . . . , In form an antichain. [sent-318, score-1.18]
27 The one additional experiment sometimes required in this case derives from the need to satisfy the ordered pair condition, or the unordered pair condition together with the covariance condition, for each pair of variables, as discussed in Section 2. [sent-329, score-1.007]
28 Figure 4: Sufficient and necessary number of experiments needed to satisfy the unordered (blue, lower solid line) and the ordered (red, upper solid line) pair condition for models of different sizes (in log-scale). [sent-331, score-0.799]
29 For example, for a model with 100 variables, 7 experiments are needed to satisfy the unordered pair condition, while 9 experiments are needed to satisfy the ordered pair condition. [sent-333, score-0.954]
30 One can thus select n distinct subsets of {1, . . . , K} and translate the index sets into intervention sets J1, . . . , JK. [sent-344, score-0.819]
31 For graphs over three variables, the graph F (for the ordered pair condition) in Figure 2 (bottom, left) illustrates the point: For a directed cut-covering of the six directed edges, c(3) = 3 directed cuts are necessary and sufficient. [sent-368, score-0.797]
32 In some cases, the experiments might require a simultaneous intervention on half of the variables, but such experiments will not be feasible in many scientific contexts. [sent-375, score-0.804]
33 In this section we consider a generalization of the problem of finding the optimal set of experiments that can take into account additional constraints on the size of the intervention sets. [sent-376, score-0.765]
34 Given n variables and K experiments, find intervention sets J1, . . . , JK that satisfy the pair condition and minimize (a) the average or (b) the maximum intervention set size. [sent-378, score-0.732]
35 Given n variables and a maximum allowed intervention set size r, find the minimum number of experiments m(n, r) for which there exist intervention sets J1, . . . , JK with |Jk| ≤ r for all k that satisfy the pair condition. [sent-383, score-1.537]
36 The algorithms presented here can also be used to construct intervention sets that satisfy the bounds discussed in Section 4. [sent-393, score-0.716]
37 Finding a design which minimizes the average intervention set size is straightforward once one considers the problem in terms of the index sets. [sent-397, score-0.807]
38 k=1 i=1 (3) This identity, together with Lemma 8, implies that to obtain intervention sets with minimum average size, it is sufficient to find the n smallest distinct subsets of {1, . [sent-401, score-0.746]
39 Since the sum of the intervention set sizes is the same as the sum of the index set sizes (Equation 3), the average intervention set size obtained is mean_{k=1..K} |Jk| = (1/K) ∑_{k=1..K} |Jk| = (1/K) ∑_{i=1..n} |Ii| = (1/K) ( ∑_{j=0..l−1} j·C(K, j) + l(n − t) ), where C(K, j) denotes the binomial coefficient. (5) [sent-409, score-1.501]
40 For K experiments, this is the minimum possible average intervention set size that satisfies the unordered pair condition for all pairs among n variables. [sent-410, score-1.387]
41 For the case of n = 7 variables and K = 4 experiments, Figure 7 provides an example of the construction of intervention sets with minimal average size. [sent-411, score-0.795]
42 The size of an arbitrary intervention set Jk is equal to the number of index sets that contain the index k. [sent-445, score-0.915]
43 We say that index sets are selected fairly when the corresponding intervention sets satisfy |Jk| − |Jk′| ≤ 1 for all k, k′. (6) [sent-446, score-0.845]
44 In the construction that minimizes the average intervention set size, the index sets I1, . . . , In can moreover be selected fairly. [sent-447, score-0.806]
45 max_{k=1..K} |Jk| = ⌈mean_{k=1..K} |Jk|⌉. (7) Thus, if the construction of index sets is fair, then both the minimum average and the smallest maximum intervention set size are achieved, simultaneously solving both Problems 1(a) and 1(b). [sent-457, score-0.881]
46 Note that in general it is possible to select index sets such that the maximum intervention set size is not minimized, even though the average is minimal (for example, if I7 had been {1, 4} in Figure 7). [sent-459, score-0.906]
47 Algorithm 2 Constructs K intervention sets that satisfy the unordered pair condition for all pairs among n variables with a minimum average and smallest maximum intervention set size. [sent-473, score-2.059]
48 FairUnordered(n, K): Determine the maximum index set size l from Equation 4; if no such l exists, then the unordered pair condition cannot be satisfied for n variables with K experiments. [sent-474, score-0.719]
49 Algorithm 3 Constructs K intervention sets that satisfy the ordered pair condition for all ordered pairs among n variables and approximates (and sometimes achieves) the minimum average and smallest maximum intervention set size. [sent-492, score-2.029]
50 Recall that the constructions without a constraint on the maximum intervention set size result in intervention sets with no more than n/2 variables. [sent-508, score-1.404]
51 For any practical values of n and r, the value of m(n, r) can be found by simply evaluating Equations 4, 5 and 7 for different values of K (starting from the lower bound in Equation 9), so as to find the smallest value of K for which the maximum intervention set size is smaller than or equal to r. [sent-509, score-0.714]
52 5.2 Limiting the Intervention Set Size for the Ordered Pair Condition: Recall from Lemma 10 that satisfaction of the ordered pair condition requires that the n index sets of a set of K experiments form an antichain over {1, . . . , K}. [sent-512, score-0.753]
53 Thus, no matter whether we seek to minimize the average or the maximum intervention set size, we have to ensure that the index sets form an antichain. [sent-516, score-0.829]
54 We begin by considering the (now ordered) versions of Problems 1(a) and 1(b). Figure 8: Satisfying the unordered pair condition while limiting the intervention set sizes. [sent-517, score-1.336]
55 Top: Lowest achievable average intervention set sizes for models with n variables using K experiments. [sent-518, score-0.738]
56 The lowest achievable maximum intervention set size is the ceiling of the average intervention set size shown in the figure. [sent-519, score-1.446]
57 Blank areas are uninteresting, since the average intervention set size can be lowered here by including irrelevant passive observational (null-)experiments. [sent-521, score-0.842]
58 Bottom: The number of experiments needed for n = 1024 variables, with a limit r on the maximum allowed intervention set size. [sent-522, score-0.753]
59 Given n and K, we want to specify experiments minimizing either the average or the maximum intervention set size. [sent-523, score-0.774]
60 Thus, a simple approach to minimizing the maximum intervention set size (that is, to solving Problem 1(b)) is to select the n index sets all of size l and use Algorithm 3, exploiting Algorithm 1, to construct a fair set of index sets. [sent-528, score-0.957]
61 This construction is not fully optimal in all cases because all sets are chosen with size l while in some cases a smaller maximum intervention set size is achievable by combining index sets of different sizes. [sent-529, score-0.912]
62 It is easily seen that Algorithm 3 will generate sets of experiments with an average intervention set size of mean_{k=1..K} |Jk| = (1/K) ∑_{k=1..K} |Jk| = (1/K) ∑_{i=1..n} |Ii| = n·l/K (11) and a maximum intervention set size of max_{k=1..K} |Jk| = ⌈mean_{k=1..K} |Jk|⌉ = ⌈n·l/K⌉ (12). [sent-530, score-0.843]
63 Figure 9 (top) shows the maximum intervention set size in the output of Algorithm 3 for several values of n and K. [sent-531, score-0.714]
64 While this scheme for solving the directed case of Problem 1(b) is quite good in practice, we are not aware of any efficient scheme that is always guaranteed to minimize the maximum intervention set size. [sent-533, score-0.807]
65 Note that flatness requires a selection of index sets that are themselves close in size, while fairness (Equation 6) requires a selection of index sets such that the intervention sets are close in size. [sent-541, score-0.948]
66 But, for example, when n = 30 and K = 8, Algorithm 3 results in a maximum intervention set size of ⌈30·3/8⌉ = 12, although for this case Algorithm 4 can be used to construct suitable intervention sets with at most 11 members. [sent-543, score-1.404]
67 Figure 9: Satisfying the ordered pair condition while limiting the size of the intervention sets. [sent-544, score-1.029]
68 Bottom: Number of experiments needed to satisfy the ordered pair condition for n = 1024 variables with a limit r on the maximum intervention set size. [sent-548, score-1.159]
69 Since the sum of the index set sizes is identical to the sum of the intervention set sizes, Theorem 12 shows that whenever the ordered pair condition can be satisfied, the average intervention set size can be minimized by a set of flat index sets. [sent-550, score-1.915]
70 Thus, Algorithm 4 returns a set of index sets that minimizes the average intervention set size, solving (the directed version of) Problem 1(a). [sent-569, score-0.917]
71 Selecting 16 index sets of size 3 (up to {3, 4, 6}, in bold) and 14 index sets of size 2 (starting from {5, 6}, in bold) gives a total of 30 index sets and achieves the lowest possible average intervention set size for the given n and K. [sent-574, score-1.189]
72 If we were to select 17 index sets of size 3 (up to {1, 5, 6}), we could select 13 index sets of size 2 (from {1, 7}) and again find 30 index sets, but the average intervention set size would then be 1/8 higher. [sent-577, score-1.135]
73 Algorithm 4 Obtains a set of K intervention sets satisfying the ordered pair condition for all ordered pairs among n variables that minimizes the average intervention set size. [sent-578, score-1.963]
74 Trivially, the ceiling of the minimum average intervention set size for n variables in K experiments gives a lower bound on the lowest maximum intervention set size, that is, for Problem 1 (b). [sent-601, score-1.544]
75 Problem 2 reverses the free parameters and asks for the minimum number of experiments m(n, r) given a limit r on the maximum size of any intervention set. [sent-603, score-0.805]
76 m(n, r) ≤ ⌈2n/r⌉ when n > r²/2. (15) With input K = ⌈2n/r⌉, Algorithm 3 generates intervention sets of size at most r (see Appendix B)—this verifies the upper bound. [sent-605, score-0.725]
77 Cai’s result also gives an exact minimum number of experiments when the maximum intervention set size has to be small (see Figure 9 (bottom)). [sent-606, score-0.805]
78 It can also be used to construct a lower bound on the maximum intervention set size when the number of experiments K is given: If m(n, r) > K for some r and K, then the maximum intervention set size given n variables and K experiments must be at least r + 1. [sent-607, score-1.618]
79 (16) Again the upper bound can be easily verified: with the upper bound K as input, Algorithm 3 will generate intervention sets of size at most r (see Appendix C). [sent-611, score-0.725]
80 In many cases we can get an improved lower bound on m(n, r) using Algorithm 4 (which optimally minimizes the average number of interventions per experiment, for given n and K): Find the smallest K such that Algorithm 4 returns intervention sets with an average size less than r. [sent-613, score-0.789]
81 In this case we know that the minimum number of experiments given a maximum intervention set size of r must be at least K (see Figure 9 (bottom)). [sent-614, score-0.805]
82 (1998) have considered the problem equivalent to finding a set of experiments where, instead of a limited maximum intervention set size, the intervention sets are constrained to have exactly some given size. [sent-616, score-1.443]
83 Algorithm 5 calls a graph coloring method (in the code package we use the simple approximation algorithm by Welsh and Powell, 1967) and constructs intervention sets based on the graph coloring. [sent-644, score-0.923]
84 Algorithm 5 Constructs a set of intervention sets satisfying the unordered pair condition for a given arbitrary set of unordered pairs, represented by an undirected graph H over n variables. [sent-651, score-1.755]
85 Graph (iii) illustrates the remaining pairs for which the unordered pair condition is not satisfied given the MEC in (ii), and graph (iv) shows that a single intervention on y resolves these pairs, that is, it provides a cut-covering. [sent-687, score-1.344]
86 Given background knowledge obtained from a passive observational data set of graph (i), graph (v) shows the ordered pairs for which the ordered pair condition remains unsatisfied. [sent-688, score-0.951]
87 But for simple cases it can still be done: Given background knowledge derived from a passive observational data set over graph (i) in Figure 11, graph (v) is the directed graph F indicating the ordered pairs for which the ordered pair condition remains unsatisfied. [sent-691, score-1.166]
88 An upper bound has also been given in the literature on the minimum number of experiments sufficient to satisfy the unordered pair condition when the maximum intervention set size is restricted. [sent-702, score-1.355]
89 In Section 5.2 we noted that for the ordered pair condition we are not aware of a general algorithm that generates intervention sets for which the maximum intervention set size is minimized. [sent-710, score-1.759]
90 We also do not know whether the maximum and average intervention set size can be minimized simultaneously (as we showed is possible for the unordered pair condition in Section 5.1). [sent-711, score-1.259]
91 More generally, the type of background knowledge we considered in Section 6 may have to be integrated into a search procedure that is subject to constraints on the size of the intervention sets. [sent-714, score-0.75]
92 We used these results to specify the minimum number of experiments necessary and sufficient for identifiability when there is no background knowledge (Section 4), when there are limitations on the size of the intervention sets (Section 5), and when background knowledge is available (Section 6). [sent-727, score-0.934]
93 Since the maximum intervention set size is an integer which is lower bounded by the average intervention set size, yet less than one above it, Equation 7 follows directly. [sent-753, score-1.391]
94 Upper Bound of Cai (1984b): We verify the upper bound in Equation 15: the minimum number of experiments m(n, r), given a limit r on the maximum intervention set size, has an upper bound of ⌈2n/r⌉ when n > r²/2. [sent-755, score-0.77]
95 Then, Algorithm 3 will produce intervention sets with average size (Equation 11) bounded by r: mean_{k=1..K} |Jk| = n·l/K ≤ n·l·r/(2n) ≤ 2n·r/(2n) = r, where the first inequality uses K ≥ 2n/r and the second uses l ≤ 2. [sent-760, score-0.746]
96 Because the index sets are chosen fairly, the maximum intervention set size (Equation 12) is also bounded by the integer r: max_{k=1..K} |Jk| = ⌈mean_{k=1..K} |Jk|⌉ ≤ r. [sent-761, score-0.843]
97 (2001): We verify the upper bound in Equation 16: the minimum number of experiments m(n, r), given a limit r on the maximum intervention set size, has an upper bound of min{ K′ : n ≤ C(K′, ⌊K′·r/n⌋) } when r ≤ n/2. [sent-764, score-0.77]
98 Then, Algorithm 3 will produce intervention sets with average size (Equation 11) bounded by r: mean_{k=1..K} |Jk| = n·l/K ≤ n·(Kr/n)/K = r, using l ≤ ⌊K·r/n⌋ ≤ K·r/n. [sent-769, score-0.746]
99 Because the index sets are chosen fairly, the maximum intervention set size (Equation 12) is also bounded by the integer r: max_{k=1..K} |Jk| = ⌈mean_{k=1..K} |Jk|⌉ ≤ r. [sent-770, score-0.843]
wordName wordTfidf (topN-words)
[('intervention', 0.656), ('unordered', 0.342), ('causal', 0.323), ('jk', 0.163), ('ordered', 0.156), ('antichain', 0.146), ('pair', 0.128), ('separating', 0.119), ('eberhardt', 0.111), ('directed', 0.111), ('graph', 0.104), ('ausal', 0.098), ('berhardt', 0.098), ('oyer', 0.098), ('xperiment', 0.098), ('yttinen', 0.098), ('index', 0.095), ('hyttinen', 0.085), ('intervened', 0.085), ('passive', 0.078), ('iscovery', 0.076), ('experiments', 0.074), ('meank', 0.072), ('ek', 0.071), ('satisfaction', 0.066), ('intervening', 0.062), ('pairs', 0.06), ('cut', 0.06), ('colexicographical', 0.059), ('election', 0.058), ('condition', 0.054), ('cuts', 0.053), ('observational', 0.052), ('experiment', 0.045), ('identi', 0.043), ('variables', 0.042), ('minimal', 0.042), ('katona', 0.039), ('chromatic', 0.039), ('background', 0.039), ('combinatorics', 0.035), ('size', 0.035), ('acyclicity', 0.035), ('undirected', 0.035), ('sets', 0.034), ('edges', 0.034), ('discovery', 0.034), ('unsatis', 0.034), ('cai', 0.033), ('constructions', 0.033), ('completely', 0.033), ('causally', 0.033), ('halld', 0.033), ('passively', 0.033), ('sperner', 0.033), ('faithfulness', 0.031), ('edge', 0.03), ('satis', 0.029), ('cyclic', 0.027), ('satisfy', 0.026), ('hauser', 0.026), ('rsson', 0.026), ('welsh', 0.026), ('ss', 0.026), ('coloring', 0.025), ('rr', 0.024), ('helsinki', 0.023), ('graphs', 0.023), ('maximum', 0.023), ('indexes', 0.023), ('ability', 0.022), ('interventions', 0.022), ('ii', 0.022), ('roberts', 0.022), ('equation', 0.022), ('average', 0.021), ('uu', 0.021), ('terminology', 0.02), ('lowest', 0.02), ('freq', 0.02), ('ramsay', 0.02), ('knowledge', 0.02), ('antichains', 0.02), ('interventional', 0.02), ('jukna', 0.02), ('kleitman', 0.02), ('northern', 0.02), ('spencer', 0.02), ('territory', 0.02), ('sizes', 0.019), ('maxk', 0.019), ('xi', 0.018), ('iv', 0.018), ('relationships', 0.018), ('distinct', 0.018), ('vertices', 0.018), ('minimum', 0.017), ('skeleton', 0.017), ('concerning', 0.017), ('aware', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
2 0.12608305 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
Author: Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson
Abstract: This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system performance. This work is illustrated by experiments on the ad placement system associated with the Bing search engine. Keywords: causation, counterfactual reasoning, computational advertising
3 0.070904106 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
Author: Aapo Hyvärinen, Stephen M. Smith
Abstract: We present new measures of the causal direction, or direction of effect, between two non-Gaussian random variables. They are based on the likelihood ratio under the linear non-Gaussian acyclic model (LiNGAM). We also develop simple first-order approximations of the likelihood ratio and analyze them based on related cumulant-based measures, which can be shown to find the correct causal directions. We show how to apply these measures to estimate LiNGAM for more than two variables, and even in the case of more variables than observations. We further extend the method to cyclic and nonlinear models. The proposed framework is statistically at least as good as existing ones in the cases of few data points or noisy data, and it is computationally and conceptually very simple. Results on simulated fMRI data indicate that the method may be useful in neuroimaging where the number of time points is typically quite small. Keywords: structural equation model, Bayesian network, non-Gaussianity, causality, independent component analysis
4 0.063189611 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
5 0.056978125 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
Author: Naftali Harris, Mathias Drton
Abstract: The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. Analyzing the error propagation from marginal to partial correlations, we prove that high-dimensional consistency carries over to a broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graph sequences with bounded degree, our consistency result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal data, all the while incurring a negligible increase of computation time. While our interest is in the PC algorithm, the presented analysis of error propagation could be applied to other algorithms that test the vanishing of low-order partial correlations. Keywords: Gaussian copula, graphical model, model selection, multivariate normal distribution, nonparanormal distribution
6 0.053292576 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
7 0.048259091 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
8 0.04670674 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
9 0.04360386 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
10 0.039754175 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs
11 0.036664553 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
12 0.034168649 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
13 0.033076983 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
14 0.032185845 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
15 0.027854731 20 jmlr-2013-CODA: High Dimensional Copula Discriminant Analysis
16 0.026426954 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
17 0.025886092 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
18 0.025256658 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
19 0.024779657 70 jmlr-2013-Maximum Volume Clustering: A New Discriminative Clustering Approach
20 0.023887977 106 jmlr-2013-Stationary-Sparse Causality Network Learning
topicId topicWeight
[(0, -0.131), (1, 0.022), (2, -0.004), (3, 0.062), (4, 0.118), (5, 0.158), (6, -0.006), (7, 0.036), (8, 0.007), (9, 0.061), (10, -0.045), (11, -0.095), (12, -0.095), (13, -0.159), (14, 0.048), (15, 0.218), (16, 0.102), (17, 0.08), (18, 0.03), (19, 0.195), (20, 0.213), (21, 0.027), (22, 0.201), (23, 0.219), (24, -0.156), (25, 0.007), (26, 0.124), (27, 0.132), (28, -0.133), (29, -0.098), (30, -0.052), (31, 0.042), (32, 0.106), (33, 0.001), (34, 0.025), (35, 0.019), (36, -0.021), (37, 0.043), (38, -0.115), (39, -0.089), (40, -0.046), (41, 0.037), (42, -0.071), (43, 0.002), (44, -0.023), (45, -0.126), (46, 0.066), (47, 0.061), (48, -0.004), (49, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.95594692 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
2 0.78151548 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
Author: Léon Bottou, Jonas Peters, Joaquin Quiñonero-Candela, Denis X. Charles, D. Max Chickering, Elon Portugaly, Dipankar Ray, Patrice Simard, Ed Snelson
Abstract: This work shows how to leverage causal inference to understand the behavior of complex learning systems interacting with their environment and predict the consequences of changes to the system. Such predictions allow both humans and algorithms to select the changes that would have improved the system performance. This work is illustrated by experiments on the ad placement system associated with the Bing search engine. Keywords: causation, counterfactual reasoning, computational advertising
3 0.46735215 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
Author: Aapo Hyvärinen, Stephen M. Smith
Abstract: We present new measures of the causal direction, or direction of effect, between two non-Gaussian random variables. They are based on the likelihood ratio under the linear non-Gaussian acyclic model (LiNGAM). We also develop simple first-order approximations of the likelihood ratio and analyze them based on related cumulant-based measures, which can be shown to find the correct causal directions. We show how to apply these measures to estimate LiNGAM for more than two variables, and even in the case of more variables than observations. We further extend the method to cyclic and nonlinear models. The proposed framework is statistically at least as good as existing ones in the cases of few data points or noisy data, and it is computationally and conceptually very simple. Results on simulated fMRI data indicate that the method may be useful in neuroimaging where the number of time points is typically quite small. Keywords: structural equation model, Bayesian network, non-Gaussianity, causality, independent component analysis
4 0.43669689 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
5 0.28051609 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
Author: Rami Mahdi, Jason Mezey
Abstract: Constraint-based learning of Bayesian networks (BN) from limited data can lead to multiple testing problems when recovering dense areas of the skeleton and to conflicting results in the orientation of edges. In this paper, we present a new constraint-based algorithm, light mutual min (LMM) for improved accuracy of BN learning from small sample data. LMM improves the assessment of candidate edges by using a ranking criterion that considers conditional independence on neighboring variables at both sides of an edge simultaneously. The algorithm also employs an adaptive relaxation of constraints that, selectively, allows some nodes not to condition on some neighbors. This relaxation aims at reducing the incorrect rejection of true edges connecting high degree nodes due to multiple testing. LMM additionally incorporates a new criterion for ranking v-structures that is used to recover the completed partially directed acyclic graph (CPDAG) and to resolve conflicting v-structures, a common problem in small sample constraint-based learning. Using simulated data, each of these components of LMM is shown to significantly improve network inference compared to commonly applied methods when learning from limited data, including more accurate recovery of skeletons and CPDAGs compared to the PC, MaxMin, and MaxMin hill climbing algorithms. A proof of asymptotic correctness is also provided for LMM for recovering the correct skeleton and CPDAG. Keywords: Bayesian networks, skeleton, constraint-based learning, mutual min
6 0.24154651 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs
7 0.23844889 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
8 0.23806694 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning
9 0.22666959 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
10 0.20637687 106 jmlr-2013-Stationary-Sparse Causality Network Learning
11 0.18690021 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
12 0.18589145 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
13 0.1691214 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
14 0.16569985 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
15 0.1565018 94 jmlr-2013-Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
16 0.14473711 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
17 0.14251201 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
18 0.14188391 40 jmlr-2013-Efficient Program Synthesis Using Constraint Satisfaction in Inductive Logic Programming
19 0.12972957 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
20 0.12733687 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
topicId topicWeight
[(0, 0.023), (5, 0.168), (6, 0.049), (10, 0.065), (19, 0.387), (20, 0.017), (23, 0.026), (68, 0.056), (70, 0.026), (75, 0.035), (85, 0.014), (87, 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.68904257 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
2 0.65018171 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
Author: Nima Noorshams, Martin J. Wainwright
Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion
3 0.45184913 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright
Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves √ mean-squared error (MSE) that decays as O (N −1 + (N/m)−2 ). Whenever m ≤ N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O (N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O (N −1 + (N/m)−3/2 ), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 108 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling
4 0.4444795 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Author: Tony Cai, Wen-Xin Zhou
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed. Keywords: 1-bit matrix completion, low-rank matrix, max-norm, trace-norm, constrained optimization, maximum likelihood estimate, optimal rate of convergence
5 0.44366217 15 jmlr-2013-Bayesian Canonical Correlation Analysis
Author: Arto Klami, Seppo Virtanen, Samuel Kaski
Abstract: Canonical correlation analysis (CCA) is a classical method for seeking correlations between two multivariate data sets. During the last ten years, it has received more and more attention in the machine learning community in the form of novel computational formulations and a plethora of applications. We review recent developments in Bayesian models and inference methods for CCA which are attractive for their potential in hierarchical extensions and for coping with the combination of large dimensionalities and small sample sizes. The existing methods have not been particularly successful in fulfilling the promise yet; we introduce a novel efficient solution that imposes group-wise sparsity to estimate the posterior of an extended model which not only extracts the statistical dependencies (correlations) between data sets but also decomposes the data into shared and data set-specific components. In statistics literature the model is known as inter-battery factor analysis (IBFA), for which we now provide a Bayesian treatment. Keywords: Bayesian modeling, canonical correlation analysis, group-wise sparsity, inter-battery factor analysis, variational Bayesian approximation
7 0.44285762 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
8 0.44173488 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
9 0.44133684 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
11 0.43993184 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
12 0.43805477 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
13 0.4370172 114 jmlr-2013-The Rate of Convergence of AdaBoost
14 0.43669784 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
15 0.43503612 73 jmlr-2013-Multicategory Large-Margin Unified Machines
16 0.43268701 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
17 0.43239841 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
18 0.43230793 68 jmlr-2013-Machine Learning with Operational Costs
19 0.43133295 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs
20 0.43112978 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion