jmlr jmlr2013 jmlr2013-11 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
Reference: text
sentIndex sentText sentNum sentScore
1 Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. [sent-15, score-0.614]
2 The Markov boundary M is a minimal set of variables conditioned on which all the remaining variables in the data set, excluding the response variable T, are rendered statistically independent of the response variable T. [sent-33, score-0.582]
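Stated compactly, and as a sketch only (standard conditional-independence notation; V denotes the full variable set, and the minimality clause is what distinguishes a boundary from a blanket):

```latex
% M \subseteq V \setminus \{T\} is a Markov blanket of T when conditioning on M
% renders every other variable independent of T; it is a Markov boundary when,
% in addition, no proper subset of M is itself a Markov blanket of T.
\[
  \mathbf{M} \text{ is a Markov blanket of } T
  \iff
  T \;\perp\!\!\!\perp\; \bigl(\mathbf{V} \setminus (\mathbf{M} \cup \{T\})\bigr) \;\big|\; \mathbf{M}.
\]
```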
3 There are at least three practical benefits of algorithms that could systematically discover from data multiple Markov boundaries of the response variable of interest: First, such algorithms would improve discovery of the underlying mechanisms by not missing causative variables. [sent-48, score-0.67]
4 , these protocols can be thought of as Markov boundaries of the diagnostic response variable) but different cost and radiation exposure level. [sent-58, score-0.559]
5 Algorithms for induction of multiple Markov boundaries can be helpful for de-novo identification of such protocols from patient data. [sent-59, score-0.581]
6 Section 3 lists prior algorithms for induction of multiple Markov boundaries and variable sets. [sent-87, score-0.626]
7 4 Markov Boundary Theory In this subsection we first define the concepts of Markov blanket and Markov boundary and theoretically characterize distributions with multiple Markov boundaries of the same response variable. [sent-179, score-0.936]
8 The following two lemmas allow us to explicitly construct and verify multiple Markov blankets and Markov boundaries when the distribution violates the intersection property (proofs are given in Appendix A). [sent-193, score-0.56]
9 Notice that IAMB identifies a Markov boundary of T by essentially implementing its definition and conditioning on the entire Markov boundary when testing variables for independence from the response T. [sent-272, score-0.573]
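As an illustration of the scheme this sentence refers to, here is a minimal Python sketch of an IAMB-style forward/backward loop; the helper names assoc and indep and the data interface are placeholders introduced here, not the authors' implementation:

```python
def iamb(data, T, variables, assoc, indep):
    """IAMB-style sketch: grow a candidate Markov boundary, then prune it.

    assoc(X, T, Z, data) -> strength of association of X with T given the set Z
    indep(X, T, Z, data) -> True if X is conditionally independent of T given Z
    Both are placeholder conditional-independence utilities.
    """
    mb = set()
    # Forward phase: repeatedly add the variable most strongly associated with T,
    # conditioning on the entire current candidate Markov boundary.
    changed = True
    while changed:
        changed = False
        candidates = [X for X in variables if X != T and X not in mb]
        if not candidates:
            break
        best = max(candidates, key=lambda X: assoc(X, T, mb, data))
        if not indep(best, T, mb, data):
            mb.add(best)
            changed = True
    # Backward phase: drop variables that become independent of T
    # given the rest of the candidate Markov boundary.
    for X in list(mb):
        if indep(X, T, mb - {X}, data):
            mb.remove(X)
    return mb
```

The randomized KIAMB variant discussed later differs mainly in how it chooses which associated variable to add in the forward phase.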
10 Prior Algorithms for Learning Multiple Markov Boundaries and Variable Sets Table 1 summarizes the properties of prior algorithms for learning multiple Markov boundaries and variable sets, while a detailed description of the algorithms and their theoretical analysis is presented in Appendix C. [sent-330, score-0.561]
11 TIE∗ : A Family of Multiple Markov Boundary Induction Algorithms In this section we present a generative anytime algorithm TIE∗ (which is an acronym for “Target Information Equivalence”) for learning from data all Markov boundaries of the response variable. [sent-339, score-0.595]
12 Use algorithm X to learn a Markov boundary Mnew of T from the dataset De. If Mnew is a Markov boundary of T in the original distribution according to criterion Z, output Mnew. Repeat until all datasets De generated by procedure Y have been considered. [sent-362, score-0.567]
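A minimal Python sketch of this generative loop; the callables learn_mb, generate_datasets, and verify_mb stand for the inputs X, Y, and Z and are names introduced here for illustration, not the paper's pseudocode:

```python
def tie_star(data, T, learn_mb, generate_datasets, verify_mb):
    """Sketch of the generative TIE* loop.

    learn_mb(data, T)                 -> one Markov boundary of T (input X)
    generate_datasets(data, T, found) -> iterable of embedded data sets De (input Y)
    verify_mb(mb, data, T)            -> True if mb is a Markov boundary of T
                                         in the original distribution (input Z)
    All three are assumed to be admissible in the sense described below.
    """
    boundaries = [learn_mb(data, T)]   # Markov boundary learned from the original data
    # Y is assumed to propose embedded data sets lazily, so that it can take
    # into account the Markov boundaries discovered so far (as IGS does).
    for de in generate_datasets(data, T, boundaries):
        m_new = learn_mb(de, T)        # Markov boundary in an embedded distribution
        if verify_mb(m_new, data, T) and m_new not in boundaries:
            boundaries.append(m_new)   # also a Markov boundary in the original distribution
    return boundaries
```

In the paper's instantiations, X is a single Markov boundary learner such as IAMB or Semi-Interleaved HITON-PC, Y is a procedure such as IGS, and Z is the Independence or Predictivity criterion.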
13 The Markov boundary induction algorithm X can correctly identify a Markov boundary of T both in the dataset D (from the original distribution) and in all datasets De (from the embedded distributions) that are generated by procedure Y. [sent-367, score-0.609]
14 For every Markov boundary of T (M) that exists in the original distribution, the procedure Y generates a dataset De = D(V \ G) such that M can be discovered by the Markov boundary induction algorithm X applied to the dataset De. [sent-368, score-0.593]
15 The motivation is that De may lead to identification of a new Markov boundary of T that was previously “invisible” to a single Markov boundary induction algorithm because it was “masked” by another Markov boundary of T. [sent-376, score-0.677]
16 Similarly, when the Markov boundary induction algorithm X is run on the data set De = D(V \ G), where G = {B} or G = {A, B}, two additional Markov boundaries of T in the original distribution, {A, D, E, F} or {C, D, E, F}, respectively, are found and output. [sent-400, score-0.756]
17 This can be implemented by setting a counter of all identified Markov boundaries in the original distribution (i) and a counter of all identified Markov boundaries in the embedded distribution that are not Markov boundaries in the original distribution (j). [sent-436, score-1.573]
18 , Gn that were used in previous calls to IGS to generate datasets from the embedded distributions that led to discovery of the above Markov boundaries (we use G1 = ∅); and subsets G∗1, . [sent-444, score-0.629]
19 , G∗m that were used in previous calls to IGS to generate datasets from the embedded distributions that did not lead to Markov boundaries in the original distribution. [sent-447, score-0.563]
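The incremental idea behind IGS can be sketched as a Python generator that proposes embedded data sets D(V \ G) by removing subsets G of the Markov boundaries found so far; the bookkeeping here is simplified (in particular, pruning based on the failed subsets G∗ is omitted), so this is only an approximation of the actual procedure:

```python
from itertools import chain, combinations

def igs_datasets(data, T, boundaries, drop_columns):
    """Simplified sketch of the IGS idea.

    boundaries            -> list of Markov boundaries found so far; the caller
                             may append to it while this generator is running
    drop_columns(data, G) -> a copy of the data without the variables in G
                             (a placeholder for the actual data handling)
    """
    tried = set()
    i = 0
    while i < len(boundaries):               # picks up boundaries appended later
        members = sorted(boundaries[i])
        # propose every non-empty subset G of the current Markov boundary
        subsets = chain.from_iterable(
            combinations(members, k) for k in range(1, len(members) + 1))
        for G in subsets:
            if G not in tried:
                tried.add(G)
                yield drop_columns(data, set(G))   # embedded data set D(V \ G)
        i += 1
```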
20 Input Z: This is a criterion that can verify whether Mnew, a Markov boundary in the embedded distribution (found by applying the Markov boundary induction algorithm X in step 5 of TIE∗ to the data set De), is also a Markov boundary in the original distribution. [sent-466, score-0.791]
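One instantiation of criterion Z is the Independence criterion described later (Figure 10): accept Mnew if conditioning on it renders every remaining variable independent of T in the original data, which by the argument in Appendix A makes it a Markov boundary in the original distribution. A hedged Python sketch, reusing the placeholder indep test from the IAMB sketch above:

```python
def independence_criterion(m_new, data, T, variables, indep):
    """Sketch of the Independence criterion (input Z): accept m_new if it is a
    Markov blanket of T in the original data, i.e., every other variable is
    conditionally independent of T given m_new."""
    rest = [X for X in variables if X != T and X not in m_new]
    return all(indep(X, T, m_new, data) for X in rest)
```

The Predictivity criterion discussed elsewhere in the paper instead verifies Mnew through its classification performance (AUC), which is what can introduce the spurious multiplicity noted later.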
21 Theorem 10 The generative algorithm TIE∗ outputs all and only Markov boundaries of T that exist in the joint probability distribution P if the inputs X, Y, Z are admissible (i. [sent-489, score-0.583]
22 Specifically, we need to require that all members of all Markov boundaries retain marginal and conditional dependence on T, except for certain violations of the intersection property. [sent-497, score-0.56]
23 As mentioned in the beginning of Section 4, the generative nature of TIE∗ facilitates design of new algorithms for discovery of multiple Markov boundaries by simply instantiating TIE∗ with input components X, Y, Z. [sent-505, score-0.589]
24 As with all sound and complete computational causal discovery algorithms, discovery of all Markov boundaries (and even one Markov boundary) is worst-case intractable. [sent-520, score-0.654]
25 When N Markov boundaries with average size |M| are present in the distribution, TIE∗ with the IGS procedure will invoke a Markov boundary induction algorithm no more than O(N·2^|M|) times. [sent-527, score-0.775]
26 In this case, induction of the first Markov boundary still takes O(|V|·2^|M|) independence tests, but all subsequent Markov boundaries typically require fewer than O(|V|) conditional independence tests. [sent-530, score-0.822]
27 Finally, we build the Cartesian product of information equivalence relations for subsets of M that are stored in Θ and obtain 4 Markov boundaries of T: {A, B, F}, {A, D, F}, {C, B, F}, and {C, D, F}. [sent-564, score-0.556]
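The enumeration in this example can be reproduced with a few lines of Python: take the Cartesian product of the stored equivalence classes (here {A, C}, {B, D}, and {F}, read off from the four boundaries listed above) to obtain every Markov boundary. This is the construction that the iTIE∗ algorithm relies on under the assumptions stated in the next sentence.

```python
from itertools import product

# Target-information-equivalent classes inferred from the example above.
equivalence_classes = [{"A", "C"}, {"B", "D"}, {"F"}]

# One Markov boundary per choice of a single representative from each class.
markov_boundaries = [set(choice) for choice in product(*equivalence_classes)]
print(markov_boundaries)
# -> the four boundaries {A,B,F}, {A,D,F}, {C,B,F}, {C,D,F} (in some order)
```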
28 The iTIE∗ algorithm correctly identifies all Markov boundaries under the following sufficient assumptions: (a) all equivalence relations in the underlying distribution follow from context-independent equivalence relations of individual variables, and (b) the assumptions of Theorem 12 hold. [sent-565, score-0.585]
29 As before, |M | denotes the average size of a Markov boundary and the above complexity bound assumes that the size of a candidate Markov boundary obtained in the Forward phase is close to the size of a true Markov boundary obtained at the end of the Backward phase (see Figure 5). [sent-576, score-0.695]
30 Empirical Experiments In this section, we present experimental results obtained by applying methods for learning multiple Markov boundaries and variable sets on simulated and real data. [sent-587, score-0.581]
31 These methods were chosen for our evaluation as they are the current state-of-the-art techniques for discovery of multiple Markov boundaries and variable sets. [sent-589, score-0.598]
32 A detailed description of parameters of prior methods for induction of multiple Markov boundaries and variable sets is provided in Appendix C. [sent-592, score-0.626]
33 Assessment of classification performance of the extracted Markov boundaries and variable sets was done using Support Vector Machines (SVMs) (Vapnik, 1998). [sent-594, score-0.621]
34 1 Experiments with Simulated Data Below we present an evaluation of methods for extraction of multiple Markov boundaries and variable sets in simulated data. [sent-624, score-0.581]
35 Simulated data allows us to evaluate methods in a controlled setting where the underlying causal process and all Markov boundaries of the response variable T are known exactly. [sent-625, score-0.697]
36 TIED1000 allows us to study the behavior of different methods for learning multiple Markov boundaries and variable sets in an environment where the fraction of variables carrying relevant information about T is small. [sent-634, score-0.621]
37 All methods for extracting multiple Markov boundaries and variable sets were assessed based on the following six performance criteria: I. [sent-636, score-0.582]
38 The number of true Markov boundaries identified exactly, that is, without false positives and false negatives. [sent-641, score-0.612]
39 , 1 or 2 variables in each subset) to be the presumed Markov boundaries of the response variable T . [sent-722, score-0.664]
40 Similarly, criterion III (N) can be maximized independently of the response T by simply taking all 2^(|V|−1) − 1 non-empty subsets of the variable set V \{T} to be the presumed Markov boundaries of T. [sent-726, score-0.671]
41 Finally, given ranks on the individual criteria I (PV) and II (AUC), we defined a combined (PV, AUC) ranking criterion, which reflects the ability of methods to find Markov boundaries in real data. [sent-739, score-0.569]
42 Figure 15 shows a 2-dimensional plot of PV versus AUC and a 3-dimensional plot of PV versus AUC versus the number of extracted distinct Markov boundaries or variable sets (N). [sent-748, score-0.6]
43 It is worth noting that use of the AUC metric for verification of Markov boundaries in the Predictivity criterion of TIE∗ can result in some spurious multiplicity of the output Markov boundaries. [sent-773, score-0.57]
44 [Figure axis labels: Log2(Average number of output distinct Markov boundaries or variable sets); Average number of MBs (log2)] [sent-785, score-0.557]
45 Table 3: Number of distinct Markov boundaries or variable sets identified by the evaluated methods (N), proportion of variables in them (PV) and their classification performance (AUC) averaged across all 13 real data sets for each method. [sent-881, score-0.633]
46 The average number of minimal Markov boundaries identified by TIE∗ was 1,484 (versus the average number of all Markov boundaries identified by TIE∗ equal to 1,993). [sent-893, score-1.012]
47 872 AUC) of the minimal Markov boundaries were statistically indistinguishable from the results obtained on all Markov boundaries identified by TIE∗ and so were the ranks on the PV, AUC and (PV, AUC) criteria. [sent-896, score-1.018]
48 In summary, TIE∗ extracted multiple compact Markov boundaries with high classification performance and surpassed all other methods on the combined (PV, AUC) criterion. [sent-897, score-0.605]
49 In this paper, we showed that TIE∗ parameterized with Semi-Interleaved HITON-PC as the base Markov boundary induction algorithm was able to identify multiple compact Markov boundaries with consistently high classification performance in real data. [sent-901, score-0.806]
50 For example, in the ACPJ Etiology data set, TIE∗ identified 5,330 distinct Markov boundaries (and 4,263 minimal ones) that on average contained 18 variables out of 28,228 and had an AUC of 0. [sent-902, score-0.566]
51 Out of all prior methods for learning multiple Markov boundaries and variable sets applied to the same data set, Resampling+UAF had the highest classification performance with an AUC of 0. [sent-904, score-0.582]
52 A similar pattern can be observed in the Dexter data set where TIE∗ identified 4,791 distinct Markov boundaries (and 3,498 minimal ones) with an average size of 17 variables out of 19,999 and an AUC of 0. [sent-906, score-0.566]
53 The best performer among prior methods in the same data was EGS-CMIM with Markov boundaries containing 50 variables each and an average AUC of 0. [sent-908, score-0.566]
54 The compactness of Markov boundaries extracted by TIE∗ coupled with their high classification performance provides strong evidence that there are indeed multiple Markov boundaries in many real-life problem domains. [sent-910, score-1.092]
55 Second, we conducted an empirical comparison of TIE∗ with 26 state-of-the-art methods for discovery of multiple Markov boundaries and variable sets. [sent-935, score-0.598]
56 We found that, unlike prior methods, TIE∗ identifies exactly all true Markov boundaries in simulated data, and in real data it yields Markov boundaries with simultaneously better classification performance and a smaller number of variables compared to prior methods. [sent-937, score-1.075]
57 Other notable contributions of this work include: (i) developing a deeper theoretical understanding of distributions with multiple Markov boundaries of the same variable (Sections 2. [sent-938, score-0.561]
58 4), (ii) theoretical analysis of prior state-of-the-art algorithms for discovery of multiple Markov boundaries and variable sets (Appendix C), (iii) a novel simple and fast algorithm iTIE∗ for learning multiple Markov boundaries in special distributions (Section 4. [sent-940, score-1.114]
59 In the example stated above, this would lead to running a Markov boundary induction algorithm 2^7 = 128 times (because there are 7 members in the union of all Markov boundaries) to find all Markov boundaries that exist in the distribution. [sent-953, score-0.78]
60 , G∗ ⊂ G) that did not result in discovery of a Markov boundary when the Markov boundary induction algorithm has been previously run on the data for variables in V \ G∗ . [sent-957, score-0.57]
61 In the example stated above, this principle as exemplified in IGS procedure would lead to running a single Markov boundary induction algorithm only 8 times in order to find all Markov boundaries that exist in the distribution. [sent-959, score-0.756]
62 We found that in a sufficiently large sample size (≥ 2,000), TIE∗ can discover all 25 true Markov boundaries with only 1 false positive in each extracted Markov boundary. [sent-985, score-0.599]
63 Also, as we have pointed out, the use of the AUC metric for verification of Markov boundaries in the Predictivity criterion of TIE∗ can result in a small percentage of spurious Markov boundaries in the output of the algorithm. [sent-992, score-1.037]
64 In this paper we experimented with one approach to reduce spurious multiplicity of TIE∗ by filtering extracted Markov boundaries to the minimal ones. [sent-994, score-0.575]
65 Finally, another limitation of this study is that we included in empirical experiments both algorithms for discovery of multiple Markov boundaries and algorithms for discovery of multiple variable sets. [sent-1001, score-0.664]
66 This is because eventually the procedure will generate a data set De for every Markov boundary of T such that each data set contains all members of only one Markov boundary and thus a single Markov boundary induction algorithm X can discover it. [sent-1130, score-0.701]
67 Since Mnew is a Markov boundary of T in the embedded distribution and it is a Markov blanket of T in the original distribution, it is also a Markov boundary of T in the original distribution. [sent-1147, score-0.628]
68 If Mnew is a Markov blanket of T in the original distribution, it is also a Markov boundary of T in the original distribution because Mnew is a Markov boundary of T in the embedded distribution. [sent-1152, score-0.628]
69 Description and Theoretical Analysis of Prior Algorithms for Learning Multiple Markov Boundaries and Variable Sets This appendix provides description and theoretical analysis of prior algorithms for learning multiple Markov boundaries and variable sets. [sent-1207, score-0.561]
70 Description: Recall that the IAMB algorithm (Figure 4) requires only the local composition property for its correctness (per Theorem 8), which is compatible with the existence of multiple Markov boundaries of the response variable T. [sent-1211, score-0.683]
71 Theoretically, KIAMB can identify all Markov boundaries if given the chance to explore a large enough number of different sequences of additions of associated variables into the current Markov boundary in Phase I. [sent-1281, score-0.751]
72 Thus, in order to produce the complete set of Markov boundaries, the value of the parameter K and the number of runs of KIAMB must be determined based on the topology of the causal graph and the number of Markov boundaries of T, neither of which is known in real-world causal discovery applications. [sent-1313, score-0.755]
73 Description: These algorithms attempt to extract multiple Markov boundaries by repeatedly invoking single Markov boundary extraction methods CMIM (Fleuret, 2004) and NCMIGS (Liu et al. [sent-1318, score-0.72]
74 Note that while admissible values of t (and therefore of l) are by design bounded from above by the number of variables in the data, the actual number of true Markov boundaries may be much higher. [sent-1341, score-0.589]
75 First, the number of Markov boundaries output by EGSG is an arbitrary parameter and is independent of the data-generating causal graph. [sent-1366, score-0.605]
76 Second, Markov boundaries output by EGSG may contain an arbitrary number of false positives as well as false negatives. [sent-1367, score-0.637]
77 Description: Iterative removal methods identify multiple Markov boundaries (IR-HITON-PC) or multiple variable sets (IR-SPLR) by repeatedly executing the following two steps. [sent-1409, score-0.615]
78 output disjoint Markov boundaries or variable sets, while in general multiple Markov boundaries may share a number of variables. [sent-1439, score-1.073]
79 We discuss the Markov boundary induction algorithm (X), the procedure to generate data sets from the embedded distributions (Y), and the criterion to verify Markov boundaries of T (Z). [sent-1447, score-0.852]
80 Criterion Independence to verify Markov boundaries (Figure 10): This criterion was implemented using statistical tests that were described above for the Markov boundary induction algorithms. [sent-1496, score-0.794]
81 This matching was performed by finding a minimum-weight matching in a complete bipartite graph G = ⟨V1 ∪ V2, E⟩, where vertices in V1 corresponded to the true Markov boundaries and vertices in V2 corresponded to the extracted Markov boundaries/variable sets. [sent-1520, score-0.575]
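Because the bipartite graph is complete, the minimum-weight matching can be computed with the Hungarian algorithm; a Python sketch using SciPy is given below, where the set-difference cost is a placeholder and not necessarily the weighting used by the authors:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_boundaries(true_mbs, extracted_mbs):
    """Match true Markov boundaries (rows) to extracted Markov boundaries or
    variable sets (columns) by minimum-weight bipartite matching."""
    cost = np.zeros((len(true_mbs), len(extracted_mbs)))
    for i, t in enumerate(true_mbs):
        for j, e in enumerate(extracted_mbs):
            # placeholder weight: size of the symmetric difference of the two sets
            cost[i, j] = len(t.symmetric_difference(e))
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return list(zip(rows.tolist(), cols.tolist()))
```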
82 In addition, since the true Markov boundaries are unknown in practical applications, the average classification performance (criterion VI) was computed over all distinct Markov boundaries/variable sets extracted by a method. [sent-1525, score-0.595]
83 The average classification performance of Markov boundaries extracted by KIAMB was about 20% lower than that of the MAP-BN classifier in both data sets. [sent-1531, score-0.595]
84 In addition, the average size of Markov boundaries extracted by EGSG increased almost 10-fold, from 7 in TIED to 67 in TIED1000, while the number of variables conditionally dependent on T in the underlying network remained unchanged. [sent-1543, score-0.653]
85 Classification performance of Markov boundaries extracted by EGSG was lower than that of the MAP-BN classifier by 9-23% (depending on parameter settings) in TIED and by 27-55% in TIED1000. [sent-1546, score-0.597]
86 However, once that variable was found to be in a Markov boundary by an iterative removal method, it was then removed from further consideration, thus preventing all other extracted Markov boundaries from containing this variable. [sent-1561, score-0.829]
87 Markov boundaries extracted by IR-HITON-PC had 8-10% PFP (depending on the data set) and 10-20% FNR. [sent-1562, score-0.555]
88 As a result of the high FNR in TIED, the average classification performance of Markov boundaries extracted by IR-HITON-PC was about 2% lower than that of the MAP-BN classifier in the same data set. [sent-1565, score-0.595]
89 (2010b) EGS-NCMIGS • l = 7, K = 10 • l = 5000, K = 10 • l = 7, K = 50 • l = 5000, K = 50 VARIABLE GROUPING-BASED MARKOV BOUNDARY DISCOVERY • ♯ of Markov boundaries = 30,t = 15 • ♯ of Markov boundaries = 5000,t = 15 EGSG Liu et al. [sent-1591, score-0.974]
90 (2010) • ♯ of Markov boundaries = 30,t = 10 • ♯ of Markov boundaries = 5000,t = 10 • ♯ of Markov boundaries = 30,t = 5 • ♯ of Markov boundaries = 5000,t = 5 RESAMPLING-BASED VARIABLE SELECTION • w/o statistical comparison of classification performance estimates Ein-Dor et al. [sent-1592, score-1.969]
91 Table 9: Parameterizations of methods for discovery of multiple Markov boundaries and variable sets. [sent-1611, score-0.598]
92 The small sizes of the extracted Markov boundaries were, to a large extent, due to KIAMB's sample inefficiency, resulting in an inability to perform some of the required tests of independence, as discussed in Appendix C. [sent-2065, score-0.588]
93 As a result, classification performance of Markov boundaries extracted by KIAMB was lower than that of most other methods, with KIAMB ranking 4 out of 5 by AUC. [sent-2066, score-0.576]
94 Markov boundaries extracted by EGSG were larger than Markov boundaries identified by many other methods and had the lowest average classification performance. [sent-2074, score-1.061]
95 Markov boundaries extracted by IR-HITON-PC had an average PV of 2. [sent-2082, score-0.574]
96 Table 13: Results showing the number of distinct Markov boundaries or variable sets (N) extracted by each method, their average size in terms of the number of variables (S) and average classification performance (AUC) in each of 13 real data sets. [sent-2201, score-0.719]
97 Large-scale feature selection using Markov blanket induction for the prediction of protein-drug binding. [sent-2499, score-0.62]
98 HITON: a novel Markov blanket algorithm for optimal variable selection. [sent-2506, score-0.6]
99 Local causal and Markov blanket induction for causal discovery and feature selection for classification. [sent-2525, score-0.843]
100 Local causal and Markov blanket induction for causal discovery and feature selection for classification. [sent-2536, score-0.843]
wordName wordTfidf (topN-words)
[('boundaries', 0.487), ('markov', 0.411), ('tie', 0.338), ('boundary', 0.204), ('aliferis', 0.176), ('blanket', 0.144), ('kiamb', 0.135), ('auc', 0.13), ('resampling', 0.128), ('uaf', 0.128), ('lytkin', 0.125), ('rfe', 0.125), ('pv', 0.122), ('igs', 0.118), ('emeire', 0.115), ('liferis', 0.115), ('tatnikov', 0.115), ('iamb', 0.111), ('oundaries', 0.111), ('itie', 0.108), ('egsg', 0.104), ('arkov', 0.095), ('causal', 0.093), ('tsamardinos', 0.089), ('iscovery', 0.086), ('ultiple', 0.086), ('statnikov', 0.075), ('response', 0.072), ('extracted', 0.068), ('induction', 0.065), ('ied', 0.061), ('variables', 0.06), ('dataset', 0.06), ('embedded', 0.058), ('parameterizations', 0.058), ('pfp', 0.054), ('mnew', 0.052), ('admissibility', 0.051), ('predictivity', 0.051), ('variable', 0.045), ('false', 0.044), ('fnr', 0.044), ('holdout', 0.043), ('admissible', 0.042), ('lemeire', 0.04), ('criterion', 0.038), ('instantiations', 0.037), ('positives', 0.037), ('discovery', 0.037), ('generative', 0.036), ('ancestral', 0.035), ('classi', 0.034), ('cmim', 0.034), ('ncmigs', 0.034), ('independence', 0.033), ('phase', 0.032), ('mbs', 0.03), ('xdom', 0.03), ('parameterization', 0.03), ('subsets', 0.029), ('multiple', 0.029), ('correctness', 0.027), ('intersection', 0.026), ('pareto', 0.026), ('runs', 0.025), ('output', 0.025), ('removal', 0.025), ('identi', 0.024), ('statistically', 0.024), ('members', 0.024), ('instantiation', 0.024), ('spouses', 0.024), ('ydom', 0.024), ('criteria', 0.024), ('frontier', 0.023), ('violations', 0.023), ('mani', 0.023), ('cancer', 0.023), ('composition', 0.023), ('equivalence', 0.022), ('guyon', 0.022), ('removing', 0.022), ('parents', 0.021), ('performance', 0.021), ('directed', 0.02), ('delong', 0.02), ('ranked', 0.02), ('vss', 0.02), ('explorer', 0.02), ('multiplicity', 0.02), ('simulated', 0.02), ('graph', 0.02), ('proportion', 0.02), ('ranks', 0.02), ('network', 0.019), ('average', 0.019), ('association', 0.018), ('distribution', 0.018), ('relations', 0.018), ('datasets', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999708 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
2 0.063189611 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
3 0.051587332 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
Author: Rami Mahdi, Jason Mezey
Abstract: Constraint-based learning of Bayesian networks (BN) from limited data can lead to multiple testing problems when recovering dense areas of the skeleton and to conflicting results in the orientation of edges. In this paper, we present a new constraint-based algorithm, light mutual min (LMM) for improved accuracy of BN learning from small sample data. LMM improves the assessment of candidate edges by using a ranking criterion that considers conditional independence on neighboring variables at both sides of an edge simultaneously. The algorithm also employs an adaptive relaxation of constraints that, selectively, allows some nodes not to condition on some neighbors. This relaxation aims at reducing the incorrect rejection of true edges connecting high degree nodes due to multiple testing. LMM additionally incorporates a new criterion for ranking v-structures that is used to recover the completed partially directed acyclic graph (CPDAG) and to resolve conflicting v-structures, a common problem in small sample constraint-based learning. Using simulated data, each of these components of LMM is shown to significantly improve network inference compared to commonly applied methods when learning from limited data, including more accurate recovery of skeletons and CPDAGs compared to the PC, MaxMin, and MaxMin hill climbing algorithms. A proof of asymptotic correctness is also provided for LMM for recovering the correct skeleton and CPDAG. Keywords: Bayesian networks, skeleton, constraint-based learning, mutual min
4 0.051010713 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
Author: Naftali Harris, Mathias Drton
Abstract: The PC algorithm uses conditional independence tests for model selection in graphical modeling with acyclic directed graphs. In Gaussian models, tests of conditional independence are typically based on Pearson correlations, and high-dimensional consistency results have been obtained for the PC algorithm in this setting. Analyzing the error propagation from marginal to partial correlations, we prove that high-dimensional consistency carries over to a broader class of Gaussian copula or nonparanormal models when using rank-based measures of correlation. For graph sequences with bounded degree, our consistency result is as strong as prior Gaussian results. In simulations, the ‘Rank PC’ algorithm works as well as the ‘Pearson PC’ algorithm for normal data and considerably better for non-normal data, all the while incurring a negligible increase of computation time. While our interest is in the PC algorithm, the presented analysis of error propagation could be applied to other algorithms that test the vanishing of low-order partial correlations. Keywords: Gaussian copula, graphical model, model selection, multivariate normal distribution, nonparanormal distribution
5 0.049546197 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
Author: Vinayak Rao, Yee Whye Teh
Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-ofthe-art MCMC samplers for these models. Keywords: Markov jump process, MCMC, Gibbs sampler, uniformization, Markov-modulated Poisson process, continuous-time Bayesian network
6 0.048736587 95 jmlr-2013-Ranking Forests
7 0.044731729 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
8 0.041911744 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
9 0.039784484 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
10 0.039082486 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
11 0.029598456 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
12 0.029419571 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
13 0.028657712 85 jmlr-2013-Pairwise Likelihood Ratios for Estimation of Non-Gaussian Structural Equation Models
14 0.027090058 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model
15 0.025464926 58 jmlr-2013-Language-Motivated Approaches to Action Recognition
16 0.025386315 106 jmlr-2013-Stationary-Sparse Causality Network Learning
17 0.023988871 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
18 0.023108259 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
19 0.022592889 22 jmlr-2013-Classifying With Confidence From Incomplete Information
topicId topicWeight
[(0, -0.128), (1, -0.011), (2, -0.039), (3, 0.038), (4, 0.066), (5, 0.097), (6, -0.007), (7, 0.057), (8, -0.014), (9, 0.052), (10, -0.033), (11, -0.079), (12, -0.079), (13, -0.222), (14, -0.031), (15, 0.058), (16, 0.069), (17, 0.0), (18, 0.072), (19, 0.189), (20, 0.043), (21, 0.033), (22, -0.125), (23, 0.072), (24, 0.075), (25, 0.03), (26, 0.202), (27, 0.066), (28, -0.101), (29, -0.12), (30, -0.094), (31, 0.099), (32, -0.123), (33, 0.006), (34, 0.136), (35, -0.055), (36, -0.056), (37, 0.035), (38, 0.032), (39, 0.137), (40, -0.087), (41, -0.04), (42, -0.007), (43, -0.033), (44, 0.229), (45, 0.04), (46, -0.017), (47, 0.068), (48, 0.043), (49, 0.09)]
simIndex simValue paperId paperTitle
same-paper 1 0.97579086 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
2 0.46974075 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
3 0.46304327 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions
Author: Vinayak Rao, Yee Whye Teh
Abstract: Markov jump processes (or continuous-time Markov chains) are a simple and important class of continuous-time dynamical systems. In this paper, we tackle the problem of simulating from the posterior distribution over paths in these models, given partial and noisy observations. Our approach is an auxiliary variable Gibbs sampler, and is based on the idea of uniformization. This sets up a Markov chain over paths by alternately sampling a finite set of virtual jump times given the current path, and then sampling a new path given the set of extant and virtual jump times. The first step involves simulating a piecewise-constant inhomogeneous Poisson process, while for the second, we use a standard hidden Markov model forward filtering-backward sampling algorithm. Our method is exact and does not involve approximations like time-discretization. We demonstrate how our sampler extends naturally to MJP-based models like Markov-modulated Poisson processes and continuous-time Bayesian networks, and show significant computational benefits over state-ofthe-art MCMC samplers for these models. Keywords: Markov jump process, MCMC, Gibbs sampler, uniformization, Markov-modulated Poisson process, continuous-time Bayesian network
4 0.43115094 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
Author: Rami Mahdi, Jason Mezey
Abstract: Constraint-based learning of Bayesian networks (BN) from limited data can lead to multiple testing problems when recovering dense areas of the skeleton and to conflicting results in the orientation of edges. In this paper, we present a new constraint-based algorithm, light mutual min (LMM) for improved accuracy of BN learning from small sample data. LMM improves the assessment of candidate edges by using a ranking criterion that considers conditional independence on neighboring variables at both sides of an edge simultaneously. The algorithm also employs an adaptive relaxation of constraints that, selectively, allows some nodes not to condition on some neighbors. This relaxation aims at reducing the incorrect rejection of true edges connecting high degree nodes due to multiple testing. LMM additionally incorporates a new criterion for ranking v-structures that is used to recover the completed partially directed acyclic graph (CPDAG) and to resolve conflicting v-structures, a common problem in small sample constraint-based learning. Using simulated data, each of these components of LMM is shown to significantly improve network inference compared to commonly applied methods when learning from limited data, including more accurate recovery of skeletons and CPDAGs compared to the PC, MaxMin, and MaxMin hill climbing algorithms. A proof of asymptotic correctness is also provided for LMM for recovering the correct skeleton and CPDAG. Keywords: Bayesian networks, skeleton, constraint-based learning, mutual min
5 0.42409432 95 jmlr-2013-Ranking Forests
Author: Stéphan Clémençon, Marine Depecker, Nicolas Vayatis
Abstract: The present paper examines how the aggregation and feature randomization principles underlying the algorithm RANDOM FOREST (Breiman, 2001) can be adapted to bipartite ranking. The approach taken here is based on nonparametric scoring and ROC curve optimization in the sense of the AUC criterion. In this problem, aggregation is used to increase the performance of scoring rules produced by ranking trees, as those developed in Clémençon and Vayatis (2009c). The present work describes the principles for building median scoring rules based on concepts from rank aggregation. Consistency results are derived for these aggregated scoring rules and an algorithm called RANKING FOREST is presented. Furthermore, various strategies for feature randomization are explored through a series of numerical experiments on artificial data sets. Keywords: bipartite ranking, nonparametric scoring, classification data, ROC optimization, AUC criterion, tree-based ranking rules, bootstrap, bagging, rank aggregation, median ranking, feature randomization
6 0.39516234 16 jmlr-2013-Bayesian Nonparametric Hidden Semi-Markov Models
7 0.39037701 30 jmlr-2013-Counterfactual Reasoning and Learning Systems: The Example of Computational Advertising
8 0.31978843 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections
9 0.31204325 73 jmlr-2013-Multicategory Large-Margin Unified Machines
10 0.27337715 22 jmlr-2013-Classifying With Confidence From Incomplete Information
11 0.24780551 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints
12 0.24519856 31 jmlr-2013-Derivative Estimation with Local Polynomial Fitting
14 0.22880989 106 jmlr-2013-Stationary-Sparse Causality Network Learning
15 0.21876949 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
16 0.20775835 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation
17 0.20666966 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
18 0.20459552 104 jmlr-2013-Sparse Single-Index Model
19 0.1973969 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks
20 0.19514434 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
topicId topicWeight
[(0, 0.022), (5, 0.119), (6, 0.032), (10, 0.046), (20, 0.021), (23, 0.028), (68, 0.54), (70, 0.024), (71, 0.011), (75, 0.029), (85, 0.013), (87, 0.012)]
simIndex simValue paperId paperTitle
same-paper 1 0.90965676 11 jmlr-2013-Algorithms for Discovery of Multiple Markov Boundaries
Author: Alexander Statnikov, Nikita I. Lytkin, Jan Lemeire, Constantin F. Aliferis
Abstract: Algorithms for Markov boundary discovery from data constitute an important recent development in machine learning, primarily because they offer a principled solution to the variable/feature selection problem and give insight on local causal structure. Over the last decade many sound algorithms have been proposed to identify a single Markov boundary of the response variable. Even though faithful distributions and, more broadly, distributions that satisfy the intersection property always have a single Markov boundary, other distributions/data sets may have multiple Markov boundaries of the response variable. The latter distributions/data sets are common in practical data-analytic applications, and there are several reasons why it is important to induce multiple Markov boundaries from such data. However, there are currently no sound and efficient algorithms that can accomplish this task. This paper describes a family of algorithms TIE* that can discover all Markov boundaries in a distribution. The broad applicability as well as efficiency of the new algorithmic family is demonstrated in an extensive benchmarking study that involved comparison with 26 state-of-the-art algorithms/variants in 15 data sets from a diversity of application domains. Keywords: Markov boundary discovery, variable/feature selection, information equivalence, violations of faithfulness
2 0.83931124 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
3 0.48907194 27 jmlr-2013-Consistent Selection of Tuning Parameters via Variable Selection Stability
Author: Wei Sun, Junhui Wang, Yixin Fang
Abstract: Penalized regression models are popularly used in high-dimensional data analysis to conduct variable selection and model fitting simultaneously. Whereas success has been widely reported in literature, their performances largely depend on the tuning parameters that balance the trade-off between model fitting and model sparsity. Existing tuning criteria mainly follow the route of minimizing the estimated prediction error or maximizing the posterior model probability, such as cross validation, AIC and BIC. This article introduces a general tuning parameter selection criterion based on variable selection stability. The key idea is to select the tuning parameters so that the resultant penalized regression model is stable in variable selection. The asymptotic selection consistency is established for both fixed and diverging dimensions. Its effectiveness is also demonstrated in a variety of simulated examples as well as an application to the prostate cancer data. Keywords: kappa coefficient, penalized regression, selection consistency, stability, tuning
4 0.46719062 41 jmlr-2013-Experiment Selection for Causal Discovery
Author: Antti Hyttinen, Frederick Eberhardt, Patrik O. Hoyer
Abstract: Randomized controlled experiments are often described as the most reliable tool available to scientists for discovering causal relationships among quantities of interest. However, it is often unclear how many and which different experiments are needed to identify the full (possibly cyclic) causal structure among some given (possibly causally insufficient) set of variables. Recent results in the causal discovery literature have explored various identifiability criteria that depend on the assumptions one is able to make about the underlying causal process, but these criteria are not directly constructive for selecting the optimal set of experiments. Fortunately, many of the needed constructions already exist in the combinatorics literature, albeit under terminology which is unfamiliar to most of the causal discovery community. In this paper we translate the theoretical results and apply them to the concrete problem of experiment selection. For a variety of settings we give explicit constructions of the optimal set of experiments and adapt some of the general combinatorics results to answer questions relating to the problem of experiment selection. Keywords: causality, randomized experiments, experiment selection, separating systems, completely separating systems, cut-coverings
5 0.43152195 76 jmlr-2013-Nonparametric Sparsity and Regularization
Author: Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, Alessandro Verri
Abstract: In this work we are interested in the problems of supervised learning and variable selection when the input-output dependence is described by a nonlinear function depending on a few variables. Our goal is to consider a sparse nonparametric model, hence avoiding linear or additive models. The key idea is to measure the importance of each variable in the model by making use of partial derivatives. Based on this intuition we propose a new notion of nonparametric sparsity and a corresponding least squares regularization scheme. Using concepts and results from the theory of reproducing kernel Hilbert spaces and proximal methods, we show that the proposed learning algorithm corresponds to a minimization problem which can be provably solved by an iterative procedure. The consistency properties of the obtained estimator are studied both in terms of prediction and selection performance. An extensive empirical analysis shows that the proposed method performs favorably with respect to the state-of-the-art methods. Keywords: sparsity, nonparametric, variable selection, regularization, proximal methods, RKHS
6 0.43133754 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems
7 0.42771024 42 jmlr-2013-Fast Generalized Subset Scan for Anomalous Pattern Detection
8 0.41210914 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
9 0.40249225 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs
10 0.40057036 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
11 0.39673656 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
13 0.39253804 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
14 0.39208475 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs
15 0.39159635 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning
17 0.38810337 77 jmlr-2013-On the Convergence of Maximum Variance Unfolding
18 0.38576433 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators
19 0.38356537 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
20 0.38300428 46 jmlr-2013-GURLS: A Least Squares Library for Supervised Learning