nips nips2008 nips2008-108 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
Reference: text
sentIndex sentText sentNum sentScore
1 Integrating locally learned causal structures with overlapping variables Robert E. [sent-1, score-0.473]
2 Abstract In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. [sent-6, score-0.122]
3 While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. [sent-7, score-0.563]
4 Such predictions require knowledge of causal relationships between observed variables. [sent-10, score-0.295]
5 There are existing asymptotically correct algorithms for learning such relationships from data, possibly with missing values and hidden variables [1][2][3], but these algorithms all assume that every variable is measured in a single study. [sent-11, score-0.409]
6 However, given the increasing availability of large amounts of data, it is often possible to obtain several similar studies that individually measure subsets of the variables a researcher is interested in and together include all such variables. [sent-13, score-0.142]
7 In these cases, if each dataset has overlapping variable(s) with at least one other dataset, e. [sent-18, score-0.18]
8 if two datasets D1 and D2 , which measure variables V1 and V2 , respectively, have at least one variable in common (V1 ∩ V2 ≠ ∅), then we should be able to learn many of the causal relationships between the observed variables using this set of datasets. [sent-20, score-0.486]
9 The existing algorithms, however, cannot in general be directly applied to such cases, since they may require joint observations for variables that are not all measured in a single dataset. [sent-21, score-0.069]
10 While this problem has been discussed in [4] and [5], there are no general, useful algorithms for learning causal relationships from data of this form. [sent-22, score-0.295]
11 A typical response is to concatenate the datasets to form a single common dataset with missing values for the variables that are not measured in each of the original datasets. [sent-23, score-0.255]
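As a concrete illustration of the concatenation just described, the short Python sketch below builds two tiny datasets that share only the variable B and concatenates them; the variable names and values are invented for illustration. The result shows the block-structured (highly non-random) pattern of missing values discussed in the surrounding text.

```python
import pandas as pd

# Dataset D1 measures variables A and B; dataset D2 measures B and C.
d1 = pd.DataFrame({"A": [0.1, 0.4, 0.9], "B": [1.0, 0.2, 0.5]})
d2 = pd.DataFrame({"B": [0.3, 0.8], "C": [2.1, 1.7]})

# Concatenating yields one dataset over {A, B, C} in which A is missing for
# every row that came from D2 and C is missing for every row that came from D1.
combined = pd.concat([d1, d2], ignore_index=True, sort=False)
print(combined)
```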
12 Statistical matching [6] or multiple imputation [7] procedures may then be used to fill in the missing values by assuming an underlying model (or small class of models), estimating model parameters using the available data, and then using this model to interpolate the missing values. [sent-24, score-0.202]
13 The Structural EM algorithm [8] avoids this problem by iteratively updating the assumed model using the current interpolated dataset and then reestimating values for the missing data to form a new interpolated dataset until the model converges. [sent-28, score-0.27]
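To fix ideas about the iterate-impute-refit loop just described, here is a schematic Python stand-in; it is not Friedman's Structural EM. Column-mean imputation and a thresholded correlation matrix are placeholders for the expected-sufficient-statistics step and the structure search, and are assumptions made purely for illustration.

```python
import numpy as np

def impute(data, means):
    # Fill missing entries using the current model (here: column means).
    filled = data.copy()
    for j in range(data.shape[1]):
        filled[np.isnan(data[:, j]), j] = means[j]
    return filled

def fit_model(filled):
    # Placeholder "structure learning": column means plus a thresholded
    # correlation matrix standing in for a learned DAG.
    means = filled.mean(axis=0)
    structure = np.abs(np.corrcoef(filled, rowvar=False)) > 0.3
    return means, structure

def structural_em_like(data, iters=20):
    means = np.nanmean(data, axis=0)                  # initial model
    structure = None
    for _ in range(iters):
        filled = impute(data, means)                  # complete the data
        new_means, new_structure = fit_model(filled)  # refit the model
        if structure is not None and np.array_equal(structure, new_structure):
            break                                     # model has converged
        means, structure = new_means, new_structure
    return structure

data = np.array([[0.1, 1.0, 2.0],
                 [0.4, 0.2, np.nan],
                 [np.nan, 0.3, 2.1],
                 [0.9, 0.5, 1.5]])
print(structural_em_like(data))
```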
14 The Structural EM algorithm is only justified, however, when missing data are missing at random (or indicator variables can be used to make them so) [8]. [sent-29, score-0.241]
15 The pattern of missing values in the concatenated datasets described above is highly structured. [sent-30, score-0.177]
16 While this may not be problematic in practice when doing prediction, it is problematic when learning causal relationships. [sent-32, score-0.224]
17 We present a novel, asymptotically correct algorithm—the Integration of Overlapping Networks (ION) algorithm—for learning causal relationships (or more properly, the complete set of possible causal DAG structures) from data of this form. [sent-34, score-0.662]
18 A directed graph G = (V, E) is a set of nodes V, which represent variables, and a set of directed edges E connecting distinct nodes. [sent-40, score-0.329]
19 If two nodes are connected by an edge then the nodes are adjacent. [sent-41, score-0.344]
20 For pairs of nodes {X, Y } ⊆ V, X is a parent (child) of Y , if there is a directed edge from X to Y (Y to X) in E. [sent-42, score-0.334]
21 A trail in G is a sequence of nodes such that each consecutive pair of nodes in the sequence is adjacent in G and no node appears more than once in the sequence. [sent-43, score-0.391]
22 A trail is a directed path if every edge between consecutive pairs of nodes points in the same direction. [sent-44, score-0.581]
23 X is an ancestor (descendant) of Y if there is a directed path from X to Y (Y to X). [sent-45, score-0.074]
24 G is a directed acyclic graph (DAG) if for every pair {X, Y } ⊆ V, X is not both an ancestor and a descendant of Y (no directed cycles). [sent-46, score-0.251]
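To make the acyclicity condition concrete, the following self-contained check uses Kahn's peeling of zero-in-degree nodes; the dict-of-children graph encoding is just an illustrative choice, not the paper's.

```python
def is_dag(graph):
    # A directed graph is a DAG iff repeatedly removing nodes with no incoming
    # edges eventually removes every node.
    indeg = {n: 0 for n in graph}
    for children in graph.values():
        for c in children:
            indeg[c] = indeg.get(c, 0) + 1
    queue = [n for n, d in indeg.items() if d == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for c in graph.get(n, ()):
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return removed == len(indeg)

print(is_dag({"X": {"Y"}, "Y": {"Z"}, "Z": set()}))  # True: X -> Y -> Z
print(is_dag({"X": {"Y"}, "Y": {"X"}}))              # False: directed cycle
```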
25 A collider (v-structure) is a triple of nodes X, Y, Z such that X and Z are parents of Y . [sent-47, score-0.114]
26 A trail is active given C ⊆ V if (i) for every collider X, Y, Z in the trail either Y ∈ C or some descendant of Y is in C and (ii) no other node in the trail is in C. [sent-48, score-0.668]
27 For disjoint sets of nodes X, Y, and Z, X is d-separated (d-connected) from Y given Z if and only if there are no (at least one) active trails between any X ∈ X and any Y ∈ Y given Z. [sent-49, score-0.216]
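The definitions of trails, colliders, and active trails above translate directly into a brute-force d-separation test. The sketch below enumerates all trails, which is only feasible for small graphs, and applies the activeness rule verbatim; the dict-of-children encoding of a DAG is an assumption made for illustration.

```python
def descendants(dag, node):
    # All nodes reachable from `node` along directed edges.
    seen, stack = set(), [node]
    while stack:
        for child in dag.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def trails(dag, x, y, path=None):
    # All trails (simple paths in the skeleton) between x and y.
    path = [x] if path is None else path
    if x == y:
        yield list(path)
        return
    neighbours = set(dag.get(x, ())) | {p for p, ch in dag.items() if x in ch}
    for n in neighbours:
        if n not in path:
            yield from trails(dag, n, y, path + [n])

def is_active(dag, trail, given):
    # Active given C: every collider (or one of its descendants) is in C and
    # no non-collider on the trail is in C.
    for i in range(1, len(trail) - 1):
        a, b, c = trail[i - 1], trail[i], trail[i + 1]
        collider = b in dag.get(a, ()) and b in dag.get(c, ())
        if collider:
            if b not in given and not (descendants(dag, b) & given):
                return False
        elif b in given:
            return False
    return True

def d_separated(dag, x, y, given):
    return not any(is_active(dag, t, given) for t in trails(dag, x, y))

# X -> Y <- Z: X and Z are d-separated given {} but d-connected given {Y}.
dag = {"X": {"Y"}, "Z": {"Y"}, "Y": set()}
print(d_separated(dag, "X", "Z", set()))  # True
print(d_separated(dag, "X", "Z", {"Y"}))  # False
```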
28 B is a causal Bayesian network if an edge from X to Y indicates that X is a direct cause of Y relative to V. [sent-52, score-0.4]
29 Most algorithms for causal discovery, or learning causal relationships from nonexperimental data, assume that the distribution over the observed variables P is decomposable according to a DAG G and P is faithful to G. [sent-53, score-0.611]
30 Most causal discovery algorithms return a set of possible DAGs which entail the same d-separations and d-connections, e. [sent-55, score-0.308]
31 The DAGs in this set have the same adjacencies but only some of the same directed edges. [sent-58, score-0.074]
32 The directed edges common to each DAG represent causal relationships that are learned from the data. [sent-59, score-0.417]
33 A partial ancestral graph (PAG) represents the set of DAGs in a particular Markov equivalence class when latent common causes may be present. [sent-61, score-0.172]
34 Edges are of four types: −◮, ◦−◮, ◦−◦, and ◭−◮, where a ◦ indicates either an arrowhead (◮) or tail (−) orientation; bidirected edges (◭−◮) indicate the presence of a latent common cause, and fully directed edges (−◮) indicate that the directed edge is present in every DAG, e. [sent-63, score-0.524]
35 For {X, Y } ⊆ V, a possibly active trail between X and Y given Z ⊆ V/{X, Y } is a trail in a PAG between X and Y such that some orientation of ◦’s on edges between consecutive nodes in the trail, to either − or ◮, makes the trail active given Z. [sent-66, score-0.904]
36 3 Integration of Overlapping Networks (ION) algorithm The ION algorithm uses conditional independence information to discover the complete set of PAGs over a set of variables V that are consistent with a set of datasets over subsets of V which have overlapping variables. [sent-67, score-0.435]
37 A standard causal discovery algorithm that checks for latent common causes, such as FCI [1] or GES [3] with latent variable postprocessing steps, must first be applied to each of the original datasets to learn these PAGs that will be input to ION. [sent-69, score-0.515]
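Schematically, the workflow described here is: run a latent-variable-aware search (FCI, or GES plus postprocessing) on each dataset separately, then hand the resulting PAGs to ION. The sketch below shows only the orchestration; run_local_search and run_ion are hypothetical stand-ins, not functions from any existing library.

```python
def learn_from_overlapping_datasets(datasets, run_local_search, run_ion):
    # datasets: list of (data, variable_names) pairs, each over a subset of V.
    # run_local_search: maps one dataset to a PAG over its own variables (e.g. FCI).
    # run_ion: maps the local PAGs to the set of PAGs over V consistent with all of them.
    local_pags = [run_local_search(data, variables) for data, variables in datasets]
    all_variables = sorted(set().union(*(set(v) for _, v in datasets)))
    return run_ion(local_pags, all_variables)
```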
38 Input: PAGs Gi ∈ G with nodes Vi ⊆ V for i = 1, ..., k [sent-72, score-0.084]
39 Output: PAGs Hi ∈ H with nodes Vi = V for i = 1, ..., k [sent-75, score-0.084]
40 PR ← every combination of removing or not removing ‘?’ from the edge between X and Y in Ai [sent-80, score-0.23]
41 if X and Y are not adjacent in Gi then remove the edge between X and Y; if X is directed into Y in Gi then set the endpoint at Y on the edge between X and Y to ◮. [sent-83, score-0.534]
42 Once these orientations and edge removals are made, the changes to the complete graph are propagated using the rules in [10], which provably make every change that is entailed by the current changes made to the graph. [sent-84, score-0.655]
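A minimal sketch of the per-pair transfer step just described, assuming a PAG is encoded as a dict mapping ordered node pairs to the endpoint mark at the second node ('o', '>', or '-'); that encoding, and the omission of the propagation rules from [10], are simplifications made for illustration.

```python
def transfer_from_input_pag(complete_pag, input_pag, input_vars):
    # Copy non-adjacencies and arrowhead orientations from one input PAG onto
    # the complete graph over V; propagation of the rules in [10] is omitted.
    for x in input_vars:
        for y in input_vars:
            if x >= y:
                continue
            if (x, y) not in input_pag and (y, x) not in input_pag:
                # X and Y are non-adjacent in Gi: remove the edge over V too.
                complete_pag.pop((x, y), None)
                complete_pag.pop((y, x), None)
                continue
            for a, b in ((x, y), (y, x)):
                # An arrowhead at b in Gi is copied to the complete graph.
                if input_pag.get((a, b)) == '>' and (a, b) in complete_pag:
                    complete_pag[(a, b)] = '>'
    return complete_pag

# Complete graph over {A, B, C} with all-circle endpoints; G1 contains only A o-> B.
nodes = ["A", "B", "C"]
complete = {(a, b): 'o' for a in nodes for b in nodes if a != b}
g1 = {("A", "B"): '>', ("B", "A"): 'o'}
transfer_from_input_pag(complete, g1, nodes)
print(complete[("A", "B")])    # '>'
print(("A", "C") in complete)  # False: A and C were non-adjacent in G1
```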
43 Lines 4-9 find every possibly active trail for every {X, Y } ⊆ V given Z ⊆ V/{X, Y } such that X and Y are d-separated given Z in some Gi ∈ G. [sent-85, score-0.359]
44 The constructed set PC includes all minimal hitting sets of graphical changes, e. [sent-86, score-0.134]
45 unique sets of minimal changes that are not subsets of other sets of changes, which make these paths no longer active. [sent-88, score-0.187]
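Since minimal hitting sets carry much of the algorithmic weight here, a tiny brute-force version may help fix ideas. The change labels in the example are invented for illustration, and a real implementation would use the branch-and-bound approach described in section 3 rather than this exponential enumeration.

```python
from itertools import combinations

def minimal_hitting_sets(collections):
    # All minimal sets that intersect every set in `collections` (brute force).
    universe = sorted(set().union(*collections))
    found = []
    for size in range(1, len(universe) + 1):
        for cand in combinations(universe, size):
            s = set(cand)
            if all(s & c for c in collections) and not any(f <= s for f in found):
                found.append(s)
    return found

# Each inner set lists the graphical changes that would break one possibly
# active trail; a hitting set breaks all of the listed trails at once.
changes_per_trail = [{"remove A-B", "orient B<-C"}, {"orient B<-C", "remove C-D"}]
print(minimal_hitting_sets(changes_per_trail))
# [{'orient B<-C'}, {'remove A-B', 'remove C-D'}]
```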
46 For each minimal hitting set, a new graph is constructed by making the changes in the set and propagating these changes. [sent-89, score-0.315]
47 If the graph does not imply a d-separation for any {X, Y } ⊆ V given Z ⊆ V/{X, Y } such that X and Y are d-connected given Z in some Gi ∈ G, then this graph is added to the current set of possible graphs. [sent-92, score-0.098]
48 Lines 10-19 attempt to discover any additional PAGs that may be consistent with each Gi ∈ G after deleting edges from PAGs in the current set and propagating the changes. [sent-94, score-0.179]
49 If some pair of nodes {X, Y } ⊆ V that are adjacent in a current PAG are d-connected given ∅ in some Gi ∈ G, then we do not consider sets of edge removals which remove this edge. [sent-95, score-0.382]
50 The ION algorithm is provably sound in the sense that the output PAGs are consistent with every Gi ∈ G, e. [sent-96, score-0.191]
51 Every structure Ai constructed at line 7 provably entails every d-separation entailed by some Gi ∈ G. [sent-104, score-0.167]
52 Such structures are only added to A if they do not entail a d-separation corresponding to a d-connection in some Gi ∈ G. [sent-105, score-0.085]
53 The only changes made (other than changes resulting from propagating other changes which are provably correct by [10]) in lines 10-19 are edge removals, which can only create new d-separations. [sent-106, score-0.542]
54 The ION algorithm is provably complete in the sense that if there is some structure Hi over the variables V that is consistent with every Gi ∈ G, then Hi ∈ H. [sent-108, score-0.275]
55 Let Hi be a PAG over the variables V such that for every pair {X, Y } ⊆ V, if X and Y are d-separated (d-connected) given Z ⊆ V/{X, Y } in some Gi ∈ G, then X and Y are d-separated (d-connected) given Z in Hi . [sent-111, score-0.123]
56 Every change made at line 3 is provably necessary to ensure soundness. [sent-114, score-0.068]
57 At least one graph added to A at line 8 provably has every adjacency (possibly more) in Hi and no non-◦ endpoint on an edge that is not also present in Hi. [sent-115, score-0.372]
58 Some sequence of edge removals will provably produce Hi at line 16 and it will be added to the output set since it is consistent with every Gi ∈ G. [sent-116, score-0.435]
59 Finding all minimal hitting sets is an NP-complete problem [11]. [sent-122, score-0.134]
60 In practice, however, we can break the minimal hitting set problem into a sequence of smaller subproblems and use a branch and bound approach that is tractable in many cases and still results in an asymptotically correct algorithm. [sent-124, score-0.259]
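One way to picture the decomposition into smaller subproblems: minimal hitting sets can be computed per target pair and then merged, keeping only minimal unions (the minimal hitting sets of a union of collections are exactly the minimal unions of per-collection hitting sets). The helper below is a simple stand-in for that merge step, not the branch-and-bound procedure itself.

```python
def combine_hitting_sets(partial_a, partial_b):
    # Every union of one hitting set from each subproblem hits both subproblems;
    # keep only the minimal unions.
    merged = [a | b for a in partial_a for b in partial_b]
    return [s for s in merged if not any(t < s for t in merged)]

# Hitting sets of two subproblems, combined into candidates for the joint problem.
print(combine_hitting_sets([{"r1"}, {"r2", "r3"}], [{"r2"}, {"r4"}]))
# e.g. [{'r1', 'r2'}, {'r1', 'r4'}, {'r2', 'r3'}] (element order within sets may vary)
```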
61 For each DAG, we then randomly chose two subsets of size 2 or 3 of the nodes in the DAG such that the union of the subsets included all 4 nodes and at least one overlapping variable between the two subsets was present. [sent-129, score-0.553]
62 samples of sizes N = 50, N = 100, N = 500, N = 1000 and N = 2500 from the DAGs for only the variables in each subset. [sent-133, score-0.114]
63 We used both FCI and GES with latent variable postprocessing to obtain the PAGs given as input to ION from each dataset. [sent-134, score-0.254]
[Figure residue: panels for edge omissions and edge commissions; legend: FCI-baseline, ION-FCI, GES-baseline, ION-GES, Structural EM] [sent-135, score-0.102]
65 To evaluate the accuracy of ION, we counted the number of edge omission, edge commission, and orientation errors (◮ instead of −) for each PAG in the ION output set and averaged the results. [sent-138, score-0.595]
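The three error counts described in the preceding sentence can be computed by comparing adjacencies and endpoint marks between each PAG in the output set and the ground truth, then averaging. The sketch below reuses the ordered-pair/endpoint-mark encoding assumed in the earlier sketches; it illustrates the bookkeeping, not the authors' evaluation code.

```python
def _adjacencies(g):
    return {frozenset(pair) for pair in g}

def count_errors(estimated, truth):
    # Edge omissions: adjacencies in the true graph missing from the estimate.
    # Edge commissions: adjacencies in the estimate absent from the true graph.
    # Orientation errors: an arrowhead (>) where the true endpoint is a tail (-).
    omissions = len(_adjacencies(truth) - _adjacencies(estimated))
    commissions = len(_adjacencies(estimated) - _adjacencies(truth))
    orientation = sum(1 for p, mark in estimated.items()
                      if p in truth and mark == '>' and truth[p] == '-')
    return omissions, commissions, orientation

def average_over_output_set(pags, truth):
    # ION returns a set of PAGs; errors are counted per PAG and averaged.
    counts = [count_errors(p, truth) for p in pags]
    return tuple(sum(c[i] for c in counts) / len(counts) for i in range(3))
```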
66 ION-FCI and ION-GES refer to the performance of ION when the input PAGs are obtained using the FCI algorithm and the GES algorithm with latent variable postprocessing, respectively. [sent-141, score-0.074]
67 For Structural EM, we took each of the datasets over subsets of the nodes in each DAG and formed a concatenated dataset, as described in section 1, which was input to the Structural EM algorithm. [sent-142, score-0.272]
68 sample of sizes N = 50, N = 100, N = 500, N = 1000 and N = 2500 for all of the variables in each DAG and used these datasets as input for the FCI and GES with latent variable postprocessing algorithms, respectively, to obtain a measure for how well these algorithms perform when no data is missing. [sent-146, score-0.343]
69 Almost no edge omission errors are made, but more edge commission errors are made than any of the other methods and the edge commission errors do not decrease as the sample size increases. [sent-150, score-1.109]
70 When we looked at the results, we found that Structural EM always returned either the complete graph or a graph that was almost complete, indicating that Structural EM is not a reliable method for causal discovery in this scenario where there is a highly structured pattern to the missing data. [sent-151, score-0.502]
71 For the larger sample sizes (where more missing values need to be estimated at each iteration), a single run required several hours in some instances. [sent-153, score-0.165]
72 Random “chains” of nodes were used as the initial models. [sent-155, score-0.084]
[Figure residue: legend FCI-baseline, ION-FCI, GES-baseline, ION-GES; x-axis sample sizes N=50 to N=2500] [sent-159, score-0.102]
Figure 2: (a) edge omissions, (b) edge commissions, and (c) orientation errors [sent-162, score-0.661]
[Figure residue: axis ticks and legend] [sent-164, score-0.102]
76 The ION-FCI and ION-GES methods performed similarly to the FCI-baseline and GES-baseline methods but made slightly more errors and showed slower convergence (due to the missing data). [sent-169, score-0.185]
77 Slightly more edge omission errors were made, but these errors decrease as the sample size increases. [sent-171, score-0.515]
78 Some edge orientation errors were made even for the larger sample sizes. [sent-172, score-0.42]
79 Even if the correct equivalence class is discovered, errors result after comparing the ground truth DAG to every DAG in the equivalence class and averaging. [sent-174, score-0.287]
80 We also note that there are fewer orientation errors for the GES-baseline and ION-GES methods on the two smallest sample sizes than all of the other sample sizes. [sent-175, score-0.323]
81 While this may seem surprising, it is simply a result of the fact that more edge omission errors are made for these cases. [sent-176, score-0.349]
82 samples were generated for random subsets of sizes 2-5 with only 1 variable that is not overlapping between the two subsets; (ii) two i. [sent-180, score-0.251]
83 samples were generated for random subsets of sizes 2-5 with only 2 variables that are not overlapping between the two subsets; (iii) three i. [sent-183, score-0.32]
84 samples were generated for random subsets of sizes 2-5 with only 1 variable that is not overlapping between any pair of subsets. [sent-186, score-0.251]
85 Figures 2, 3, and 4 show edge omission, edge commission, and orientation errors for each of these cases, respectively. [sent-187, score-0.562]
86 We also tested the performance of ION-FCI using a real world dataset measuring IQ and various neuroanatomical and other traits [14]. [sent-189, score-0.108]
87 Figures 5a and 5b show the FCI output PAGs when only the data for each of these subsets of the variables is provided as input, respectively. [sent-191, score-0.175]
88 We also ran FCI on the complete dataset to have a comparison. [sent-193, score-0.095]
[Figure residue: panels for orientation errors, edge commissions, and edge omissions] [sent-196, score-0.337]
90 We note that the graphical structure of the complete PAG (figures 5c and 5d) is the union of the structures shown in figures 5a and 5b. [sent-202, score-0.095]
91 5 Conclusions In practice, researchers are often unable to find or construct a single, complete dataset containing every variable they may be interested in (or doing so is very costly). [sent-204, score-0.149]
92 We thus need some way of integrating information about causal relationships that can be discovered from a collection of datasets with related variables [5]. [sent-205, score-0.444]
93 Standard causal discovery algorithms cannot be used, since they take only a single dataset as input. [sent-206, score-0.317]
94 To address this open problem, we proposed the ION algorithm, an asymptotically correct algorithm for discovering the complete set of causal DAG structures that are consistent with such data. [sent-207, score-0.45]
95 Probably the most significant problem is resolving contradictory information among overlapping variables in different input PAGs, i. [sent-209, score-0.25]
96 X is a parent of Y in one PAG and a child of Y in another PAG, resulting from statistical errors or when the input samples are not identically distributed. [sent-211, score-0.123]
97 In some such cases, ION will not discover any PAGs which entail the correct d-separations and d-connections. [sent-215, score-0.107]
98 When performing conditional independence tests or evaluating score functions, statistical errors occur more frequently as the dimensionality of a dataset increases, unless the sample size also increases at an exponential rate (resulting from the so-called curse of dimensionality). [sent-217, score-0.213]
99 Furthermore, while the branch and bound approach described in section 3 is a significant improvement over other methods we tested for computing minimal hitting sets, its memory requirements are still considerable in some instances. [sent-219, score-0.164]
100 A characterization of Markov equivalence classes for causal models with latent variables. [sent-274, score-0.318]
wordName wordTfidf (topN-words)
[('ion', 0.382), ('fci', 0.3), ('pags', 0.297), ('dag', 0.241), ('pag', 0.238), ('gi', 0.228), ('causal', 0.224), ('ges', 0.221), ('edge', 0.176), ('trail', 0.17), ('omissions', 0.136), ('overlapping', 0.133), ('hi', 0.13), ('orientation', 0.111), ('structural', 0.103), ('commissions', 0.102), ('dags', 0.102), ('iq', 0.102), ('errors', 0.099), ('em', 0.089), ('missing', 0.086), ('circumference', 0.085), ('trails', 0.085), ('nodes', 0.084), ('hitting', 0.08), ('omission', 0.074), ('directed', 0.074), ('subsets', 0.073), ('propagating', 0.072), ('relationships', 0.071), ('variables', 0.069), ('postprocessing', 0.068), ('removals', 0.068), ('provably', 0.068), ('ai', 0.062), ('changes', 0.06), ('sex', 0.059), ('genotype', 0.059), ('minimal', 0.054), ('endpoint', 0.054), ('every', 0.054), ('datasets', 0.053), ('surface', 0.052), ('collosum', 0.051), ('comissions', 0.051), ('latent', 0.05), ('head', 0.05), ('asymptotically', 0.049), ('graph', 0.049), ('edges', 0.048), ('complete', 0.048), ('structures', 0.047), ('dataset', 0.047), ('active', 0.047), ('discovery', 0.046), ('correct', 0.046), ('sizes', 0.045), ('brain', 0.045), ('interpolated', 0.045), ('entailed', 0.045), ('equivalence', 0.044), ('commission', 0.041), ('baseline', 0.04), ('entail', 0.038), ('concatenated', 0.038), ('consistent', 0.036), ('neuroanatomical', 0.034), ('nonadjacencies', 0.034), ('runtimes', 0.034), ('sample', 0.034), ('possibly', 0.034), ('output', 0.033), ('size', 0.033), ('pc', 0.031), ('adjacent', 0.03), ('collider', 0.03), ('imputation', 0.03), ('pat', 0.03), ('pri', 0.03), ('branch', 0.03), ('causes', 0.029), ('area', 0.028), ('pci', 0.027), ('mcdonnell', 0.027), ('descendant', 0.027), ('glymour', 0.027), ('spirtes', 0.027), ('traits', 0.027), ('body', 0.027), ('discovered', 0.027), ('orientations', 0.027), ('endpoints', 0.025), ('input', 0.024), ('united', 0.024), ('resolving', 0.024), ('remove', 0.024), ('discover', 0.023), ('consecutive', 0.023), ('faithful', 0.023), ('volume', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
Author: Jean-philippe Pellet, AndrĂŠElisseeff
Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditionalindependence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of causeeffect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical , or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form , impossible. If we suppose that the
3 0.17304684 153 nips-2008-Nonlinear causal discovery with additive noise models
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
4 0.11399356 152 nips-2008-Non-stationary dynamic Bayesian networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1
5 0.10780049 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
6 0.085501194 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
7 0.07991448 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
8 0.073862478 214 nips-2008-Sparse Online Learning via Truncated Gradient
9 0.063882142 58 nips-2008-Dependence of Orientation Tuning on Recurrent Excitation and Inhibition in a Network Model of V1
10 0.062217198 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
11 0.059106845 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
12 0.055532254 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
13 0.051592026 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
14 0.049764626 211 nips-2008-Simple Local Models for Complex Dynamical Systems
15 0.049417049 69 nips-2008-Efficient Exact Inference in Planar Ising Models
16 0.04837773 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
17 0.047604416 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
18 0.045497015 65 nips-2008-Domain Adaptation with Multiple Sources
19 0.045188434 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
20 0.044204477 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
topicId topicWeight
[(0, -0.144), (1, -0.01), (2, 0.023), (3, 0.034), (4, 0.059), (5, -0.1), (6, 0.029), (7, 0.07), (8, 0.047), (9, -0.064), (10, -0.058), (11, 0.123), (12, -0.003), (13, -0.129), (14, 0.121), (15, -0.104), (16, 0.098), (17, -0.079), (18, -0.158), (19, -0.049), (20, 0.047), (21, 0.175), (22, 0.138), (23, -0.061), (24, 0.355), (25, -0.216), (26, -0.173), (27, -0.091), (28, -0.095), (29, 0.184), (30, -0.021), (31, -0.003), (32, 0.066), (33, -0.191), (34, 0.087), (35, -0.054), (36, -0.055), (37, -0.053), (38, -0.063), (39, -0.009), (40, -0.048), (41, 0.029), (42, 0.011), (43, 0.101), (44, -0.007), (45, 0.019), (46, -0.005), (47, -0.003), (48, -0.007), (49, -0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.96357656 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
2 0.93220794 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
Author: Jean-philippe Pellet, AndrĂŠElisseeff
Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditionalindependence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of causeeffect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical , or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form , impossible. If we suppose that the
3 0.68874002 153 nips-2008-Nonlinear causal discovery with additive noise models
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
4 0.41334873 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
Author: Paolo Frasconi, Andrea Passerini
Abstract: Metal binding is important for the structural and functional characterization of proteins. Previous prediction efforts have only focused on bonding state, i.e. deciding which protein residues act as metal ligands in some binding site. Identifying the geometry of metal-binding sites, i.e. deciding which residues are jointly involved in the coordination of a metal ion is a new prediction problem that has been never attempted before from protein sequence alone. In this paper, we formulate it in the framework of learning with structured outputs. Our solution relies on the fact that, from a graph theoretical perspective, metal binding has the algebraic properties of a matroid, enabling the application of greedy algorithms for learning structured outputs. On a data set of 199 non-redundant metalloproteins, we obtained precision/recall levels of 75%/46% correct ligand-ion assignments, which improves to 88%/88% in the setting where the metal binding state is known. 1
5 0.41135651 152 nips-2008-Non-stationary dynamic Bayesian networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1
6 0.39632311 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
7 0.36676982 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
8 0.35808653 211 nips-2008-Simple Local Models for Complex Dynamical Systems
9 0.29264826 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
10 0.2597248 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
11 0.24555735 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
12 0.23486058 69 nips-2008-Efficient Exact Inference in Planar Ising Models
13 0.23217826 134 nips-2008-Mixed Membership Stochastic Blockmodels
14 0.23101446 94 nips-2008-Goal-directed decision making in prefrontal cortex: a computational framework
15 0.22897092 71 nips-2008-Efficient Sampling for Gaussian Process Inference using Control Variables
16 0.22321014 225 nips-2008-Supervised Bipartite Graph Inference
17 0.21512401 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
18 0.21470931 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
19 0.2144545 74 nips-2008-Estimating the Location and Orientation of Complex, Correlated Neural Activity using MEG
20 0.21437649 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
topicId topicWeight
[(6, 0.081), (7, 0.06), (12, 0.028), (21, 0.032), (28, 0.183), (57, 0.039), (59, 0.018), (63, 0.022), (64, 0.345), (77, 0.034), (78, 0.023), (83, 0.036)]
simIndex simValue paperId paperTitle
1 0.80989611 105 nips-2008-Improving on Expectation Propagation
Author: Manfred Opper, Ulrich Paquet, Ole Winther
Abstract: A series of corrections is developed for the fixed points of Expectation Propagation (EP), which is one of the most popular methods for approximate probabilistic inference. These corrections can lead to improvements of the inference approximation or serve as a sanity check, indicating when EP yields unrealiable results.
same-paper 2 0.75724596 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
3 0.7283842 123 nips-2008-Linear Classification and Selective Sampling Under Low Noise Conditions
Author: Giovanni Cavallanti, Nicolò Cesa-bianchi, Claudio Gentile
Abstract: We provide a new analysis of an efficient margin-based algorithm for selective sampling in classification problems. Using the so-called Tsybakov low noise condition to parametrize the instance distribution, we show bounds on the convergence rate to the Bayes risk of both the fully supervised and the selective sampling versions of the basic algorithm. Our analysis reveals that, excluding logarithmic factors, the average risk of the selective sampler converges to the Bayes risk at rate N −(1+α)(2+α)/2(3+α) where N denotes the number of √ queried labels, and α > 0 is the exponent in the low noise condition. For all α > 3 − 1 ≈ 0.73 this convergence rate is asymptotically faster than the rate N −(1+α)/(2+α) achieved by the fully supervised version of the same classifier, which queries all labels, and for α → ∞ the two rates exhibit an exponential gap. Experiments on textual data reveal that simple variants of the proposed selective sampler perform much better than popular and similarly efficient competitors. 1
4 0.7005108 151 nips-2008-Non-parametric Regression Between Manifolds
Author: Florian Steinke, Matthias Hein
Abstract: This paper discusses non-parametric regression between Riemannian manifolds. This learning problem arises frequently in many application areas ranging from signal processing, computer vision, over robotics to computer graphics. We present a new algorithmic scheme for the solution of this general learning problem based on regularized empirical risk minimization. The regularization functional takes into account the geometry of input and output manifold, and we show that it implements a prior which is particularly natural. Moreover, we demonstrate that our algorithm performs well in a difficult surface registration problem. 1
5 0.53624886 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
Author: Jean-philippe Pellet, AndrĂŠElisseeff
Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditionalindependence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of causeeffect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical , or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form , impossible. If we suppose that the
6 0.52954602 153 nips-2008-Nonlinear causal discovery with additive noise models
8 0.52813321 48 nips-2008-Clustering via LP-based Stabilities
9 0.52756208 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
10 0.52744991 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
11 0.52726692 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
12 0.52653182 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
13 0.52626437 202 nips-2008-Robust Regression and Lasso
14 0.52616644 196 nips-2008-Relative Margin Machines
15 0.52509707 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
16 0.52495444 231 nips-2008-Temporal Dynamics of Cognitive Control
17 0.524737 62 nips-2008-Differentiable Sparse Coding
18 0.5245508 101 nips-2008-Human Active Learning
19 0.52404201 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
20 0.5235247 106 nips-2008-Inferring rankings under constrained sensing