nips nips2008 nips2008-153 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Nonlinear causal discovery with additive noise models Patrik O. [sent-1, score-0.659]
2 For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. [sent-3, score-0.688]
3 In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. [sent-4, score-0.495]
4 In this contribution we show that the basic linear framework can be generalized to nonlinear models. [sent-5, score-0.137]
5 In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. [sent-6, score-0.422]
6 1 Introduction Causal relationships are fundamental to science because they enable predictions of the consequences of actions [1]. [sent-8, score-0.091]
7 While controlled randomized experiments constitute the primary tool for identifying causal relationships, such experiments are in many cases either unethical, too expensive, or technically impossible. [sent-9, score-0.371]
8 The development of causal discovery methods to infer causal relationships from uncontrolled data constitutes an important current research topic [1, 2, 3, 4, 5, 6, 7, 8]. [sent-10, score-1.009]
9 If the observed data is continuous-valued, methods based on linear causal models (aka structural equation models) are commonly applied [1, 2, 9]. [sent-11, score-0.469]
10 This is not necessarily because the true causal relationships are really believed to be linear, but rather it reflects the fact that linear models are well understood and easy to work with. [sent-12, score-0.523]
11 For continuous variables, the independence tests often assume linear models with additive Gaussian noise [2]. [sent-14, score-0.336]
12 Recently, however, it has been shown that for linear models, non-Gaussianity in the data can actually aid in distinguishing the causal directions and allow one to uniquely identify the generating graph under favourable conditions [7]. [sent-15, score-0.486]
13 Thus the practical case of non-Gaussian data, long considered a nuisance, turned out to be helpful in the causal discovery setting. [sent-16, score-0.462]
14 In this contribution we show that nonlinearities can play a role quite similar to that of non-Gaussianity: when causal relationships are nonlinear, this typically helps break the symmetry between the observed variables and allows the identification of causal directions. [sent-17, score-1.1]
15 As Friedman and Nachman have pointed out [10], non-invertible functional relationships between the observed variables can provide clues to the generating causal model. [sent-18, score-0.638]
16 However, we show that the phenomenon is much more general; for nonlinear models with additive noise almost any nonlinearities (invertible or not) will typically yield identifiable models. [sent-19, score-0.352]
17 We describe a practical method for inferring the generating model from a sample of data vectors in Section 4, and show its utility in simulations and on real data (Section 5). [sent-22, score-0.091]
18 2 Model definition We assume that the observed data has been generated in the following way: each observed variable xi is associated with a node i in a directed acyclic graph G, and the value of xi is obtained as a function of its parents in G, plus independent additive noise ni, i.e. xi := fi(x_pa(i)) + ni. [sent-23, score-0.431]
19 Our data then consists of a number of vectors x sampled independently, each using G, the same functions fi , and the ni sampled independently from the same densities pni (ni ). [sent-26, score-0.378]
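To make the data-generating process concrete, here is a short Python sketch (our own illustration, not code accompanying the paper) that draws i.i.d. data vectors from an additive noise model over a user-specified DAG; the example graph, functions, and noise distributions are arbitrary choices.

```python
import numpy as np

def sample_anm(n, parents, funcs, noise_samplers, rng=None):
    """Draw n i.i.d. vectors from an additive noise model.

    parents[i]       : list of parent indices of node i (nodes must be in topological order)
    funcs[i]         : maps the (n, #parents) array of parent values to n values
    noise_samplers[i]: callable (rng, m) -> m independent noise values for node i
    """
    rng = np.random.default_rng(rng)
    d = len(parents)
    x = np.zeros((n, d))
    for i in range(d):
        pa = x[:, parents[i]]                        # parent values (empty for root nodes)
        x[:, i] = funcs[i](pa) + noise_samplers[i](rng, n)
    return x

# Hypothetical example: x0 -> x1 with x1 = tanh(x0) + uniform noise.
data = sample_anm(
    n=500,
    parents=[[], [0]],
    funcs=[lambda pa: 0.0, lambda pa: np.tanh(pa[:, 0])],
    noise_samplers=[lambda r, m: r.standard_normal(m),
                    lambda r, m: r.uniform(-0.5, 0.5, m)],
)
```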
20 Note that this model includes the special case when all the fi are linear and all the pni are Gaussian, yielding the standard linear–Gaussian model family [2, 3, 9]. [sent-27, score-0.286]
21 When the functions are linear but the densities pni are non-Gaussian we obtain the linear–non-Gaussian models described in [7]. [sent-28, score-0.289]
22 The goal of causal discovery is, given the data vectors, to infer as much as possible about the generating mechanism; in particular, we seek to infer the generating graph G. [sent-29, score-0.678]
23 In the next section we discuss the prospects of this task in the theoretical case where the joint distribution px (x) of the observed data can be estimated exactly. [sent-30, score-0.303]
24 Denoting the two variables x and y, we are considering the generative model y := f(x) + n where x and n are statistically independent. [sent-35, score-0.084]
25 Figure 1: Identification of causal direction based on constancy of conditionals; the panels plot p(x), f(x), the noise, and the joint and conditional densities discussed below, with panel (g) showing the compact-support counterexample. [sent-65, score-0.371]
27 (g) shows an example of a joint density p(x, y) generated by a causal model x → y with y := f(x) + n where f is nonlinear, the supports of the densities px(x) and pn(n) are compact regions, and the function f is constant on each connected component of the support of px. [sent-67, score-1.281]
28 The support of the joint density is now given by the two gray squares. [sent-68, score-0.078]
29 Note that the input distribution px, the noise distribution pn and f can in fact be chosen such that the joint density is symmetrical with respect to the two variables, i.e. p(x, y) = p(y, x), making it obvious that there will also be a valid backward model. [sent-69, score-0.64] [sent-71, score-0.332]
31 In panel (a) we plot the joint density p(x, y) of the observed variables, for the linear case of f (x) = x. [sent-73, score-0.213]
32 As a trivial consequence of the model, the conditional density p(y | x) has identical shape for all values of x and is simply shifted by the function f (x); this is illustrated in panel (b). [sent-74, score-0.107]
33 In general, there is no reason to believe that this relationship would also hold for the conditionals p(x | y) for different values of y but, as is well known, for the linear–Gaussian model this is actually the case, as illustrated in panel (c). [sent-75, score-0.138]
34 Panels (d-f) show the joint and conditional densities for the corresponding model with a nonlinear function f(x) = x + x³. [sent-76, score-0.273]
35 Notice how the conditionals p(x | y) look different for different values of y, indicating that a reverse causal model of the form x := g(y) + ñ (with y and ñ statistically independent) would not be able to fit the joint density. [sent-77, score-0.642]
36 To see the latter, we first show that there exist models other than the linear–Gaussian and the independent case which admit both a forward x → y and a backward x ← y model. [sent-79, score-0.551]
37 Panel (g) of Figure 1 presents a nonlinear functional model with additive non-Gaussian noise and non-Gaussian input distributions that nevertheless admits a backward model. [sent-80, score-0.677]
38 Note that the example of panel (g) in Figure 1 is somewhat artificial: p has compact support, and x, y are independent inside the connected components of the support. [sent-82, score-0.065]
39 Roughly speaking, the nonlinearity of f does not matter since it occurs where p is zero, an artificial situation that is avoided by requiring, as we do from now on, that all probability densities are strictly positive. [sent-83, score-0.146]
40 In this case, the following theorem shows that for generic choices of f , px (x), and pn (n), there exists no backward model. [sent-85, score-0.835]
41 Theorem 1 Let the joint probability density of x and y be given by p(x, y) = pn (y − f (x))px (x) , (2) where pn , px are probability densities on R. [sent-86, score-0.883]
42 If there is a backward model of the same form, i.e. a model x := g(y) + ñ with ñ statistically independent of y, then the log-marginal ξ := log px must satisfy a particular differential equation determined by f and by the log noise density ν := log pn. [sent-87, score-0.368]
43 Moreover, if for a fixed pair (f, ν) there exists y ∈ R such that ν''(y − f(x)) f'(x) ≠ 0 for all but a countable set of points x ∈ R, the set of all px for which p has a backward model is contained in a 3-dimensional affine space. [sent-90, score-0.598]
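For readability, the two factorizations being compared can be written side by side (our restatement in the notation of the text; the backward form is the one used in the Gumbel example below):

```latex
% Forward and (hypothetical) backward additive-noise factorizations of the same joint density.
\begin{align*}
  p(x,y) &= p_n\bigl(y - f(x)\bigr)\, p_x(x)
     && \text{forward model } y := f(x) + n, \quad n \perp x, \\
  p(x,y) &= p_{\tilde n}\bigl(x - g(y)\bigr)\, p_y(y)
     && \text{backward model } x := g(y) + \tilde n, \quad \tilde n \perp y.
\end{align*}
```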
44 Loosely speaking, the statement that the differential equation for ξ has a 3-dimensional space of solutions (while a priori, the space of all possible log-marginals ξ is infinite dimensional) amounts to saying that in the generic case, our forward model cannot be inverted. [sent-91, score-0.212]
45 A simple corollary is that if both the marginal density px(x) and the noise density pn(y − f(x)) are Gaussian then the existence of a backward model implies linearity of f: Corollary 1 Assume that ν''' = ξ''' = 0 everywhere. [sent-92, score-1.053]
46 Finally, we note that even when f is linear and pn and px are non-Gaussian, although a linear backward model has previously been ruled out [7], there exist special cases where there is a nonlinear backward model with independent additive noise. [sent-95, score-1.451]
47 One such case is when f (x) = −x and px and pn are Gumbel distributions: px (x) = exp(−x − exp(−x)) and pn (n) = exp(−n − exp(−n)). [sent-96, score-0.938]
48 Then taking py(y) = exp(−y − 2 log(1 + exp(−y))), pñ(ñ) = exp(−2ñ − exp(−ñ)) and g(y) = log(1 + exp(−y)), one obtains p(x, y) = pn(y − f(x)) px(x) = pñ(x − g(y)) py(y). [sent-97, score-0.785]
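As a sanity check (ours, not from the paper), this identity can be verified numerically; the sketch below evaluates both factorizations of the Gumbel example on a grid and confirms that they agree up to floating-point rounding. The grid range and all names are our own choices.

```python
import numpy as np

def p_x(x):        # Gumbel density of the cause x
    return np.exp(-x - np.exp(-x))

def p_n(n):        # Gumbel density of the forward noise n
    return np.exp(-n - np.exp(-n))

def p_y(y):        # implied marginal density of the effect y
    return np.exp(-y - 2.0 * np.log(1.0 + np.exp(-y)))

def p_nt(m):       # density of the backward noise (n-tilde in the text)
    return np.exp(-2.0 * m - np.exp(-m))

f = lambda x: -x                           # forward function
g = lambda y: np.log(1.0 + np.exp(-y))     # backward function

x, y = np.meshgrid(np.linspace(-2.0, 4.0, 200), np.linspace(-2.0, 4.0, 200))
forward = p_n(y - f(x)) * p_x(x)           # p_n(y - f(x)) p_x(x)
backward = p_nt(x - g(y)) * p_y(y)         # p_nt(x - g(y)) p_y(y)

# Maximum absolute discrepancy is at the level of floating-point rounding error.
print(np.max(np.abs(forward - backward)))
```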
49 4 Model estimation Section 3 established for the two-variable case that given knowledge of the exact densities, the true model is (in the generic case) identifiable. [sent-100, score-0.07]
50 We now consider practical estimation methods which infer the generating graph from sample data. [sent-101, score-0.108]
51 Again, we begin by considering the case of two observed scalar variables x and y. [sent-102, score-0.085]
52 If they are not, we continue in the following manner: we test whether a model y := f(x) + n is consistent with the data, simply by doing a nonlinear regression of y on x (to get an estimate f̂ of f), calculating the corresponding residuals n̂ = y − f̂(x), and testing whether n̂ is independent of x. [sent-104, score-0.236]
53 If so, we accept the model y := f(x) + n; if not, we reject it. [sent-105, score-0.432]
54 We then similarly test whether the reverse model x := g(y) + n fits the data. [sent-106, score-0.17]
55 First, if x and y are deemed mutually independent we infer that there is no causal relationship between the two, and no further analysis is performed. [sent-108, score-0.452]
56 On the other hand, if they are dependent but both directional models are accepted we conclude that either model may be correct but we cannot infer it from the data. [sent-109, score-0.209]
57 A more positive result is when we are able to reject one of the directions and (tentatively) accept the other. [sent-110, score-0.105]
58 Finally, it may be the case that neither direction is consistent with the data, in which case we conclude that the generating mechanism is more complex and cannot be described using this model. [sent-111, score-0.1]
59 On the other hand, if none of the independence tests are rejected, Gi is consistent with the data. [sent-114, score-0.151]
60 Furthermore, the above algorithm returns all DAGs consistent with the data, including all those for which consistent subgraphs exist. [sent-116, score-0.09]
61 The selection of the nonlinear regressor and of the particular independence test is not constrained. [sent-118, score-0.21]
62 Any prior information on the types of functional relationships or the distributions of the noise should optimally be utilized here. [sent-119, score-0.22]
63 In our implementation, we perform the regression using Gaussian Processes [12] and the independence tests using kernel methods [13]. [sent-120, score-0.157]
64 Note that one must take care to avoid overfitting, as overfitting may lead one to falsely accept models which should be rejected. [sent-121, score-0.084]
65 5 Experiments To show the ability of our method to find the correct model when all the assumptions hold, we have applied our implementation to a variety of simulated and real data. [sent-122, score-0.101]
66 1 In principle, any regression method can be used; we have verified that our results do not depend significantly on the choice of the regression method by comparing with ν-SVR [15] and with thin-plate spline kernel regression [16]. [sent-124, score-0.183]
67 For the independence test, we implemented the HSIC [13] with a Gaussian kernel, where we used the gamma distribution as an approximation for the distribution of the HSIC under the null hypothesis of independence in order to calculate the p-value of the test result. [sent-125, score-0.15]
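Putting the pieces above together, the two-variable procedure can be sketched compactly as follows (our own illustration, not the authors' implementation): it uses scikit-learn's Gaussian process regressor for the nonlinear regression and a permutation-based HSIC p-value in place of the gamma approximation described above; the function names, median-heuristic kernel widths, and the default 2% threshold (matching the significance level used in the simulations below) are our own choices.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def _gram(z, width):
    """Gaussian kernel Gram matrix for a 1-D sample."""
    d2 = (z[:, None] - z[None, :]) ** 2
    return np.exp(-d2 / (2.0 * width ** 2))

def hsic_pvalue(a, b, n_perm=200, seed=0):
    """Biased HSIC statistic with Gaussian kernels; p-value by permutation."""
    rng = np.random.default_rng(seed)
    n = len(a)
    wa = np.median(np.abs(a[:, None] - a[None, :])) + 1e-12   # median heuristic
    wb = np.median(np.abs(b[:, None] - b[None, :])) + 1e-12
    K, L = _gram(a, wa), _gram(b, wb)
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    stat = np.sum(Kc * L) / n ** 2
    null = np.array([np.sum(Kc * L[np.ix_(p, p)]) / n ** 2
                     for p in (rng.permutation(n) for _ in range(n_perm))])
    return float(np.mean(null >= stat))

def residual_independence(cause, effect):
    """Fit effect = f(cause) + noise with a GP and test residuals against the cause."""
    gp = GaussianProcessRegressor(kernel=RBF() + WhiteKernel(), normalize_y=True)
    gp.fit(cause.reshape(-1, 1), effect)
    resid = effect - gp.predict(cause.reshape(-1, 1))
    return hsic_pvalue(cause, resid)

def infer_direction(x, y, alpha=0.02):
    """Return 'no dependence', 'x->y', 'y->x', 'both accepted' or 'neither accepted'."""
    if hsic_pvalue(x, y) > alpha:       # x and y deemed independent: no causal link inferred
        return "no dependence"
    fwd_ok = residual_independence(x, y) > alpha   # forward model y := f(x) + n
    bwd_ok = residual_independence(y, x) > alpha   # backward model x := g(y) + n
    if fwd_ok and not bwd_ok:
        return "x->y"
    if bwd_ok and not fwd_ok:
        return "y->x"
    return "both accepted" if fwd_ok else "neither accepted"

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(-2, 2, 300)
    y = x + x ** 3 + rng.uniform(-1, 1, 300)   # nonlinear forward model
    print(infer_direction(x, y))               # typically reports 'x->y'
```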
68 We simulated data using the model y = x + bx3 + n; the random variables x and n were sampled from a Gaussian distribution and their absolute values were raised to the power q while keeping the original sign. [sent-128, score-0.084]
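For reference, data from this simulation can be generated with a few lines (a sketch under our own choice of seed; the sample size of 300 matches the trials described below):

```python
import numpy as np

def simulate_pair(n_samples=300, b=0.0, q=1.0, seed=0):
    """y = x + b*x**3 + n, with x and n made non-Gaussian by the power transform q."""
    rng = np.random.default_rng(seed)
    power = lambda z: np.sign(z) * np.abs(z) ** q   # raise |.| to power q, keep the sign
    x = power(rng.standard_normal(n_samples))
    n = power(rng.standard_normal(n_samples))
    return x, x + b * x ** 3 + n

x, y = simulate_pair(b=1.0, q=1.0)   # nonlinear function, Gaussian inputs
```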
69 Figure 2 (simulation results): acceptance probability p_accept of the correct and the reverse model, as a function of the non-Gaussianity parameter q (with b = 0) and of the nonlinearity parameter b (with q = 1). [sent-131, score-0.357]
70 The parameter b controls the strength of the nonlinearity of the function, b = 0 corresponding to the linear case. [sent-134, score-0.082]
71 We used 300 (x, y) samples for each trial and used a significance level of 2% for rejecting the null hypothesis of independence of residuals and cause. [sent-136, score-0.38]
72 By plotting the acceptance probability of the correct and the reverse model as a function of non-Gaussianity we can see that when the distributions are sufficiently non-Gaussian the method is able to infer the correct causal direction. [sent-139, score-0.755]
73 Then, in panel (b) we similarly demonstrate that we can identify the correct direction for the Gaussian marginal and Gaussian noise model when the functional relationship is sufficiently nonlinear. [sent-140, score-0.295]
74 We also did experiments for 4 variables w, x, y and z with a diamond-like causal structure. [sent-142, score-0.419]
75 We took w ∼ U(−3, 3), x = w² + nx with nx ∼ U(−1, 1), y = 4√|w| + ny with ny ∼ U(−1, 1), and z = 2 sin x + 2 sin y + nz with nz ∼ U(−1, 1). [sent-143, score-0.13]
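A corresponding sketch for the four-variable diamond experiment (sample size and seed are our own choices, and the square root in the equation for y follows our reading of the formula above):

```python
import numpy as np

def simulate_diamond(n=500, seed=0):
    rng = np.random.default_rng(seed)
    u = lambda: rng.uniform(-1.0, 1.0, n)          # independent U(-1, 1) noise terms
    w = rng.uniform(-3.0, 3.0, n)
    x = w ** 2 + u()
    y = 4.0 * np.sqrt(np.abs(w)) + u()
    z = 2.0 * np.sin(x) + 2.0 * np.sin(y) + u()
    return w, x, y, z

w, x, y, z = simulate_diamond()
```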
76 The simplest DAG that was consistent with the data (with significance level 2% for each test) turned out to be precisely the true causal structure. [sent-145, score-0.416]
77 The first dataset, the “Old Faithful” dataset [17] contains data about the duration of an eruption and the time interval between subsequent eruptions of the Old Faithful geyser in Yellowstone National Park, USA. [sent-150, score-0.384]
78 Our method yields a p-value of 0.5 for the (forward) model “current duration causes next interval length” and a p-value of 4.4 × 10−9 for the (backward) model “next interval length causes current duration”. [sent-152, score-0.444] [sent-153, score-0.396]
80 Thus, we accept the model where the time interval between the current and the next eruption is a function of the duration of the current eruption, but reject the reverse model. [sent-154, score-0.67]
81 Figure 3 illustrates the data, the forward and backward fit and the residuals for both fits. [sent-156, score-0.765]
82 Note that for the forward model, the residuals seem to be independent of the duration, whereas for the backward model, the residuals are clearly dependent on the interval length. [sent-157, score-1.151]
83 Time-shifting the data by one time step, we obtain for the (forward) model “current interval length causes next duration” a p-value smaller than 10−15, and the (backward) model “next duration causes current interval length” is likewise rejected with a very small p-value. [sent-158, score-0.84]
84 Hence, our simple nonlinear model with independent additive noise is not consistent with the data in either direction. [sent-160, score-0.354]
85 The second dataset, the “Abalone” dataset from the UCI ML repository [18], contains measurements of the number of rings in the shell of abalone (a group of shellfish), which indicate their age, and the length of the shell. [sent-161, score-0.317]
86 The correct model “age causes length” obtains a higher p-value than the reverse model and is the direction favored by the method (see Figure 4). [sent-163, score-0.257]
88 Figure 4: Abalone data: (a) forward fit corresponding to “age (rings) causes length”; (b) residuals for forward fit; (c) backward fit corresponding to “length causes age (rings)”; (d) residuals for backward fit. [sent-175, score-1.996]
89 Figure 5: Altitude–temperature data. (a) forward fit corresponding to “altitude causes temperature”; (b) residuals for forward fit; (c) backward fit corresponding to “temperature causes altitude”; (d) residuals for backward fit. [sent-176, score-1.593] [sent-177, score-1.842]
91 Note that our method favors the correct direction although the assumption of independent additive noise is only approximately correct here; indeed, the variance of the length is dependent on age. [sent-180, score-0.376]
92 Finally, we assay the method on a simple example involving two observed variables: The altitude above sea level (in meters) and the local yearly average outdoor temperature in centigrade, for 349 weather stations in Germany, collected over the time period of 1961–1990 [19]. [sent-181, score-0.374]
93 The correct model “altitude causes temperature” leads to p = 0.017, while “temperature causes altitude” can clearly be rejected (p = 8 × 10−15), in agreement with common understanding of causality in this case. [sent-182, score-0.257] [sent-183, score-0.244]
95 6 Conclusions In this paper, we have shown that the linear–non-Gaussian causal discovery framework can be generalized to admit nonlinear functional dependencies as long as the noise on the variables remains additive. [sent-185, score-0.792]
96 In this approach nonlinear relationships are in fact helpful rather than a hindrance, as they tend to break the symmetry between the variables and allow the correct causal directions to be identified. [sent-186, score-0.706]
97 Although there exist special cases which admit reverse models we have shown that in the generic case the model is identifiable. [sent-187, score-0.281]
98 A Proof of Theorem 1 Set π(x, y) := log p(x, y) = ν(y − f(x)) + ξ(x), (5) where ν := log pn, ξ := log px, and η := log py. [sent-195, score-0.307]
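As a small reading aid (our own intermediate step, written in the notation of the text), the mixed partial derivative of (5), which is the quantity entering the genericity condition of Theorem 1, follows by elementary differentiation:

```latex
\begin{align*}
  \frac{\partial \pi}{\partial x}(x,y)
    &= -\nu'\bigl(y - f(x)\bigr)\, f'(x) + \xi'(x), \\
  \frac{\partial^2 \pi}{\partial x\,\partial y}(x,y)
    &= -\nu''\bigl(y - f(x)\bigr)\, f'(x).
\end{align*}
```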
99 Given fixed f and ν, the set of all ξ admitting a backward model is contained in this subspace. [sent-212, score-0.368]
100 A linear non-Gaussian acyclic model for causal discovery. [sent-270, score-0.499]
wordName wordTfidf (topN-words)
[('causal', 0.371), ('backward', 0.332), ('residuals', 0.291), ('pn', 0.239), ('px', 0.23), ('altitude', 0.207), ('duration', 0.157), ('causes', 0.156), ('forward', 0.142), ('rings', 0.138), ('reverse', 0.134), ('pni', 0.131), ('temperature', 0.13), ('theinds', 0.105), ('nonlinear', 0.104), ('ni', 0.1), ('densities', 0.097), ('interval', 0.095), ('noise', 0.093), ('relationships', 0.091), ('discovery', 0.091), ('dag', 0.089), ('glymour', 0.084), ('eruption', 0.079), ('janzing', 0.079), ('paccept', 0.079), ('length', 0.077), ('age', 0.077), ('additive', 0.076), ('identi', 0.074), ('faithful', 0.071), ('py', 0.068), ('panel', 0.065), ('correct', 0.065), ('causation', 0.063), ('independence', 0.061), ('mpi', 0.061), ('acyclic', 0.059), ('dags', 0.059), ('accept', 0.056), ('abalone', 0.056), ('generating', 0.055), ('old', 0.055), ('germany', 0.055), ('infer', 0.053), ('geyser', 0.053), ('hoyer', 0.053), ('xpa', 0.053), ('xvals', 0.053), ('yvals', 0.053), ('bingen', 0.051), ('nonlinearities', 0.051), ('regression', 0.051), ('fi', 0.05), ('admit', 0.049), ('nonlinearity', 0.049), ('rejected', 0.049), ('reject', 0.049), ('variables', 0.048), ('gaussian', 0.047), ('cybernetics', 0.046), ('gpml', 0.046), ('nz', 0.046), ('shell', 0.046), ('tests', 0.045), ('consistent', 0.045), ('gi', 0.042), ('finland', 0.042), ('nx', 0.042), ('density', 0.042), ('causality', 0.039), ('sch', 0.039), ('corollary', 0.039), ('conditionals', 0.037), ('cooper', 0.037), ('hsic', 0.037), ('observed', 0.037), ('model', 0.036), ('joint', 0.036), ('functional', 0.036), ('helsinki', 0.035), ('sun', 0.034), ('exp', 0.034), ('generic', 0.034), ('linear', 0.033), ('current', 0.032), ('principle', 0.032), ('acceptance', 0.031), ('invertible', 0.031), ('biological', 0.03), ('spline', 0.03), ('concern', 0.029), ('parents', 0.029), ('models', 0.028), ('mutually', 0.028), ('null', 0.028), ('statistically', 0.028), ('distinguishing', 0.027), ('accepted', 0.027), ('symmetry', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 153 nips-2008-Nonlinear causal discovery with additive noise models
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
2 0.17304684 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
3 0.17035225 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
4 0.13753437 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
Author: Jean-Philippe Pellet, André Elisseeff
Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditionalindependence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of causeeffect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical , or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form , impossible. If we suppose that the
5 0.11154523 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
Author: Steven J. Phillips, Miroslav Dudík
Abstract: We apply robust Bayesian decision theory to improve both generative and discriminative learners under bias in class proportions in labeled training data, when the true class proportions are unknown. For the generative case, we derive an entropybased weighting that maximizes expected log likelihood under the worst-case true class proportions. For the discriminative case, we derive a multinomial logistic model that minimizes worst-case conditional log loss. We apply our theory to the modeling of species geographic distributions from presence data, an extreme case of labeling bias since there is no absence data. On a benchmark dataset, we find that entropy-based weighting offers an improvement over constant estimates of class proportions, consistently reducing log loss on unbiased test data. 1
6 0.097792819 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
7 0.071139589 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
8 0.068471678 138 nips-2008-Modeling human function learning with Gaussian processes
9 0.062259793 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss
10 0.061641362 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
11 0.060346343 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
12 0.057539929 12 nips-2008-Accelerating Bayesian Inference over Nonlinear Differential Equations with Gaussian Processes
13 0.054714307 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
14 0.054090839 112 nips-2008-Kernel Measures of Independence for non-iid Data
15 0.05330053 76 nips-2008-Estimation of Information Theoretic Measures for Continuous Random Variables
16 0.052341942 101 nips-2008-Human Active Learning
17 0.052236777 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
18 0.051570799 202 nips-2008-Robust Regression and Lasso
19 0.051311363 24 nips-2008-An improved estimator of Variance Explained in the presence of noise
20 0.050658241 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
topicId topicWeight
[(0, -0.168), (1, 0.003), (2, 0.014), (3, 0.052), (4, 0.069), (5, -0.056), (6, -0.018), (7, 0.085), (8, 0.051), (9, 0.03), (10, -0.003), (11, 0.053), (12, 0.002), (13, -0.158), (14, 0.002), (15, -0.048), (16, 0.063), (17, -0.029), (18, -0.103), (19, 0.017), (20, -0.009), (21, 0.209), (22, 0.122), (23, -0.081), (24, 0.323), (25, -0.173), (26, -0.03), (27, -0.049), (28, -0.059), (29, 0.075), (30, 0.025), (31, -0.094), (32, 0.068), (33, 0.042), (34, 0.009), (35, 0.001), (36, 0.024), (37, -0.075), (38, 0.055), (39, 0.001), (40, 0.106), (41, -0.069), (42, 0.024), (43, 0.045), (44, -0.07), (45, -0.002), (46, -0.024), (47, -0.146), (48, 0.024), (49, 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.93288362 153 nips-2008-Nonlinear causal discovery with additive noise models
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
2 0.7738564 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
Author: Jean-Philippe Pellet, André Elisseeff
Abstract: Causal structure-discovery techniques usually assume that all causes of more than one variable are observed. This is the so-called causal sufficiency assumption. In practice, it is untestable, and often violated. In this paper, we present an efficient causal structure-learning algorithm, suited for causally insufficient data. Similar to algorithms such as IC* and FCI, the proposed approach drops the causal sufficiency assumption and learns a structure that indicates (potential) latent causes for pairs of observed variables. Assuming a constant local density of the data-generating graph, our algorithm makes a quadratic number of conditionalindependence tests w.r.t. the number of variables. We show with experiments that our algorithm is comparable to the state-of-the-art FCI algorithm in accuracy, while being several orders of magnitude faster on large problems. We conclude that MBCS* makes a new range of causally insufficient problems computationally tractable. Keywords: Graphical Models, Structure Learning, Causal Inference. 1 Introduction: Task Definition & Related Work The statistical definition of causality pioneered by Pearl (2000) and Spirtes et al. (2001) has shed new light on how to detect causation. Central in this approach is the automated detection of causeeffect relationships using observational (i.e., non-experimental) data. This can be a necessary task, as in many situations, performing randomized controlled experiments to unveil causation can be impossible, unethical , or too costly. When the analysis deals with variables that cannot be manipulated, being able to learn from data collected by observing the running system is the only possibility. It turns out that learning the full causal structure of a set of variables is, in its most general form , impossible. If we suppose that the
3 0.73113048 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
Author: David Danks, Clark Glymour, Robert E. Tillman
Abstract: In many domains, data are distributed among datasets that share only some variables; other recorded variables may occur in only one dataset. While there are asymptotically correct, informative algorithms for discovering causal relationships from a single dataset, even with missing values and hidden variables, there have been no such reliable procedures for distributed data with overlapping variables. We present a novel, asymptotically correct procedure that discovers a minimal equivalence class of causal DAG structures using local independence information from distributed data of this form and evaluate its performance using synthetic and real-world data against causal discovery algorithms for single datasets and applying Structural EM, a heuristic DAG structure learning procedure for data with missing values, to the concatenated data.
4 0.46138388 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
Author: Rama Natarajan, Iain Murray, Ladan Shams, Richard S. Zemel
Abstract: We explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects’ response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models. 1
5 0.45454961 14 nips-2008-Adaptive Forward-Backward Greedy Algorithm for Sparse Learning with Linear Models
Author: Tong Zhang
Abstract: Consider linear prediction models where the target function is a sparse linear combination of a set of basis functions. We are interested in the problem of identifying those basis functions with non-zero coefficients and reconstructing the target function from noisy observations. Two heuristics that are widely used in practice are forward and backward greedy algorithms. First, we show that neither idea is adequate. Second, we propose a novel combination that is based on the forward greedy algorithm but takes backward steps adaptively whenever beneficial. We prove strong theoretical results showing that this procedure is effective in learning sparse representations. Experimental results support our theory. 1
6 0.43349379 91 nips-2008-Generative and Discriminative Learning with Unknown Labeling Bias
7 0.35970467 211 nips-2008-Simple Local Models for Complex Dynamical Systems
8 0.33867201 149 nips-2008-Near-minimax recursive density estimation on the binary hypercube
9 0.32911903 152 nips-2008-Non-stationary dynamic Bayesian networks
10 0.32897654 217 nips-2008-Sparsity of SVMs that use the epsilon-insensitive loss
11 0.31526724 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence
12 0.30878991 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
13 0.30157128 155 nips-2008-Nonparametric regression and classification with joint sparsity constraints
14 0.30147943 249 nips-2008-Variational Mixture of Gaussian Process Experts
15 0.27873653 41 nips-2008-Breaking Audio CAPTCHAs
16 0.26904383 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
17 0.26340565 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
18 0.26203462 126 nips-2008-Localized Sliced Inverse Regression
19 0.26131138 186 nips-2008-Probabilistic detection of short events, with application to critical care monitoring
20 0.2545929 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
topicId topicWeight
[(6, 0.067), (7, 0.081), (12, 0.021), (15, 0.02), (21, 0.309), (28, 0.159), (57, 0.075), (59, 0.02), (63, 0.02), (64, 0.02), (71, 0.02), (77, 0.059), (83, 0.056)]
simIndex simValue paperId paperTitle
1 0.86884254 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
Author: Rama Natarajan, Iain Murray, Ladan Shams, Richard S. Zemel
Abstract: We explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects’ response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models. 1
2 0.85534942 13 nips-2008-Adapting to a Market Shock: Optimal Sequential Market-Making
Author: Sanmay Das, Malik Magdon-Ismail
Abstract: We study the profit-maximization problem of a monopolistic market-maker who sets two-sided prices in an asset market. The sequential decision problem is hard to solve because the state space is a function. We demonstrate that the belief state is well approximated by a Gaussian distribution. We prove a key monotonicity property of the Gaussian state update which makes the problem tractable, yielding the first optimal sequential market-making algorithm in an established model. The algorithm leads to a surprising insight: an optimal monopolist can provide more liquidity than perfectly competitive market-makers in periods of extreme uncertainty, because a monopolist is willing to absorb initial losses in order to learn a new valuation rapidly so she can extract higher profits later. 1
same-paper 3 0.77566725 153 nips-2008-Nonlinear causal discovery with additive noise models
Author: Patrik O. Hoyer, Dominik Janzing, Joris M. Mooij, Jan R. Peters, Bernhard Schölkopf
Abstract: The discovery of causal relationships between a set of observed variables is a fundamental problem in science. For continuous-valued data linear acyclic causal models with additive noise are often used because these models are well understood and there are well-known methods to fit them to data. In reality, of course, many causal relationships are more or less nonlinear, raising some doubts as to the applicability and usefulness of purely linear methods. In this contribution we show that the basic linear framework can be generalized to nonlinear models. In this extended framework, nonlinearities in the data-generating process are in fact a blessing rather than a curse, as they typically provide information on the underlying causal system and allow more aspects of the true data-generating mechanisms to be identified. In addition to theoretical results we show simulations and some simple real data experiments illustrating the identification power provided by nonlinearities. 1
4 0.73050272 134 nips-2008-Mixed Membership Stochastic Blockmodels
Author: Edoardo M. Airoldi, David M. Blei, Stephen E. Fienberg, Eric P. Xing
Abstract: In many settings, such as protein interactions and gene regulatory networks, collections of author-recipient email, and social networks, the data consist of pairwise measurements, e.g., presence or absence of links between pairs of objects. Analyzing such data with probabilistic models requires non-standard assumptions, since the usual independence or exchangeability assumptions no longer hold. In this paper, we introduce a class of latent variable models for pairwise measurements: mixed membership stochastic blockmodels. Models in this class combine a global model of dense patches of connectivity (blockmodel) with a local model to instantiate node-specific variability in the connections (mixed membership). We develop a general variational inference algorithm for fast approximate posterior inference. We demonstrate the advantages of mixed membership stochastic blockmodel with applications to social networks and protein interaction networks. 1
5 0.60084414 48 nips-2008-Clustering via LP-based Stabilities
Author: Nikos Komodakis, Nikos Paragios, Georgios Tziritas
Abstract: A novel center-based clustering algorithm is proposed in this paper. We first formulate clustering as an NP-hard linear integer program and we then use linear programming and the duality theory to derive the solution of this optimization problem. This leads to an efficient and very general algorithm, which works in the dual domain, and can cluster data based on an arbitrary set of distances. Despite its generality, it is independent of initialization (unlike EM-like methods such as K-means), has guaranteed convergence, can automatically determine the number of clusters, and can also provide online optimality bounds about the quality of the estimated clustering solutions. To deal with the most critical issue in a centerbased clustering algorithm (selection of cluster centers), we also introduce the notion of stability of a cluster center, which is a well defined LP-based quantity that plays a key role to our algorithm’s success. Furthermore, we also introduce, what we call, the margins (another key ingredient in our algorithm), which can be roughly thought of as dual counterparts to stabilities and allow us to obtain computationally efficient approximations to the latter. Promising experimental results demonstrate the potentials of our method.
6 0.57605267 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
7 0.57338834 62 nips-2008-Differentiable Sparse Coding
8 0.57245684 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
9 0.57086629 151 nips-2008-Non-parametric Regression Between Manifolds
10 0.57030141 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
11 0.57029724 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
12 0.5679872 194 nips-2008-Regularized Learning with Networks of Features
13 0.56688172 200 nips-2008-Robust Kernel Principal Component Analysis
14 0.56597912 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
15 0.56447524 226 nips-2008-Supervised Dictionary Learning
16 0.56440032 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
17 0.56439567 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
18 0.56404018 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
19 0.56396067 118 nips-2008-Learning Transformational Invariants from Natural Movies
20 0.56350172 245 nips-2008-Unlabeled data: Now it helps, now it doesn't