jmlr jmlr2010 jmlr2010-36 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
Reference: text
sentIndex sentText sentNum sentScore
1 Department of Computer Science and HIIT, University of Helsinki, Helsinki, Finland. Editor: Peter Dayan. Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. [sent-13, score-1.169]
2 Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. [sent-15, score-0.701]
3 We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. [sent-18, score-0.328]
4 Introduction. Analysis of causal influences or effects has become an important topic in statistics and machine learning, and has recently found applications in, for example, neuroinformatics (Roebroeck et al. [sent-21, score-0.476]
5 consider the problem here from a practical viewpoint, where coefficients in conventional statistical models are interpreted as causal influences. [sent-29, score-0.328]
6 First, if the time-resolution of the measurements is higher than the time-scale of causal influences, one can estimate a classic autoregressive (AR) model with time-lagged variables and interpret the autoregressive coefficients as causal effects. [sent-31, score-1.354]
7 Second, if the measurements have a lower time resolution than the causal influences, or if the data has no temporal structure at all, one can use a model in which the influences are instantaneous, leading to Bayesian networks or structural equation models (SEM); see Bollen (1989). [sent-32, score-0.402]
8 While autoregressive models can be estimated by classic regression methods, the case of instantaneous effects is much more difficult. [sent-33, score-0.908]
9 Here, we consider the general case where causal influences can occur either instantaneously or with considerable time lags. [sent-41, score-0.328]
10 Such models are one example of structural vector autoregressive (SVAR) models popular in econometric theory, for which numerous estimation attempts have been made; see, for example, Swanson and Granger (1997), Demiralp and Hoover (2003) and Moneta and Spirtes (2006). [sent-42, score-0.366]
11 The proposed non-Gaussian model not only allows estimation of both instantaneous and lagged effects; it also shows that taking instantaneous influences into account can change the values of the time-lagged coefficients quite drastically. [sent-48, score-0.969]
12 Thus, we see that neglecting instantaneous influences can lead to misleading interpretations of causal effects. [sent-49, score-0.69]
13 If the inputs to the system are known, methods such as dynamic causal modelling can be used (Friston et al. [sent-64, score-0.354]
14 In autoregressive modelling, we would model the dynamics by a model of the form $x(t) = \sum_{\tau=1}^{k} B_\tau x(t-\tau) + e(t)$ (1), where k is the number of time-delays used, that is, the order of the autoregressive model, Bτ , τ = 1, . [sent-78, score-0.678]
15 2 Definition of Our Model. In many applications, the influences between the xi(t) can be both instantaneous and lagged. [sent-86, score-0.362]
16 Denote by Bτ the n × n matrix of the causal effects between the variables xi with time lag τ, τ = 0, . [sent-88, score-0.476]
17 For τ > 0, the effects are ordinary autoregressive effects from the past to the present, while for τ = 0, the effects are “instantaneous”. [sent-92, score-0.742]
18 The ei (t) are non-Gaussian, an important assumption that distinguishes our model from classic models, whether autoregressive models, structural-equation models, or Bayesian networks. [sent-98, score-0.516]
19 The matrix modelling instantaneous effects, B0 , corresponds to an acyclic graph, as is typical in causal analysis. [sent-100, score-0.75]
20 If the order of the autoregressive part is zero, that is, k = 0, the model reduces to the LiNGAM model, which models instantaneous effects only. [sent-112, score-0.875]
21 Under the assumptions of the model, in particular the independence and non-Gaussianity of the disturbances ei , the model can be essentially estimated (Comon, 1994). [sent-121, score-0.369]
22 (3) is a classic vector autoregressive model in which future observations are linearly predicted from preceding ones. [sent-134, score-0.4]
23 However, our model would still be different from classic autoregressive models because the disturbances ei (t) are non-Gaussian. [sent-137, score-0.7]
24 It is important to note here that an autoregressive model can serve two different goals: prediction and analysis of causality. [sent-138, score-0.339]
25 Our goal here is the latter: We estimate the parameter matrices Bτ in order to interpret them as causal effects between the variables. [sent-139, score-0.476]
26 ; for such prediction, an ordinary autoregressive model is likely to be just as good. [sent-145, score-0.339]
27 3 Structural Vector Autoregressive Models. Combinations of SEM and vector autoregressive models have been proposed in the econometric literature, where they are called structural vector autoregressive (SVAR) models. [sent-148, score-0.664]
28 Even if the Central Limit Theorem is applicable in the sense that ei (t) is a sum of many different latent independent variables, the disturbances can be very non-Gaussian if, for some reason, the variance of the ei (t) is changing. [sent-160, score-0.469]
29 Heteroscedasticity can be seen in some important application areas of causal modelling, in particular: 1. [sent-170, score-0.328]
30 The DAG structure means that for the right permutation of its rows (corresponding to the causal ordering), W is lower-triangular. [sent-219, score-0.352]
31 The determinant of a triangular matrix is equal to the product of its diagonal elements, and a permutation does not change the determinant, so the determinant of W is equal to the product of the diagonal elements when the variables are ordered in the causal order. [sent-220, score-0.352]
32 The method combines classic least-squares estimation of an autoregressive (AR) model with LiNGAM estimation. [sent-246, score-0.439]
33 Estimate a classic autoregressive model for the data, $x(t) = \sum_{\tau=1}^{k} M_\tau x(t-\tau) + n(t)$ (10), using any conventional implementation of a least-squares method. [sent-254, score-0.400]
34 This gives the estimate of the matrix B0 as the solution of the instantaneous causal model $\hat{n}(t) = B_0 \hat{n}(t) + e(t)$. [sent-262, score-0.731]
35 Finally, compute the estimates of the causal effect matrices Bτ for τ > 0 as $\hat{B}_\tau = (I - \hat{B}_0)\hat{M}_\tau$. [sent-264, score-0.328]
36 Suppose we just estimate an AR model as in (1), and interpret the coefficients as causal effects. [sent-281, score-0.369]
37 If the innovations are not independent, the causal interpretation may not be justified. [sent-283, score-0.389]
38 The problem with such an approach is that the interpretation of the obtained results in the framework of causal analysis would be quite difficult. [sent-287, score-0.328]
39 Our solution is to fit a causal model like LiNGAM to the residuals, which leads to a straightforward causal interpretation of the analysis of residuals which is logically consistent with the AR model. [sent-288, score-0.756]
40 The estimation of the autoregressive part does not take non-Gaussianity into account at all and is thus likely to be suboptimal. [sent-291, score-0.337]
41 (3) is, in fact, closely related to the multichannel blind deconvolution problem with causal finite impulse response (FIR) filters (Cichocki and Amari, 2002; Hyvärinen et al. [sent-295, score-0.599]
42 Using MBD methods is justified here due to the possibility of transforming an autoregressive model into a moving-average model: In Eq. [sent-301, score-0.339]
43 (3), the observed variables xi (t) can be considered as convolutive mixtures of the disturbances ei (t). [sent-302, score-0.334]
44 The basic statistical principle to estimate the MBD model is that the disturbances ei (t) should be mutually independent for different i and different t. [sent-306, score-0.341]
45 This implies that our SVAR model is identifiable by MBD if at most one of the disturbances ei is Gaussian. [sent-308, score-0.341]
46 To obtain the causal order in the instantaneous effects, find the permutation matrix P (applied equally to both rows and columns) of B0 which makes $\tilde{B}_0 = P B_0 P^T$ as close as possible to strictly lower triangular. [sent-325, score-0.714]
47 3 Sparsification of the Causal Connections. For the purposes of interpretability and generalizability, it is often useful to sparsify the causal connections, that is, to set insignificant entries of $\hat{B}_\tau$ to zero. [sent-327, score-0.328]
48 To make the causal effects sparse, we set about 60% of the entries in the matrix B1 and the lower-triangular part of B0 to zero, while the magnitude of the others is uniformly distributed between 0. [sent-354, score-0.476]
49 disturbances ei (t) were generated by passing standardized i. [sent-366, score-0.332]
50 Assessment of the Significance of Causality. In practice, we also need to assess the significance of the estimated causal relations. [sent-391, score-0.356]
51 3 is related to this goal, here we propose a more principled approach for testing the significance of the causal influences. [sent-393, score-0.328]
52 For the instantaneous effects xi(t) → xj(t) (i ≠ j), the significance of causality is obtained by assessing if the entries of $\hat{B}_0$ are statistically significantly different from zero. [sent-394, score-0.645]
53 One is a measure of instantaneous variance contributed by xi(t) to xj(t): $S_0(i \leftarrow j) = [B_0]_{ij}^2 \cdot \mathrm{var}(x_i(t))/\mathrm{var}(x_j(t))$. [sent-397, score-0.387]
54 The other measures how strong the total lagged causal influence from xi(t) to xj(t) is; it is a measure of contributed variance from xi(t − τ), τ > 0 to xj(t): $S_{\mathrm{lag}}(i \leftarrow j) = \mathrm{var}\left(\sum_{\tau>0} [B_\tau]_{ij}\, x_j(t-\tau)\right)/\mathrm{var}(x_j(t))$. [sent-399, score-0.518]
55 The asymptotic distributions of these statistics under the null hypothesis (with no causal effects) are very difficult to derive, and they may also behave poorly in the finite sample case. [sent-402, score-0.354]
56 Remarks on the Interpretation of the Parameters. In this section, we discuss how the autoregressive parameters are changed by taking into account the instantaneous effects, and how our model can be interpreted in the framework of Granger causality. [sent-456, score-0.701]
57 Taking instantaneous effects into account changes the estimation procedure for all the autoregressive matrices, if we want consistent estimators as we usually do. [sent-459, score-0.847]
58 Of course, this is only the case if there are instantaneous effects, that is, B0 ≠ 0; otherwise, the estimates are not changed. [sent-460, score-0.362]
59 Next we present some theoretical examples of how the instantaneous and lagged effects interact based on the formula in (11). [sent-464, score-0.675]
60 1 Example 1: An Instantaneous Effect May Seem to Be Lagged. Consider first the case where the instantaneous and lagged matrices are as follows: $B_0 = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$, B1 = 0. [sent-467, score-0.527]
61 9 That is, there is an instantaneous effect x2 → x1 , and no lagged effects (other than the purely autoregressive xi (t − 1) → xi (t)). [sent-470, score-0.973]
62 Now, if an AR(1) model is estimated for data coming from this model, without taking the instantaneous effects into account, we get the autoregressive matrix $M_1 = (I - B_0)^{-1} B_1$ = 0. [sent-471, score-0.877]
63 2 Example 2: Spurious Effects Appear. Consider three variables with the instantaneous effects x1 → x2 and x2 → x3 , and no lagged effects other than xi (t − 1) → xi (t), as given by 0. [sent-478, score-0.823]
64 9 This means that the estimation of the simple autoregressive model leads to the inference of a direct lagged effect x1 → x3 , although no such direct effect exists in the model generating the data, for any time lag. [sent-489, score-0.584]
65 A more reassuring result is the following: if the data follows the same causal ordering for all time lags, that ordering is not contradicted by neglecting instantaneous effects. [sent-490, score-0.778]
66 (In the purely instantaneous case, existence of such an ordering is equivalent to acyclicity of the effects as noted in Section 2. [sent-496, score-0.624]
67 What this theorem means is that if the variables really follow a single “causal ordering” for all time lags, that ordering is preserved even if instantaneous effects are neglected and a classic AR model is estimated for the data. [sent-504, score-0.684]
68 Thus, there is some limit to how (11) can change the causal interpretation of the results. [sent-505, score-0.328]
69 2 Generalizations of Granger Causality. The classic interpretation of causality in instantaneous SEMs would be that xi causes xj if the (j, i)th coefficient in B0 is non-zero. [sent-507, score-0.558]
70 A simple operational definition of Granger causality can be based on the autoregressive coefficients Mτ : If at least one of the coefficients from xi (t − τ), τ ≥ 1 to x j (t) is (significantly) non-zero, then xi Granger-causes x j . [sent-509, score-0.433]
71 First we can combine the two aspects of instantaneous and lagged effects. [sent-512, score-0.527]
72 In fact, such a concept of instantaneous causality was already alluded to by Granger (1969), but presumably due to lack of proper estimation methods, that paper as well as most subsequent developments considered mainly non-instantaneous causality. [sent-513, score-0.536]
73 The condition for τ is different from Granger causality since the value τ = 0, corresponding to instantaneous effects, is included. [sent-519, score-0.497]
74 Moreover, since estimation of the instantaneous effects changes the estimates of the lagged ones, the lagged effects used in our definition are different from those usually used with Granger causality. [sent-520, score-1.027]
75 (3) to find the causal relations among several world stock indices. [sent-530, score-0.363]
76 The kurtoses of the estimated disturbances ei are 3. [sent-540, score-0.328]
77 It was found that B0 can be permuted to a strictly lower-triangular matrix, meaning that the instantaneous effects follow a linear acyclic causal model. [sent-548, score-0.872]
78 Finally, based on B0 and B1 , one can plot the causal diagram, which is shown in Fig. [sent-549, score-0.328]
79 Second, the causal relations $\mathrm{DJI}_{t-1} \to \mathrm{N225}_t \to \mathrm{DJI}_t$ and $\mathrm{DJI}_{t-1} \to \mathrm{HSI}_t \to \mathrm{DJI}_t$ are consistent with the time difference between Asia and the USA. [sent-554, score-0.328]
80 That is, the causal effects from $\mathrm{N225}_t$ and $\mathrm{HSI}_t$ to $\mathrm{DJI}_t$, although seeming to be instantaneous, may actually be mainly caused by the time difference. [sent-555, score-0.476]
81 2 Application on MEG Data. Second, we applied the proposed model to the magnitudes of brain sources obtained from magnetoencephalographic (MEG) signals to analyze their causal relationships. [sent-559, score-0.502]
82 Next, we fitted an ordinary vector autoregressive model with 10 lags on the estimated sources, finding the corresponding innovation series which we denote by yi (t), i = 1, . [sent-581, score-0.441]
83 The autoregressive model order 10 was chosen because it was the smallest order that gave approximately white innovations. [sent-587, score-0.339]
84 For both the instantaneous and lagged effects, one needs to perform 17 × 16 = 272 tests; therefore, the significance level for each individual test is then 0. [sent-600, score-0.527]
85 4 shows the resulting diagram of causal analysis with instantaneous effects between the magnitudes of the selected MEG sources, with the influences significant at 5% level (corrected for multiple testing). [sent-607, score-0.87]
86 Extensions of the Model. We have here assumed that B0 is acyclic, as is typical in causal analysis. [sent-630, score-0.328]
87 However, development of methods for estimating cyclic models is orthogonal to the main contribution of our paper in the sense that we can use any such new method to estimate the instantaneous model in our framework. [sent-635, score-0.403]
88 The idea is to combine blind source separation with a linear autoregressive model of the latent sources. [sent-643, score-0.477]
89 (2008) separate linear sources and analyze their (causal) connections whereas we analyze connections between the observed variables, and second, we estimate instantaneous causal influences whereas Gómez-Herrero et al. [sent-647, score-0.747]
90 Conclusion. We showed how non-Gaussianity enables estimation of a causal discovery model in which the linear effects can be either instantaneous or time-lagged. [sent-650, score-0.918]
91 As in the purely instantaneous case (Shimizu et al. [sent-651, score-0.362]
92 , 2006), non-Gaussianity makes the model identifiable without explicit prior assumptions on existence or non-existence of given causal effects. [sent-652, score-0.369]
93 From the practical viewpoint, an important implication is that considering instantaneous effects changes the coefficients of the time-lagged effects as well. [sent-655, score-0.658]
94 Searching for the causal structure of a vector autoregression. [sent-708, score-0.328]
95 Investigating causal relations by econometric models and cross-spectral methods. [sent-743, score-0.363]
96 Estimation of causal effects using linear non-Gaussian causal models with hidden variables. [sent-767, score-0.804]
97 Causal modelling combining instantaneous and lagged effects: an identifiable model based on non-Gaussianity. [sent-793, score-0.594]
98 Graphical models for the identification of causal structures in multivariate time series models. [sent-845, score-0.328]
99 Blind separation of instantaneous mixtures of non stationary sources. [sent-862, score-0.447]
100 A linear non-Gaussian acyclic model for causal discovery. [sent-889, score-0.403]
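The two-stage idea summarised in the extracted sentences above (a least-squares AR fit, an instantaneous model on the residuals, then $\hat{B}_\tau = (I - \hat{B}_0)\hat{M}_\tau$) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the helper names `simulate_svar1` and `fit_var1`, the coefficient values, and the Laplace disturbances are hypothetical and not taken from the paper, and the true B0 stands in for a LiNGAM estimate, which is not implemented. The sketch only checks that a plain AR(1) fit recovers roughly $M_1 = (I - B_0)^{-1} B_1$ and that multiplying by $(I - B_0)$ gives back the lagged matrix.

```python
import numpy as np

def simulate_svar1(B0, B1, T, rng):
    """Simulate x(t) = B0 x(t) + B1 x(t-1) + e(t) with Laplace (non-Gaussian) e(t)."""
    n = B0.shape[0]
    A = np.linalg.inv(np.eye(n) - B0)      # reduced form: x(t) = A B1 x(t-1) + A e(t)
    e = rng.laplace(size=(T, n))
    x = np.zeros((T, n))
    for t in range(1, T):
        x[t] = A @ (B1 @ x[t - 1] + e[t])
    return x

def fit_var1(x):
    """Least-squares AR(1) fit x(t) ~ M1 x(t-1); returns M1 and the residuals n(t)."""
    past, present = x[:-1], x[1:]
    coef, *_ = np.linalg.lstsq(past, present, rcond=None)  # present ~ past @ coef
    return coef.T, present - past @ coef

rng = np.random.default_rng(0)

# Hypothetical coefficients: an instantaneous effect x2 -> x1 and purely
# diagonal lagged effects. These values are illustrative assumptions.
B0 = np.array([[0.0, 1.0],
               [0.0, 0.0]])
B1 = 0.5 * np.eye(2)

x = simulate_svar1(B0, B1, T=20000, rng=rng)
M1_hat, residuals = fit_var1(x)

# What a plain AR(1) fit should converge to when B0 is ignored.
M1_theory = np.linalg.inv(np.eye(2) - B0) @ B1

# Final step of the two-stage method; the true B0 is used here in place
# of an estimate obtained by running LiNGAM on the residuals.
B1_hat = (np.eye(2) - B0) @ M1_hat

print(np.round(M1_hat, 2))
print(np.round(B1_hat, 2))
```

Under these assumptions the fitted AR matrix shows an apparent lagged effect from x2 to x1 (its off-diagonal entry is clearly non-zero) even though the generating B1 is diagonal, and the correction by $(I - B_0)$ removes it.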
wordName wordTfidf (topN-words)
[('instantaneous', 0.362), ('causal', 0.328), ('autoregressive', 0.298), ('lingam', 0.25), ('svar', 0.204), ('disturbances', 0.184), ('mbd', 0.182), ('granger', 0.175), ('lagged', 0.165), ('shimizu', 0.165), ('hyv', 0.163), ('effects', 0.148), ('djit', 0.136), ('causality', 0.135), ('rinen', 0.135), ('arinen', 0.125), ('aussianity', 0.125), ('himizu', 0.125), ('oyer', 0.125), ('ica', 0.123), ('ei', 0.116), ('stimation', 0.096), ('slag', 0.091), ('meg', 0.079), ('yv', 0.078), ('blind', 0.073), ('acyclicity', 0.07), ('sem', 0.068), ('hang', 0.068), ('helsinki', 0.067), ('uences', 0.065), ('mse', 0.064), ('ar', 0.064), ('innovations', 0.061), ('classic', 0.061), ('residuals', 0.059), ('pham', 0.058), ('sources', 0.057), ('hsit', 0.057), ('spirtes', 0.053), ('non', 0.048), ('likelihood', 0.046), ('pi', 0.046), ('demiralp', 0.045), ('heteroscedasticity', 0.045), ('lags', 0.045), ('moneta', 0.045), ('swanson', 0.045), ('bootstrapping', 0.045), ('finland', 0.045), ('ordering', 0.044), ('brain', 0.044), ('model', 0.041), ('estimation', 0.039), ('cardoso', 0.039), ('hoyer', 0.038), ('separation', 0.037), ('wi', 0.036), ('econometric', 0.035), ('kurtosis', 0.035), ('entropies', 0.035), ('stock', 0.035), ('acyclic', 0.034), ('autoregression', 0.034), ('convolutive', 0.034), ('garrat', 0.034), ('hari', 0.034), ('heteroscedastic', 0.034), ('hiit', 0.034), ('hoover', 0.034), ('kun', 0.034), ('multichannel', 0.034), ('osaka', 0.034), ('patrik', 0.034), ('ssec', 0.034), ('dag', 0.034), ('structural', 0.033), ('neuroimage', 0.032), ('sparsi', 0.032), ('standardized', 0.032), ('magnitudes', 0.032), ('deconvolution', 0.029), ('innovation', 0.029), ('hsi', 0.029), ('cance', 0.029), ('estimated', 0.028), ('nancial', 0.028), ('latent', 0.028), ('disturbance', 0.026), ('modelled', 0.026), ('cichocki', 0.026), ('modelling', 0.026), ('null', 0.026), ('coef', 0.026), ('cients', 0.025), ('variance', 0.025), ('ps', 0.025), ('identi', 0.024), ('temporally', 0.024), ('permutation', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
2 0.27913588 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years have seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small sample sizes, and possible presence of unmeasured causes, remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought. 
In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive the algorithms are to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
5 0.10292453 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
Author: Jitkomut Songsiri, Lieven Vandenberghe
Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization
6 0.073323734 69 jmlr-2010-Lp-Nested Symmetric Distributions
7 0.052227549 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
8 0.048487544 6 jmlr-2010-A Rotation Test to Verify Latent Structure
9 0.040301301 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
10 0.034518138 78 jmlr-2010-Model Selection: Beyond the Bayesian Frequentist Divide
11 0.034242686 90 jmlr-2010-Permutation Tests for Studying Classifier Performance
12 0.032264367 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
14 0.031272478 10 jmlr-2010-An Exponential Model for Infinite Rankings
15 0.030267948 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis
16 0.0292842 112 jmlr-2010-Training and Testing Low-degree Polynomial Data Mappings via Linear SVM
17 0.028722238 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
18 0.028215019 109 jmlr-2010-Stochastic Composite Likelihood
19 0.025402494 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
20 0.025098644 83 jmlr-2010-On Over-fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation
topicId topicWeight
[(0, -0.165), (1, 0.301), (2, 0.217), (3, -0.043), (4, -0.021), (5, -0.083), (6, 0.07), (7, -0.059), (8, 0.01), (9, -0.027), (10, -0.056), (11, -0.016), (12, 0.003), (13, -0.072), (14, -0.007), (15, 0.028), (16, 0.029), (17, 0.072), (18, 0.006), (19, -0.044), (20, 0.09), (21, -0.033), (22, -0.059), (23, -0.085), (24, 0.122), (25, 0.127), (26, -0.039), (27, 0.459), (28, 0.002), (29, 0.126), (30, -0.001), (31, -0.03), (32, 0.125), (33, -0.107), (34, 0.156), (35, -0.075), (36, 0.059), (37, -0.037), (38, 0.024), (39, 0.077), (40, -0.192), (41, -0.02), (42, -0.073), (43, -0.097), (44, -0.008), (45, -0.045), (46, -0.06), (47, -0.091), (48, 0.022), (49, -0.098)]
simIndex simValue paperId paperTitle
same-paper 1 0.93863976 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
2 0.80638605 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years have seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small sample sizes, and possible presence of unmeasured causes remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought.
In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
4 0.31904769 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
Author: Jitkomut Songsiri, Lieven Vandenberghe
Abstract: An algorithm is presented for topology selection in graphical models of autoregressive Gaussian time series. The graph topology of the model represents the sparsity pattern of the inverse spectrum of the time series and characterizes conditional independence relations between the variables. The method proposed in the paper is based on an ℓ1 -type nonsmooth regularization of the conditional maximum likelihood estimation problem. We show that this reduces to a convex optimization problem and describe a large-scale algorithm that solves the dual problem via the gradient projection method. Results of experiments with randomly generated and real data sets are also included. Keywords: graphical models, time series, topology selection, convex optimization
5 0.3127377 69 jmlr-2010-Lp-Nested Symmetric Distributions
Author: Fabian Sinz, Matthias Bethge
Abstract: In this paper, we introduce a new family of probability densities called $L_p$-nested symmetric distributions. The common property, shared by all members of the new class, is the same functional form $\rho(\boldsymbol{x}) = \tilde{\rho}(f(\boldsymbol{x}))$, where $f$ is a nested cascade of $L_p$-norms $\|\boldsymbol{x}\|_p = \left(\sum |x_i|^p\right)^{1/p}$. $L_p$-nested symmetric distributions thereby are a special case of ν-spherical distributions for which $f$ is only required to be positively homogeneous of degree one. While both ν-spherical and $L_p$-nested symmetric distributions contain many widely used families of probability models such as the Gaussian, spherically and elliptically symmetric distributions, $L_p$-spherically symmetric distributions, and certain types of independent component analysis (ICA) and independent subspace analysis (ISA) models, ν-spherical distributions are usually computationally intractable. Here we demonstrate that $L_p$-nested symmetric distributions are still computationally feasible by deriving an analytic expression for its normalization constant, gradients for maximum likelihood estimation, analytic expressions for certain types of marginals, as well as an exact and efficient sampling algorithm. We discuss the tight links of $L_p$-nested symmetric distributions to well known machine learning methods such as ICA, ISA and mixed norm regularizers, and introduce the nested radial factorization algorithm (NRF), which is a form of non-linear ICA that transforms any linearly mixed, non-factorial $L_p$-nested symmetric source into statistically independent signals. As a corollary, we also introduce the uniform distribution on the $L_p$-nested unit sphere. Keywords: parametric density model, symmetric distribution, ν-spherical distributions, non-linear independent component analysis, independent subspace analysis, robust Bayesian inference, mixed norm density model, uniform distributions on mixed norm spheres, nested radial factorization
7 0.21508946 10 jmlr-2010-An Exponential Model for Infinite Rankings
8 0.21403715 6 jmlr-2010-A Rotation Test to Verify Latent Structure
9 0.21390738 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
10 0.19263367 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
11 0.17891075 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
12 0.170075 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
13 0.16465575 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
14 0.15807824 30 jmlr-2010-Dimensionality Estimation, Manifold Learning and Function Approximation using Tensor Voting
15 0.13576552 77 jmlr-2010-Model-based Boosting 2.0
16 0.13220169 109 jmlr-2010-Stochastic Composite Likelihood
17 0.12935916 99 jmlr-2010-Restricted Eigenvalue Properties for Correlated Gaussian Designs
18 0.12827203 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
19 0.12519096 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
20 0.12502438 114 jmlr-2010-Unsupervised Supervised Learning I: Estimating Classification and Regression Errors without Labels
topicId topicWeight
[(3, 0.011), (4, 0.012), (8, 0.013), (21, 0.025), (32, 0.041), (33, 0.535), (36, 0.036), (37, 0.035), (75, 0.118), (81, 0.017), (85, 0.052)]
simIndex simValue paperId paperTitle
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: In part I of this work we introduced and evaluated the Generalized Local Learning (GLL) framework for producing local causal and Markov blanket induction algorithms. In the present second part we analyze the behavior of GLL algorithms and provide extensions to the core methods. Specifically, we investigate the empirical convergence of GLL to the true local neighborhood as a function of sample size. Moreover, we study how predictivity improves with increasing sample size. Then we investigate how sensitive are the algorithms to multiple statistical testing, especially in the presence of many irrelevant features. Next we discuss the role of the algorithm parameters and also show that Markov blanket and causal graph concepts can be used to understand deviations from optimality of state-of-the-art non-causal algorithms. The present paper also introduces the following extensions to the core GLL framework: parallel and distributed versions of GLL algorithms, versions with false discovery rate control, strategies for constructing novel heuristics for specific domains, and divide-and-conquer local-to-global learning (LGL) strategies. We test the generality of the LGL approach by deriving a novel LGL-based algorithm that compares favorably to the state-of-the-art global learning algorithms. In addition, we investigate the use of non-causal feature selection methods to facilitate global learning. Open problems and future research paths related to local and local-to-global causal learning are discussed. Keywords: local causal discovery, Markov blanket induction, feature selection, classification, causal structure learning, learning of Bayesian networks
same-paper 2 0.70558476 36 jmlr-2010-Estimation of a Structural Vector Autoregression Model Using Non-Gaussianity
Author: Aapo Hyvärinen, Kun Zhang, Shohei Shimizu, Patrik O. Hoyer
Abstract: Analysis of causal effects between continuous-valued variables typically uses either autoregressive models or structural equation models with instantaneous effects. Estimation of Gaussian, linear structural equation models poses serious identifiability problems, which is why it was recently proposed to use non-Gaussian models. Here, we show how to combine the non-Gaussian instantaneous model with autoregressive models. This is effectively what is called a structural vector autoregression (SVAR) model, and thus our work contributes to the long-standing problem of how to estimate SVAR’s. We show that such a non-Gaussian model is identifiable without prior knowledge of network structure. We propose computationally efficient methods for estimating the model, as well as methods to assess the significance of the causal influences. The model is successfully applied on financial and brain imaging data. Keywords: structural vector autoregression, structural equation models, independent component analysis, non-Gaussianity, causality
3 0.63230586 116 jmlr-2010-WEKA−Experiences with a Java Open-Source Project
Author: Remco R. Bouckaert, Eibe Frank, Mark A. Hall, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, Ian H. Witten
Abstract: WEKA is a popular machine learning workbench with a development life of nearly two decades. This article provides an overview of the factors that we believe to be important to its success. Rather than focussing on the software’s functionality, we review aspects of project management and historical development decisions that likely had an impact on the uptake of the project. Keywords: machine learning software, open source software
Author: Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, Xenofon D. Koutsoukos
Abstract: We present an algorithmic framework for learning local causal structure around target variables of interest in the form of direct causes/effects and Markov blankets applicable to very large data sets with relatively small samples. The selected feature sets can be used for causal discovery and classification. The framework (Generalized Local Learning, or GLL) can be instantiated in numerous ways, giving rise to both existing state-of-the-art as well as novel algorithms. The resulting algorithms are sound under well-defined sufficient conditions. In a first set of experiments we evaluate several algorithms derived from this framework in terms of predictivity and feature set parsimony and compare to other local causal discovery methods and to state-of-the-art non-causal feature selection methods using real data. A second set of experimental evaluations compares the algorithms in terms of ability to induce local causal neighborhoods using simulated and resimulated data and examines the relation of predictivity with causal induction performance. Our experiments demonstrate, consistently with causal feature selection theory, that local causal feature selection methods (under broad assumptions encompassing appropriate family of distributions, types of classifiers, and loss functions) exhibit strong feature set parsimony, high predictivity and local causal interpretability. Although non-causal feature selection methods are often used in practice to shed light on causal relationships, we find that they cannot be interpreted causally even when they achieve excellent predictivity. Therefore we conclude that only local causal techniques should be used when insight into causal structure is sought.
In a companion paper we examine in depth the behavior of GLL algorithms, provide extensions, and show
5 0.3967613 56 jmlr-2010-Introduction to Causal Inference
Author: Peter Spirtes
Abstract: The goal of many sciences is to understand the mechanisms by which variables came to take on the values they have (that is, to find a generative model), and to predict what the values of those variables would be if the naturally occurring mechanisms were subject to outside manipulations. The past 30 years have seen a number of conceptual developments that are partial solutions to the problem of causal inference from observational sample data or a mixture of observational sample and experimental data, particularly in the area of graphical causal modeling. However, in many domains, problems such as the large numbers of variables, small sample sizes, and possible presence of unmeasured causes remain serious impediments to practical applications of these developments. The articles in the Special Topic on Causality address these and other problems in applying graphical causal modeling algorithms. This introduction to the Special Topic on Causality provides a brief introduction to graphical causal modeling, places the articles in a broader context, and describes the differences between causal inference and ordinary machine learning classification and prediction problems. Keywords: Bayesian networks, causation, causal inference
6 0.30614945 69 jmlr-2010-Lp-Nested Symmetric Distributions
7 0.29788589 63 jmlr-2010-Learning Instance-Specific Predictive Models
8 0.29329261 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
9 0.28856978 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
10 0.28401554 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
11 0.2796737 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
12 0.27931902 11 jmlr-2010-An Investigation of Missing Data Methods for Classification Trees Applied to Binary Response Data
13 0.27732697 102 jmlr-2010-Semi-Supervised Novelty Detection
14 0.27211404 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
15 0.27054149 15 jmlr-2010-Approximate Tree Kernels
16 0.27030283 54 jmlr-2010-Information Retrieval Perspective to Nonlinear Dimensionality Reduction for Data Visualization
17 0.26779786 50 jmlr-2010-Image Denoising with Kernels Based on Natural Image Relations
18 0.26452205 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
19 0.26369044 88 jmlr-2010-Optimal Search on Clustered Structural Constraint for Learning Bayesian Network Structure
20 0.26319727 101 jmlr-2010-Second-Order Bilinear Discriminant Analysis