nips nips2010 nips2010-247 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paul Mckeigue, Jon Krohn, Amos J. Storkey, Felix V. Agakov
Abstract: This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼ 10^5 genetic instruments and ∼ 47 × 10^3 gene transcripts.
Reference: text
sentIndex sentText sentNum sentScore
1 This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. [sent-13, score-0.703]
2 The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. [sent-14, score-0.425]
3 The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. [sent-15, score-1.208]
4 The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. [sent-16, score-0.558]
5 Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. [sent-17, score-0.919]
6 The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼ 10^5 genetic instruments and ∼ 47 × 10^3 gene transcripts. [sent-18, score-1.201]
7 1 Introduction. A problem common to both epidemiology and systems biology is to infer causal relationships between phenotypic measurements (biomarkers) and disease outcomes or quantitative traits. [sent-19, score-0.597]
8 The problem is complicated by the fact that in large bio-medical studies, the number of possible genetic and environmental causes is very large, which makes it implausible to conduct exhaustive interventional experiments. [sent-20, score-0.291]
9 Moreover, it is generally impossible to remove the confounding bias due to unmeasured latent variables which influence associations between biomarkers and outcomes. [sent-21, score-0.681]
10 Also, in situations when the biomarkers are mRNA transcript levels, the measurements are known to be quite noisy; additionally, the number of unique candidate causes may exceed the number of observations by several orders of magnitude (the p ≫ n problem). [sent-22, score-0.611]
11 Developing an efficient framework for addressing this problem may be fundamental for overcoming bottlenecks in drug development, with possible applications in the validation of biomarkers as causal risk factors, or developing proxies for clinical trials. [sent-24, score-0.712]
12 Pearl [28] argues that causal assumptions cannot be verified unless one has recourse to experimental control, and that there is nothing in the probability distribution p(x, y) which can tell whether a change in x may have an effect on y. [sent-26, score-0.291]
13 If the causal effects are shown to be identifiable, their magnitudes can be obtained by statistical estimation, which for common models often reduces to solving systems of linear equations. [sent-30, score-0.394]
14 In this paper we leave aside debates about the nature of causality and focus instead on identifying a set of candidate causes for a large, partially observed, under-determined genetic problem. [sent-37, score-0.29]
15 The approach builds on the instrumental variable methods that were historically used in epidemiological studies, and on approximate Bayesian inference in sparse linear latent variable models. [sent-38, score-0.383]
16 Specific modeling hypotheses are tested by comparing approximate marginal likelihoods of the corresponding direct, reverse, and pleiotropic models with and without latent confounders, where we follow [21] in allowing for flexible priors. [sent-39, score-0.48]
17 2 Previous work. Inference of the causal direction of x on y is to some extent simplified if we assume the existence of an auxiliary variable g, such that g’s effect on x may only be causal, and g’s effect on y may only be through x. [sent-41, score-0.291]
18 The idea is exploited in instrumental variable methods [3, 2, 29] which typically deal with low-dimensional linear models, where the strength of the causal effect may be estimated as w_{x→y} = cov(g, y)/cov(g, x). [sent-42, score-0.484]
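The ratio estimate cov(g, y)/cov(g, x) can be illustrated with a small simulation. This is a minimal sketch, not the paper's code: the data-generating model, the effect sizes, and the names `iv_ratio_estimate` and `simulate` are all assumptions made for illustration. It uses one valid instrument and one unobserved confounder, and contrasts the instrumental variable estimate with a naive regression slope.

```python
import numpy as np

def iv_ratio_estimate(g, x, y):
    """Ratio estimate cov(g, y) / cov(g, x) of the effect of x on y."""
    g, x, y = (np.asarray(a, dtype=float) for a in (g, x, y))
    return np.cov(g, y)[0, 1] / np.cov(g, x)[0, 1]

def simulate(n=20000, w_true=0.5, seed=0):
    """Hypothetical linear model: g -> x -> y, with a confounder z of x and y."""
    rng = np.random.default_rng(seed)
    g = rng.integers(0, 3, size=n).astype(float)  # genotype coded 0/1/2
    z = rng.normal(size=n)                        # unobserved confounder
    x = 0.8 * g + z + rng.normal(size=n)          # biomarker
    y = w_true * x + z + rng.normal(size=n)       # outcome, no direct g -> y link
    return g, x, y

g, x, y = simulate()
w_iv = iv_ratio_estimate(g, x, y)                 # close to w_true = 0.5
w_ols = np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # biased upward by z
print(f"IV estimate: {w_iv:.3f}, naive slope: {w_ols:.3f}")
```

In this toy setting the naive slope absorbs the confounder's contribution, while the ratio estimate does not, provided g affects y only through x.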
19 Selecting a plausible instrument g may be difficult in some domains; however, in genetic studies it may be possible to exploit as an instrument a measure of genotypic variation. [sent-46, score-0.523]
20 In quantitative genetics, such applications of instrumental variable methods have been termed Mendelian randomization [15, 34]. [sent-47, score-0.251]
21 In accordance with the requirements of the classic instrumental variable methods, it is assumed that effects of the genetic instrument g on the biomarker x are unconfounded, and that effects of the instrument on the outcome y are mediated only through the biomarker (i. [sent-48, score-1.17]
22 However, the assumption of no hidden pleiotropy severely restricts the application of this approach, as most genotypic effects on complex traits are not sufficiently well understood to exclude pleiotropy as a possible explanation of an association. [sent-52, score-0.509]
23 Thus the classical instrumental variable argument is limited to biomarkers for which suitable non-pleiotropic instruments exist, and cannot be easily extended to exploit studies with multiple biomarkers and genome-wide data. [sent-53, score-1.189]
24 A more general approach to exploiting genotypic variation to infer causal relationships between gene transcript levels and quantitative traits has been developed by Schadt et al. [sent-54, score-0.811]
25 The histogram shows the difference of the AIC scores for the causal and reverse models for a fixed biomarker and outcome, and various choices of loci from predictive regions. [sent-67, score-0.642]
26 Right: AIC scores of the causal (top) and reverse (bottom) models for each choice of instrument g_i (the straight lines link the scores for a fixed choice of g_i). [sent-68, score-0.598]
27 Scores were centered relative to those of the pleiotropic model. [sent-69, score-0.305]
28 Biomarker and outcome are liver expressions of Cyp27b1 and plasma HDL measurements for heterogeneous mice. [sent-70, score-0.263]
29 Based on the choice of g_i, either causal or reverse explanations are favored. [sent-71, score-0.445]
30 First, effects of loci and biomarkers on outcomes are not modeled jointly, so widely varying inferences are possible depending on the choice of the triads {g_i, x_j, y}. [sent-73, score-0.695]
31 Figure 1 (center, right) compares differences in the AIC scores for the causal and reverse models constructed for a fixed biomarker and outcome, and for various choices of the genetic instruments from the predictive region. [sent-74, score-0.899]
32 Depending on the choice of instrument g_i, either causal or reverse explanations are favored. [sent-75, score-0.527]
33 A second key limitation is that the LCMS method does not allow for dependencies between multiple biomarkers, measurement noise, or latent variables (such as unobserved confounders of the biomarker-outcome associations). [sent-76, score-0.309]
34 Thus, for instance, without allowance for noise in the biomarker measurements, non-zero conditional mutual information I(g_i, y|x_j) will be interpreted as evidence of pleiotropy or reverse causation even when the relation between the underlying biomarker and outcome is causal. [sent-77, score-0.677]
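The effect of measurement noise on the conditional dependence between instrument and outcome can be checked numerically. The sketch below is illustrative only: the model, noise levels, and helper names are assumptions, and the Gaussian formula I = -0.5 log(1 - ρ²) is applied to a partial correlation as a convenient approximation. The simulated relation is purely causal (g affects y only through x), yet conditioning on a noisy measurement of x leaves a clear dependence between g and y.

```python
import numpy as np

def partial_corr(a, b, c):
    """Correlation of a and b after linearly regressing both on c (with intercept)."""
    C = np.column_stack([np.ones_like(c), c])
    res_a = a - C @ np.linalg.lstsq(C, a, rcond=None)[0]
    res_b = b - C @ np.linalg.lstsq(C, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

def gaussian_cmi(rho):
    """Conditional mutual information (nats) implied by a partial correlation."""
    return float(-0.5 * np.log1p(-rho ** 2))

rng = np.random.default_rng(1)
n = 50000
g = rng.integers(0, 3, size=n).astype(float)  # instrument
x = g + rng.normal(size=n)                    # true (hidden) biomarker level
y = x + rng.normal(size=n)                    # outcome depends on g only via x
x_noisy = x + rng.normal(size=n)              # noisy measurement of the biomarker

rho_true = partial_corr(g, y, x)         # near zero: g is independent of y given x
rho_noisy = partial_corr(g, y, x_noisy)  # clearly non-zero
print(gaussian_cmi(rho_true), gaussian_cmi(rho_noisy))
```

A method that conditions on the noisy measurement alone would read the second value as evidence of pleiotropy or reverse causation, although neither is present in the simulated model.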
35 For example, their method does not allow for an easy integration of unmeasured confounders with unknown correlations with the intermediate and outcome variables. [sent-80, score-0.313]
36 Another approach to modeling joint effects of genetic loci and biomarkers (gene expressions) was described by [41]. [sent-81, score-0.803]
37 The vast majority of other recent model selection and structure learning methods from machine learning literature are also either not easily extended to include latent confounders (e. [sent-84, score-0.28]
38 3 Methods. To address the problem of causal discovery in large bio-medical studies, we need a unified framework for modeling relations between genotypes, biomarkers, and outcomes that is computationally tractable enough to handle a large number of variables. [sent-89, score-0.356]
39 Our approach extends LCMS and the instrumental variable methods by the joint modeling of effects of genetic loci and biomarkers, and by allowing for both pleiotropic genotypic effects and latent variables that generate couplings between biomarkers and confound the biomarker-outcome associations. [sent-90, score-1.691]
40 For intermediate values of γ_1 and high empirical correlations, there is a strong preference for the causal model. [sent-128, score-0.291]
41 The Bayesian framework allows prior biological information to be included if available: for instance, cis-acting genotypic effects on transcript levels are likely to be stronger and less pleiotropic than trans-acting effects on transcript levels. [sent-129, score-0.963]
42 Here it is used in the context of sparse multi-factor instrumental variable analysis in the presence of unobserved confounders, pleiotropy, and noise. [sent-136, score-0.271]
43 Model Parameterization. Our sparse instrumental variables model (SPIV) is specified with four classes of variables: genotypic and environmental covariates g ∈ R^{|g|}, phenotypic biomarkers x ∈ R^{|x|}, outcomes y ∈ R^{|y|}, and latent factors z_1, ... [sent-137, score-1.098]
44 The biomarkers x and outcomes y are specified as hidden variables inferred from noisy observations x̃ ∈ R^{|x̃|} and ỹ ∈ R^{|ỹ|} (note that |x̃| = |x|, |ỹ| = |y|). [sent-143, score-0.489]
45 The effects of genotype on biomarkers and outcome are assumed to be unconfounded. [sent-144, score-0.581]
46 Pleiotropic effects of genotype (effects on outcome that are not mediated through the phenotypic biomarkers) are accounted for by an explicit parameterization of p(y|g, x, z). [sent-145, score-0.29]
47 It is clear that the SPIV structure extends that of the instrumental variable methods [2, 3, 29] by allowing for the pleiotropic links, and also extends the pleiotropic model of Schadt et al. [sent-147, score-0.803]
48 [30] (Figure 1 left (iii)) by allowing for multiple instruments and latent variables. [sent-149, score-0.25]
49 are specified with inverse Gamma priors Γ^{−1}(a_i, b_i), with hyperparameters a_i and b_i fixed at values reflecting the prior beliefs about the projection noise (often available to lab technicians collecting trait or biomarker measurements). [sent-154, score-0.29]
50 One way to view the latent confounders z is as missing genotypes or environmental covariates, so that prior variances of the latent factors are peaked at values representative of the empirical variances of the instruments g. [sent-155, score-0.651]
51 Some additional intuition about the influence of the sparse prior on the causal inference may be gained by numerically comparing the marginal likelihoods of the Markov-equivalent models with and without confounders, M_{x←z→y} and M_{x→y}. [sent-170, score-0.591]
52 Figure 2 shows that when the empirical correlations are strong and γ_1 is at intermediate levels, there is a strong preference for a causal model. [sent-172, score-0.319]
53 Also, as the number of genetic instruments grows, evidence in favor of the causal or pleiotropic model will be less dependent upon the priors on model parameters. [sent-175, score-0.999]
54 For instance, with two genotypic variables that perturb a single transcript, the causal model has three adjustable parameters, but the pleiotropic model has five (see Figure 1 left, (iv)). [sent-176, score-0.743]
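The parameter counts quoted above can be reproduced with a short helper. This is a sketch under an assumed counting convention: only link weights are counted (instrument to biomarker, biomarker to outcome, and, for the pleiotropic model, instrument to outcome), and noise variances are ignored. The function name `n_link_parameters` is hypothetical.

```python
def n_link_parameters(n_instruments, n_biomarkers=1, pleiotropic=False):
    """Count adjustable link weights for a single outcome.

    Assumed convention: one weight per g -> x link, one per x -> y link,
    and one per direct g -> y link when pleiotropy is allowed.
    """
    g_to_x = n_instruments * n_biomarkers
    x_to_y = n_biomarkers
    g_to_y = n_instruments if pleiotropic else 0
    return g_to_x + x_to_y + g_to_y

causal = n_link_parameters(2)                         # two instruments, one transcript
pleiotropic = n_link_parameters(2, pleiotropic=True)
print(causal, pleiotropic)  # 3 5
```

The gap between the two counts grows linearly with the number of instruments, which is one way to see why additional instruments sharpen the comparison between the causal and pleiotropic models.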
55 Note that the MAP solution for SPIV may also be easily derived for the semi-supervised case where the biomarker and outcome vectors are only partially observed. [sent-180, score-0.245]
56 on sampling or expectation propagation [26, 31], the MAP approximation allows for an efficient handling of very large networks with multiple instruments and biomarkers, and makes it straightforward to incorporate latent confounders. [sent-183, score-0.25]
57 For example, the fixed-point update for u_i ∈ R^{|g|} linking biomarker x_i with the vector of instruments g is easily expressed as u_i^{(t)} = (G^T G + σ_{x_i}^2 (γ_1 Ú_i^{(t−1)} + γ_2 I_{|g|}))^{−1} (G^T x_i − G^T Z v_i), (4) Figure 3: Top: SPIV for artificial datasets. [sent-187, score-0.326]
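A fixed-point update of this kind can be sketched as an iteratively reweighted ridge regression. The code below is a hedged illustration, not the authors' implementation: it assumes the reweighting matrix is diagonal with entries 1/|u_ik| taken from the previous iterate (a common way to handle an L1 penalty), and the data, the penalty values, and the name `update_u` are invented for the example.

```python
import numpy as np

def update_u(G, x_i, Z, v_i, u_prev, sigma2, gamma1, gamma2, eps=1e-8):
    """One reweighted-ridge step for the instrument weights of biomarker i.

    Assumption: the reweighting matrix is diag(1 / |u_prev|), which turns the
    L1 penalty (gamma1) into a quadratic one at the previous iterate.
    """
    U = np.diag(1.0 / (np.abs(u_prev) + eps))
    A = G.T @ G + sigma2 * (gamma1 * U + gamma2 * np.eye(G.shape[1]))
    b = G.T @ x_i - G.T @ (Z @ v_i)
    return np.linalg.solve(A, b)

# Toy data: 10 instruments, of which only the first two affect the biomarker.
rng = np.random.default_rng(0)
n, p = 500, 10
G = rng.normal(size=(n, p))          # instruments
Z = rng.normal(size=(n, 1))          # one latent factor, treated as known here
v_i = np.array([0.5])                # its (assumed known) loading
u_true = np.zeros(p)
u_true[:2] = [2.0, -1.5]
x_i = G @ u_true + Z @ v_i + rng.normal(size=n)

u_hat = np.ones(p)
for _ in range(200):
    u_hat = update_u(G, x_i, Z, v_i, u_hat, sigma2=1.0, gamma1=100.0, gamma2=0.1)
print(np.round(u_hat, 3))
```

In this toy run the weights of the eight irrelevant instruments shrink towards zero, while the two true effects are retained with some shrinkage. In a full model the latent factors and their loadings would be updated in alternation rather than held fixed.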
58 Bottom: SPIV for a genome-wide study of causal effects on HDL in heterogeneous stock mice. [sent-192, score-0.5]
59 Left/right plots show maximum a-posteriori weights θ_MAP and the mutual information I(x_i, y|e) between the unobserved biomarkers and outcome evaluated from the model at θ_MAP, under the joint Gaussian assumption. [sent-193, score-0.565]
60 A cluster of pleiotropic links on chromosome 1 at about 173 Mbp is consistent with biology. [sent-194, score-0.399]
61 Transcripts that are most predictive of HDL through their links with pleiotropic genetic markers on chromosome 1 are Uap1, Rgs5, Apoa2, and Nr1i3. [sent-196, score-0.538]
62 4 Results. Artificial data: We applied SPIV to several simulated datasets, and compared specific modeling hypotheses for the biomarkers retained in the posterior modes. [sent-212, score-0.472]
63 Subsequent testing of the specific modeling hypotheses for the most important factors resulted in the correct discrimination of causal and confounded associations in ≈86% of cases. [sent-219, score-0.473]
64 Genome-wide study of HDL cholesterol in mice: To demonstrate our method in a large-scale practical application, we examined effects of gene transcript levels in the liver on plasma high-density lipoprotein (HDL) cholesterol levels for mice from a heterogeneous stock. [sent-220, score-0.821]
65 The genetic factors influencing HDL in mice have been well explored in biology e. [sent-221, score-0.317]
66 [Figure residue: panel titled "MI between biomarkers and HDL at θ_MAP"; transcript labels Trpv3, 5530401A14Rik, Tbx2, 1110001A07Rik.] [sent-230, score-0.396]
67 At each of the 12500 marker loci, genotypes were described by 8-D vectors of expected founder ancestry proportions inferred from the raw marker genotypes by an HMM-based reconstruction method [23]. [sent-231, score-0.244]
68 Mouse-specific covariates included age and sex, which were used to augment the set of genetic instruments. [sent-232, score-0.231]
69 The full set of phenotypic biomarkers consisted of 47429 transcript levels, appropriately transformed and cleaned. [sent-233, score-0.641]
70 Indeed, the considered case of ∼ O(10^5) instruments and 47K biomarkers would give rise to O(10^9) interaction weights, which is expensive to analyze or even keep in memory. [sent-238, score-0.559]
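The order of magnitude is easy to verify. The snippet below is a back-of-the-envelope calculation, assuming roughly 10^5 instruments, the 47429 transcripts reported for this study, and dense storage in 64-bit floats (the storage format is an assumption).

```python
n_instruments = 10**5      # approximate number of genetic instruments
n_biomarkers = 47429       # transcript levels in the study
bytes_per_weight = 8       # assumed: dense float64 storage

n_weights = n_instruments * n_biomarkers
gib = n_weights * bytes_per_weight / 2**30
print(f"{n_weights:.2e} weights, about {gib:.1f} GiB if stored densely")
```

A dense instrument-by-biomarker weight matrix would therefore need tens of gigabytes, which is consistent with the remark that the weights are costly even to keep in memory.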
71 In this case and for the considered sparseness-inducing priors, no hidden confounders appear to have strong effects on the outcome in the posterior. [sent-245, score-0.348]
72 The spikes of the pleiotropic activations in sex chromosome 20 and around chromosome 1 are consistent with the biological knowledge [38]. [sent-246, score-0.479]
73 The biomarker with the strongest direct effect on HDL (computed as the mean MAP weight wi : xi → y divided by its standard deviation over multiple runs, where each mean weight exceeds a threshold) is the expression of Cyp27b1 (gene responsible for vitamin D metabolism). [sent-247, score-0.271]
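The ranking statistic described here (mean MAP weight over runs divided by its standard deviation, restricted to weights whose mean exceeds a threshold) can be sketched as follows. The array layout, the threshold value, the use of the absolute mean, and the name `stability_scores` are assumptions for illustration, and the numbers are made up.

```python
import numpy as np

def stability_scores(W, min_abs_mean=0.05):
    """Mean / std of each biomarker's weight over runs; NaN if below threshold.

    W has shape (n_runs, n_biomarkers): one row of MAP weights per run.
    """
    W = np.asarray(W, dtype=float)
    mean = W.mean(axis=0)
    std = W.std(axis=0, ddof=1)
    score = np.full(mean.shape, np.nan)
    keep = (np.abs(mean) > min_abs_mean) & (std > 0)
    score[keep] = mean[keep] / std[keep]
    return score

# Three hypothetical runs, three hypothetical biomarkers.
W = [[0.50,  0.01, 0.30],
     [0.52, -0.02, 0.10],
     [0.48,  0.00, 0.50]]
scores = stability_scores(W)
best = int(np.nanargmax(np.abs(scores)))
print(scores, best)
```

The first biomarker has a large and stable weight and ranks first, the second falls below the threshold and is excluded, and the third has a sizeable mean but varies too much across runs to rank highly.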
74 Knockout of the Cyp27b1 gene in mice has been shown to alter body fat stores [24], which might be expected to affect HDL cholesterol levels. [sent-248, score-0.275]
75 Recently it has also been shown that a quantitative trait locus for circulating vitamin D levels in humans includes a gene that codes for the enzyme that synthesizes cholesterol [1]. [sent-249, score-0.434]
76 To demonstrate an application to gene fine-mapping studies, Figure 3 (bottom right) shows the approximate mutual information I(x_i, y|e = {age, sex}) between the underlying biomarkers and unobserved HDL levels expressed from the model at θ_MAP. [sent-254, score-0.606]
77 The mutual information takes into account not only the strength of the direct effect of xi on y, but also associations with the pleiotropic instruments, strengths of the pleiotropic effects, and dependencies between the instruments. [sent-255, score-0.796]
78 Here Σ_gg ∈ R^{|g|×|g|} is the empirical covariance of the instruments, and w_j ∈ R^{|x|}, w_zj ∈ R^{|z|}, and w_gj ∈ R^{|g|} are the MAP weights of the couplings of y_j with the biomarkers, confounders, and genetic instruments respectively. [sent-257, score-0.533]
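How the instrument covariance and the pleiotropic weights enter a Gaussian mutual information can be seen in a reduced setting. The sketch below is not the paper's expression: it assumes a single biomarker x = uᵀg + noise and a single outcome y = w·x + w_gᵀg + noise, no latent confounders, no conditioning covariates, and unit noise variances by default. All names are hypothetical.

```python
import numpy as np

def gaussian_mi_x_y(u, w, w_g, Sigma_gg, sigma2_x=1.0, sigma2_y=1.0):
    """Mutual information (nats) between x and y in an assumed linear-Gaussian model.

    x = u^T g + e_x,  y = w * x + w_g^T g + e_y,  cov(g) = Sigma_gg.
    """
    u = np.asarray(u, dtype=float)
    w_g = np.asarray(w_g, dtype=float)
    Sigma_gg = np.asarray(Sigma_gg, dtype=float)
    shared = u @ Sigma_gg @ w_g                   # x-y coupling via shared instruments
    var_x = u @ Sigma_gg @ u + sigma2_x
    cov_xy = w * var_x + shared
    var_y = w**2 * var_x + 2 * w * shared + w_g @ Sigma_gg @ w_g + sigma2_y
    rho2 = cov_xy**2 / (var_x * var_y)
    return float(-0.5 * np.log1p(-rho2))

Sigma = np.eye(2)
# No direct effect (w = 0), but the first instrument drives both x and y.
mi_pleiotropic_only = gaussian_mi_x_y([1.0, 0.0], 0.0, [1.0, 0.0], Sigma)
# No direct effect and no pleiotropic effect.
mi_none = gaussian_mi_x_y([1.0, 0.0], 0.0, [0.0, 0.0], Sigma)
print(mi_pleiotropic_only, mi_none)
```

Even with a zero direct weight, the biomarker remains informative about the outcome when it shares an instrument with a pleiotropic effect, which matches the remark that the couplings can run through pleiotropic genetic markers.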
79 An application of SPIV to proprietary human data for a study of effects of vitamins and calcium levels on colorectal cancer (which we are not yet allowed to publish) showed very strong effects of the latent confounders. [sent-260, score-0.346]
80 Note that the couplings are via the links with the pleiotropic genetic markers on chromosome 1. [sent-263, score-0.651]
81 Details of the data collection, microarray preprocessing, and feature selection, along with the detailed findings for other biomarkers and phenotypic outcomes will be made available online. [sent-267, score-0.566]
82 SPIV performs the screening of interesting biomarker-phenotype and genotype-biomarker-phenotype associations by exploiting the maximum-a-posteriori inference in a sparse linear latent variable model. [sent-270, score-0.325]
83 Intuitively, the approach is motivated by the observation that while independence of variables implies that they are not in a causal relation, a preference for an unconfounded causal model may indicate possible causality and require further controlled experiments. [sent-272, score-0.722]
84 Technically, SPIV may be viewed as an extension of LASSO and elastic net regression which allows for latent variables and pleiotropic dependencies. [sent-273, score-0.487]
85 While being particularly attractive for genetic studies, SPIV or its modifications may potentially be applied for addressing more general structure learning tasks. [sent-274, score-0.224]
86 While SPIV attempts to focus the attention on important biomarkers establishing strong direct associations with the phenotypes, modeling of the precisions may be used for filtering out unimportant factors (conditionally) independent of the outcome variables. [sent-280, score-0.663]
87 Our future work will involve a direct estimation of the sparse conditional precision matrix Σ^{−1}_{xyz|g} of the biomarkers, outcomes, and unmeasured confounders (given the instruments), through latent variable extensions of the recently proposed graphical LASSO and related methods [11, 18]. [sent-281, score-0.367]
88 The key purpose of this paper is to draw the attention of the machine learning community to the problem of inferring causal relationships between phenotypic measurements and complex traits (disease risks), which may have tremendous implications in epidemiology and systems biology. [sent-282, score-0.574]
89 Our specific approach to the problem is inspired by the ideas of instrumental variable analysis commonly used in epidemiological studies, which we have extended to properly address situations when the genetic variables may be direct causes of the hypothesized outcomes. [sent-283, score-0.509]
90 The sparse instrumental variable framework (SPIV) overcomes limitations of the likelihood-based LCMS methods often used by geneticists, by modeling joint effects of genetic loci and biomarkers in the presence of noise and latent variables. [sent-284, score-1.13]
91 The approach is tractable enough to be used in genetic studies with tens of thousands of variables. [sent-285, score-0.24]
92 It may be used for identifying specific genes associated with phenotypic outcomes, and may have wide applications in identification of biomarkers as possible targets for interventions, or as proxy endpoints for early-stage clinical trials. [sent-286, score-0.527]
93 Identification of causal effects using instrumental variables (with discussion). [sent-302, score-0.615]
94 Statistical estimation of correlated genome associations to a quantitative trait network. [sent-401, score-0.249]
95 Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. [sent-410, score-0.48]
96 Regression by dependence minimization and its application to causal inference in additive noise models. [sent-448, score-0.32]
97 An integrative genomics approach to infer causal associations between gene expression and disease. [sent-511, score-0.546]
98 Mendelian randomisation: can genetic epidemiology contribute to understanding environmental determinants of disease? [sent-534, score-0.292]
99 Genome-wide genetic association of complex traits in heterogeneous stock mice. [sent-562, score-0.38]
100 Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations. [sent-581, score-0.565]
wordName wordTfidf (topN-words)
[('biomarkers', 0.396), ('pleiotropic', 0.305), ('causal', 0.291), ('hdl', 0.265), ('spiv', 0.265), ('genetic', 0.199), ('instrumental', 0.193), ('instruments', 0.163), ('confounders', 0.163), ('biomarker', 0.163), ('transcript', 0.14), ('associations', 0.13), ('genotypic', 0.119), ('pleiotropy', 0.106), ('loci', 0.105), ('phenotypic', 0.105), ('effects', 0.103), ('gene', 0.1), ('genotypes', 0.093), ('lcms', 0.093), ('mice', 0.093), ('latent', 0.087), ('trait', 0.086), ('reverse', 0.083), ('outcome', 0.082), ('instrument', 0.082), ('cholesterol', 0.082), ('mendelian', 0.08), ('traits', 0.075), ('mx', 0.075), ('gi', 0.071), ('outcomes', 0.065), ('genetics', 0.064), ('chromosome', 0.06), ('epidemiology', 0.06), ('causality', 0.059), ('stock', 0.057), ('aic', 0.057), ('causation', 0.054), ('sex', 0.054), ('wg', 0.054), ('couplings', 0.053), ('schadt', 0.053), ('unconfounded', 0.053), ('vitamin', 0.053), ('levels', 0.053), ('retained', 0.049), ('heterogeneous', 0.049), ('ap', 0.047), ('transcripts', 0.047), ('sparse', 0.047), ('elastic', 0.044), ('measurements', 0.043), ('studies', 0.041), ('priors', 0.041), ('wz', 0.04), ('knockout', 0.04), ('lwi', 0.04), ('plasma', 0.04), ('unmeasured', 0.04), ('valdar', 0.04), ('likelihoods', 0.036), ('jrss', 0.035), ('links', 0.034), ('yj', 0.034), ('quantitative', 0.033), ('environmental', 0.033), ('screening', 0.032), ('causes', 0.032), ('age', 0.032), ('unobserved', 0.031), ('selection', 0.03), ('weights', 0.03), ('direct', 0.03), ('inference', 0.029), ('marker', 0.029), ('variables', 0.028), ('correlations', 0.028), ('lasso', 0.028), ('hypotheses', 0.027), ('bic', 0.027), ('bayesian', 0.027), ('circulating', 0.027), ('epidemiological', 0.027), ('gg', 0.027), ('interventional', 0.027), ('wgj', 0.027), ('wzj', 0.027), ('mutual', 0.026), ('liver', 0.026), ('xj', 0.026), ('genes', 0.026), ('expression', 0.025), ('edinburgh', 0.025), ('randomization', 0.025), ('addressing', 0.025), ('factors', 0.025), ('marginal', 0.025), 
('expressions', 0.023), ('net', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
2 0.26116785 46 nips-2010-Causal discovery in multiple models from different experiments
Author: Tom Claassen, Tom Heskes
Abstract: A long-standing open research problem is how to use information from different experiments, including background knowledge, to infer causal relations. Recent developments have shown ways to use multiple data sets, provided they originate from identical experiments. We present the MCI-algorithm as the first method that can infer provably valid causal relations in the large sample limit from different experiments. It is fast, reliable and produces very clear and easily interpretable output. It is based on a result that shows that constraint-based causal discovery is decomposable into a candidate pair identification and subsequent elimination step that can be applied separately from different models. We test the algorithm on a variety of synthetic input model sets to assess its behavior and the quality of the output. The method shows promising signs that it can be adapted to suit causal discovery in real-world application areas as well, including large databases. 1
3 0.25423566 218 nips-2010-Probabilistic latent variable models for distinguishing between cause and effect
Author: Oliver Stegle, Dominik Janzing, Kun Zhang, Joris M. Mooij, Bernhard Schölkopf
Abstract: We propose a novel method for inferring whether X causes Y or vice versa from joint observations of X and Y . The basic idea is to model the observed data using probabilistic latent variable models, which incorporate the effects of unobserved noise. To this end, we consider the hypothetical effect variable to be a function of the hypothetical cause variable and an independent noise term (not necessarily additive). An important novel aspect of our work is that we do not restrict the model class, but instead put general non-parametric priors on this function and on the distribution of the cause. The causal direction can then be inferred by using standard Bayesian model selection. We evaluate our approach on synthetic data and real-world data and report encouraging results. 1
4 0.11395862 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
Author: Seunghak Lee, Jun Zhu, Eric P. Xing
Abstract: To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
5 0.096166223 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
Author: Vikas Sindhwani, Aurelie C. Lozano
Abstract: We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. Within this framework, we then formulate the problem of inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.
6 0.076485313 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization
7 0.059443627 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
8 0.058325987 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
9 0.055591088 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning
10 0.053177204 22 nips-2010-Active Estimation of F-Measures
11 0.052808184 217 nips-2010-Probabilistic Multi-Task Feature Selection
12 0.052015506 89 nips-2010-Factorized Latent Spaces with Structured Sparsity
13 0.048387077 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
14 0.046215091 101 nips-2010-Gaussian sampling by local perturbations
15 0.045725107 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
16 0.044416815 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
17 0.041772231 129 nips-2010-Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks
18 0.041575324 240 nips-2010-Simultaneous Object Detection and Ranking with Weak Supervision
19 0.040535942 33 nips-2010-Approximate inference in continuous time Gaussian-Jump processes
20 0.039549425 265 nips-2010-The LASSO risk: asymptotic results and real world examples
topicId topicWeight
[(0, 0.131), (1, 0.043), (2, -0.007), (3, 0.072), (4, -0.073), (5, -0.057), (6, 0.0), (7, 0.067), (8, -0.092), (9, -0.108), (10, -0.209), (11, 0.142), (12, -0.093), (13, -0.166), (14, 0.138), (15, 0.298), (16, -0.041), (17, -0.013), (18, -0.018), (19, -0.058), (20, 0.084), (21, -0.023), (22, 0.021), (23, 0.015), (24, 0.004), (25, 0.058), (26, -0.036), (27, -0.013), (28, -0.0), (29, -0.011), (30, -0.022), (31, 0.012), (32, -0.003), (33, -0.017), (34, 0.005), (35, 0.006), (36, 0.016), (37, -0.058), (38, 0.038), (39, 0.037), (40, -0.014), (41, -0.013), (42, 0.0), (43, -0.004), (44, -0.019), (45, 0.008), (46, -0.021), (47, 0.006), (48, 0.004), (49, -0.0)]
simIndex simValue paperId paperTitle
same-paper 1 0.93287522 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
Author: Paul Mckeigue, Jon Krohn, Amos J. Storkey, Felix V. Agakov
Abstract: This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼ 10^5 genetic instruments and ∼ 47 × 10^3 gene transcripts.
2 0.92484897 46 nips-2010-Causal discovery in multiple models from different experiments
Author: Tom Claassen, Tom Heskes
Abstract: A long-standing open research problem is how to use information from different experiments, including background knowledge, to infer causal relations. Recent developments have shown ways to use multiple data sets, provided they originate from identical experiments. We present the MCI-algorithm as the first method that can infer provably valid causal relations in the large sample limit from different experiments. It is fast, reliable and produces very clear and easily interpretable output. It is based on a result that shows that constraint-based causal discovery is decomposable into a candidate pair identification and subsequent elimination step that can be applied separately from different models. We test the algorithm on a variety of synthetic input model sets to assess its behavior and the quality of the output. The method shows promising signs that it can be adapted to suit causal discovery in real-world application areas as well, including large databases.
3 0.90284091 218 nips-2010-Probabilistic latent variable models for distinguishing between cause and effect
Author: Oliver Stegle, Dominik Janzing, Kun Zhang, Joris M. Mooij, Bernhard Schölkopf
Abstract: We propose a novel method for inferring whether X causes Y or vice versa from joint observations of X and Y. The basic idea is to model the observed data using probabilistic latent variable models, which incorporate the effects of unobserved noise. To this end, we consider the hypothetical effect variable to be a function of the hypothetical cause variable and an independent noise term (not necessarily additive). An important novel aspect of our work is that we do not restrict the model class, but instead put general non-parametric priors on this function and on the distribution of the cause. The causal direction can then be inferred by using standard Bayesian model selection. We evaluate our approach on synthetic data and real-world data and report encouraging results.
4 0.71712691 41 nips-2010-Block Variable Selection in Multivariate Regression and High-dimensional Causal Inference
Author: Vikas Sindhwani, Aurelie C. Lozano
Abstract: We consider multivariate regression problems involving high-dimensional predictor and response spaces. To efficiently address such problems, we propose a variable selection method, Multivariate Group Orthogonal Matching Pursuit, which extends the standard Orthogonal Matching Pursuit technique. This extension accounts for arbitrary sparsity patterns induced by domain-specific groupings over both input and output variables, while also taking advantage of the correlation that may exist between the multiple outputs. Within this framework, we then formulate the problem of inferring causal relationships over a collection of high-dimensional time series variables. When applied to time-evolving social media content, our models yield a new family of causality-based influence measures that may be seen as an alternative to the classic PageRank algorithm traditionally applied to hyperlink graphs. Theoretical guarantees, extensive simulations and empirical studies confirm the generality and value of our framework.
5 0.37734634 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
Author: Seunghak Lee, Jun Zhu, Eric P. Xing
Abstract: To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
6 0.33852062 82 nips-2010-Evaluation of Rarity of Fingerprints in Forensics
7 0.3101458 242 nips-2010-Slice sampling covariance hyperparameters of latent Gaussian models
8 0.29869688 154 nips-2010-Learning sparse dynamic linear systems using stable spline kernels and exponential hyperpriors
9 0.28929329 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
10 0.28395447 126 nips-2010-Inference with Multivariate Heavy-Tails in Linear Models
11 0.28337196 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
12 0.27222824 266 nips-2010-The Maximal Causes of Natural Scenes are Edge Filters
13 0.26069006 217 nips-2010-Probabilistic Multi-Task Feature Selection
14 0.25770196 224 nips-2010-Regularized estimation of image statistics by Score Matching
15 0.25469813 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
16 0.25394037 213 nips-2010-Predictive Subspace Learning for Multi-view Data: a Large Margin Approach
17 0.25387132 211 nips-2010-Predicting Execution Time of Computer Programs Using Sparse Polynomial Regression
18 0.25370428 2 nips-2010-A Bayesian Approach to Concept Drift
19 0.24872138 262 nips-2010-Switched Latent Force Models for Movement Segmentation
20 0.24859498 84 nips-2010-Exact inference and learning for cumulative distribution functions on loopy graphs
topicId topicWeight
[(13, 0.028), (17, 0.016), (27, 0.073), (30, 0.032), (35, 0.041), (45, 0.144), (50, 0.06), (52, 0.029), (60, 0.023), (71, 0.402), (77, 0.033), (90, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.72445756 247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies
Author: Paul Mckeigue, Jon Krohn, Amos J. Storkey, Felix V. Agakov
Abstract: This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼ 10^5 genetic instruments and ∼ 47 × 10^3 gene transcripts.
2 0.55188602 203 nips-2010-Parametric Bandits: The Generalized Linear Case
Author: Sarah Filippi, Olivier Cappe, Aurélien Garivier, Csaba Szepesvári
Abstract: We consider structured multi-armed bandit problems based on the Generalized Linear Model (GLM) framework of statistics. For these bandits, we propose a new algorithm, called GLM-UCB. We derive finite time, high probability bounds on the regret of the algorithm, extending previous analyses developed for the linear bandits to the non-linear case. The analysis highlights a key difficulty in generalizing linear bandit algorithms to the non-linear case, which is solved in GLM-UCB by focusing on the reward space rather than on the parameter space. Moreover, as the actual effectiveness of current parameterized bandit algorithms is often poor in practice, we provide a tuning method based on asymptotic arguments, which leads to significantly better practical performance. We present two numerical experiments on real-world data that illustrate the potential of the GLM-UCB approach. Keywords: multi-armed bandit, parametric bandits, generalized linear models, UCB, regret minimization.
3 0.47370079 217 nips-2010-Probabilistic Multi-Task Feature Selection
Author: Yu Zhang, Dit-Yan Yeung, Qian Xu
Abstract: Recently, some variants of the ℓ1 norm, particularly matrix norms such as the ℓ1,2 and ℓ1,∞ norms, have been widely used in multi-task learning, compressed sensing and other related areas to enforce sparsity via joint regularization. In this paper, we unify the ℓ1,2 and ℓ1,∞ norms by considering a family of ℓ1,q norms for 1 < q ≤ ∞ and study the problem of determining the most appropriate sparsity enforcing norm to use in the context of multi-task feature selection. Using the generalized normal distribution, we provide a probabilistic interpretation of the general multi-task feature selection problem using the ℓ1,q norm. Based on this probabilistic interpretation, we develop a probabilistic model using the noninformative Jeffreys prior. We also extend the model to learn and exploit more general types of pairwise relationships between tasks. For both versions of the model, we devise expectation-maximization (EM) algorithms to learn all model parameters, including q, automatically. Experiments have been conducted on two cancer classification applications using microarray gene expression data.
4 0.43389466 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection
Author: Seunghak Lee, Jun Zhu, Eric P. Xing
Abstract: To understand the relationship between genomic variations among population and complex diseases, it is essential to detect eQTLs which are associated with phenotypic effects. However, detecting eQTLs remains a challenge due to complex underlying mechanisms and the very large number of genetic loci involved compared to the number of samples. Thus, to address the problem, it is desirable to take advantage of the structure of the data and prior information about genomic locations such as conservation scores and transcription factor binding sites. In this paper, we propose a novel regularized regression approach for detecting eQTLs which takes into account related traits simultaneously while incorporating many regulatory features. We first present a Bayesian network for a multi-task learning problem that includes priors on SNPs, making it possible to estimate the significance of each covariate adaptively. Then we find the maximum a posteriori (MAP) estimation of regression coefficients and estimate weights of covariates jointly. This optimization procedure is efficient since it can be achieved by using a projected gradient descent and a coordinate descent procedure iteratively. Experimental results on simulated and real yeast datasets confirm that our model outperforms previous methods for finding eQTLs.
5 0.41708776 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
Author: Pierre Garrigues, Bruno A. Olshausen
Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model.
6 0.41190195 21 nips-2010-Accounting for network effects in neuronal responses using L1 regularized point process models
7 0.41122532 51 nips-2010-Construction of Dependent Dirichlet Processes based on Poisson Processes
8 0.41037673 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
9 0.40821284 44 nips-2010-Brain covariance selection: better individual functional connectivity models using population prior
10 0.40797481 55 nips-2010-Cross Species Expression Analysis using a Dirichlet Process Mixture Model with Latent Matchings
11 0.40745032 161 nips-2010-Linear readout from a neural population with partial correlation data
12 0.40728205 96 nips-2010-Fractionally Predictive Spiking Neurons
13 0.40660638 98 nips-2010-Functional form of motion priors in human motion perception
14 0.40643209 200 nips-2010-Over-complete representations on recurrent neural networks can support persistent percepts
15 0.40619925 17 nips-2010-A biologically plausible network for the computation of orientation dominance
16 0.40473971 194 nips-2010-Online Learning for Latent Dirichlet Allocation
17 0.40391773 123 nips-2010-Individualized ROI Optimization via Maximization of Group-wise Consistency of Structural and Functional Profiles
18 0.4038254 56 nips-2010-Deciphering subsampled data: adaptive compressive sampling as a principle of brain communication
19 0.40355739 117 nips-2010-Identifying graph-structured activation patterns in networks
20 0.40355223 268 nips-2010-The Neural Costs of Optimal Control