
247 nips-2010-Sparse Instrumental Variables (SPIV) for Genome-Wide Studies


Source: pdf

Author: Paul McKeigue, Jon Krohn, Amos J. Storkey, Felix V. Agakov

Abstract: This paper describes a probabilistic framework for studying associations between multiple genotypes, biomarkers, and phenotypic traits in the presence of noise and unobserved confounders for large genetic studies. The framework builds on sparse linear methods developed for regression and modified here for inferring causal structures of richer networks with latent variables. The method is motivated by the use of genotypes as “instruments” to infer causal associations between phenotypic biomarkers and outcomes, without making the common restrictive assumptions of instrumental variable methods. The method may be used for an effective screening of potentially interesting genotype-phenotype and biomarker-phenotype associations in genome-wide studies, which may have important implications for validating biomarkers as possible proxy endpoints for early-stage clinical trials. Where the biomarkers are gene transcripts, the method can be used for fine mapping of quantitative trait loci (QTLs) detected in genetic linkage studies. The method is applied for examining effects of gene transcript levels in the liver on plasma HDL cholesterol levels for a sample of sequenced mice from a heterogeneous stock, with ∼10^5 genetic instruments and ∼47 × 10^3 gene transcripts.


reference text

[1] J. Ahn, K. Yu, R. Stolzenberg-Solomon, et al. Genome-wide association study of circulating vitamin D levels. Human Molecular Genetics, 2010. Epub ahead of print.

[2] J. D. Angrist, G. W. Imbens, and D. B. Rubin. Identification of causal effects using instrumental variables (with discussion). J. of the Am. Stat. Assoc., 91:444–455, 1996.

[3] R. J. Bowden and D. A. Turkington. Instrumental Variables. Cambridge Uni Press, 1984.

[4] C. Brito and J. Pearl. Generalized instrumental variables. In UAI, 2002.

[5] Y. Chen, J. Zhu, P. Y. Lum, et al. Variations in DNA elucidate molecular networks that cause disease. Nature, 452:429–435, 2008.

[6] B. Cseke and T. Heskes. Improving posterior marginal approximations in latent Gaussian models. In AISTATS, 2010.

[7] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least angle regression. The Ann. of Stat., 32, 2004.

[8] J. Fan and R. Li. Variable selection via nonconcave penalized likelihood and its oracle properties. J. of the Am. Stat. Assoc., 96(456):1348–1360, 2001.

[9] M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Trans. on PAMI, 25(9), 2003.

[10] I. E. Frank and J. H. Friedman. A statistical view of some chemometrics regression tools. Technometrics, 35(2):109–135, 1993.

[11] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the graphical lasso. Biostatistics, 9(3), 2008.

[12] D. Heckerman, C. Meek, and G. F. Cooper. A Bayesian approach to causal discovery. In C. Glymour and G. F. Cooper, editors, Computation, Causation, and Discovery. MIT, 1999.

[13] G. J. Huang, S. Shifman, W. Valdar, et al. High resolution mapping of expression QTLs in heterogeneous stock mice in multiple tissues. Genome Research, 19(6):1133–40, 2009.

[14] J. Jia and B. Yu. On model selection consistency of the elastic net when p ≫ n. Technical Report 756, UC Berkeley, Department of Statistics, 2008.

[15] M. B. Katan. Apolipoprotein E isoforms, serum cholesterol and cancer. Lancet, i:507–508, 1986.

[16] S. Kim and E. Xing. Statistical estimation of correlated genome associations to a quantitative trait network. PLOS Genetics, 5(8), 2009.

[17] D. A. Lawlor, R. M. Harbord, J. Sterne, et al. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat. in Medicine, 27:1133–1163, 2008.

[18] E. Levina, A. Rothman, and J. Zhu. Sparse estimation of large covariance matrices via a nested lasso penalty. The Ann. of App. Stat., 2(1):245–263, 2008.

[19] M. H. Maathuis, M. Kalisch, and P. Bühlmann. Estimating high-dimensional intervention effects from observational data. The Ann. of Stat., 37:3133–3164, 2009.

[20] D. J. C. MacKay. Bayesian interpolation. Neural Computation, 4:415–447, 1992.

[21] D. J. C. MacKay. Information Theory, Inference & Learning Algorithms. Cambridge Uni Press, 2003.

[22] J. Mooij, D. Janzing, J. Peters, and B. Schoelkopf. Regression by dependence minimization and its application to causal inference in additive noise models. In ICML, 2009.

[23] R. Mott, C. J. Talbot, M. G. Turri, A. C. Collins, and J. Flint. A method for fine mapping quantitative trait loci in outbred animal stocks. Proc. Nat. Acad. Sci. USA, 97:12649–12654, 2000.

[24] C. J. Narvaez, D. Matthews, et al. Lean phenotype and resistance to diet-induced obesity in vitamin D receptor knockout mice correlates with induction of uncoupling protein-1. Endocrinology, 150(2), 2009.

[25] R. M. Neal. Bayesian Learning for Neural Networks. Springer, 1996.

[26] T. Park and G. Casella. The Bayesian LASSO. J. of the Am. Stat. Assoc., 103(482), 2008.

[27] J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge Uni Press, 2000.

[28] J. Pearl. Causal inference in statistics: an overview. Statistics Surveys, 3:96–146, 2009.

[29] J. M. Robins and S. Greenland. Identification of causal effects using instrumental variables: comment. J. of the Am. Stat. Assoc., 91:456–458, 1996.

[30] E. E. Schadt, J. Lamb, X. Yang, J. Zhu, et al. An integrative genomics approach to infer causal associations between gene expression and disease. Nature Genetics, 37(7):710–717, 2005.

[31] M. W. Seeger. Bayesian inference and optimal design for the sparse linear model. JMLR, 9, 2008.

[32] I. Shpitser and J. Pearl. Identification of conditional interventional distributions. In UAI, 2006.

[33] R. Silva, R. Scheines, C. Glymour, and P. Spirtes. Learning the structure of linear latent variable models. JMLR, 7, 2006.

[34] G. D. Smith and S. Ebrahim. Mendelian randomisation: can genetic epidemiology contribute to understanding environmental determinants of disease? Int. J. of Epidemiology, 32:1–22, 2003.

[35] D. C. Thomas and D. V. Conti. Commentary: The concept of Mendelian randomization. Int. J. of Epidemiology, 33, 2004.

[36] R. Tibshirani. Regression shrinkage and selection via the lasso. JRSS:B, 58(1):267–288, 1996.

[37] M. E. Tipping. Sparse Bayesian learning and the RVM. JMLR, 1:211–244, 2001.

[38] W. Valdar, L. C. Solberg, S. Burnett, et al. Genome-wide genetic association of complex traits in heterogeneous stock mice. Nature Genetics, 38:879–887, 2006.

[39] M. Wainwright. Sharp thresholds for high-dimensional and noisy sparsity recovery using L1-constrained quadratic programming. IEEE Trans. on Inf. Theory, 55:2183–2202, 2007.

[40] M. Yuan and Y. Lin. On the nonnegative garrote estimator. JRSS:B, 69, 2007.

[41] J. Zhu, M. C. Wiener, C. Zhang, et al. Increasing the power to detect causal associations by combining genotypic and expression data in segregating populations. PLOS Comp. Biol., 3(4):692–703, 2007.

[42] H. Zou and T. Hastie. Regularization and variable selection via the elastic net. JRSS:B, 67(2), 2005.