
Graphical Models via Generalized Linear Models (NIPS 2012)



Authors: Eunho Yang, Genevera Allen, Zhandong Liu, Pradeep K. Ravikumar

Abstract: Undirected graphical models, also known as Markov networks, are popular in a variety of applications. Popular instances of these models, such as Gaussian Markov Random Fields (GMRFs), Ising models, and multinomial discrete models, do not, however, capture the characteristics of data in many settings. We introduce a new class of graphical models based on generalized linear models (GLMs) by assuming that the node-wise conditional distributions arise from exponential families. Our models allow one to estimate multivariate Markov networks given any univariate exponential-family distribution, such as the Poisson, negative binomial, or exponential. The neighborhood of each node is selected by fitting a penalized GLM. A major contribution of this paper is a rigorous statistical analysis showing that, with high probability, the neighborhood of our graphical models can be recovered exactly. We also provide examples of non-Gaussian high-throughput genomic networks learned via our GLM graphical models.
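The node-wise estimation step described in the abstract can be illustrated for count data. The sketch below regresses each variable on all the others with an ℓ1-penalized Poisson GLM and reads the estimated neighborhood off the nonzero coefficients. It is a minimal sketch under stated assumptions, not the authors' implementation. The solver (proximal gradient descent with backtracking), the function names `fit_l1_poisson` and `poisson_neighborhoods`, the penalty value, and the AND rule for symmetrizing the edge set are all illustrative choices made here.

```python
import numpy as np


def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def poisson_nll(X, y, b0, beta):
    # Average Poisson negative log-likelihood (dropping log(y!)), log link.
    eta = b0 + X @ beta
    with np.errstate(over="ignore"):  # overflow -> inf, rejected by line search
        return np.mean(np.exp(eta) - y * eta)


def fit_l1_poisson(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize mean(exp(eta) - y*eta) + lam*||beta||_1, eta = b0 + X @ beta.

    Illustrative solver choice: proximal gradient with backtracking.
    The intercept b0 is not penalized.
    """
    n, p = X.shape
    xbar = X.mean(axis=0)
    Xc = X - xbar  # centering only reparametrizes the unpenalized intercept
    b0 = np.log(y.mean() + 1e-8)
    beta = np.zeros(p)
    step = 1.0
    for _ in range(n_iter):
        eta = b0 + Xc @ beta
        mu = np.exp(eta)
        f = np.mean(mu - y * eta)
        r = mu - y
        g0, g = r.mean(), Xc.T @ r / n  # gradient of the smooth loss
        while True:  # shrink the step until the quadratic upper bound holds
            b0_new = b0 - step * g0
            beta_new = soft_threshold(beta - step * g, step * lam)
            d0, d = b0_new - b0, beta_new - beta
            bound = f + g0 * d0 + g @ d + (d0 ** 2 + d @ d) / (2 * step)
            if poisson_nll(Xc, y, b0_new, beta_new) <= bound:
                break
            step *= 0.5
        b0, beta = b0_new, beta_new
        if abs(d0) + np.abs(d).sum() < tol:
            break
    return b0 - xbar @ beta, beta  # intercept on the uncentered scale


def poisson_neighborhoods(X, lam, rule="and"):
    # Regress each node on all others; nonzero coefficients give its neighbors.
    n, p = X.shape
    B = np.zeros((p, p))
    for s in range(p):
        others = [t for t in range(p) if t != s]
        _, beta = fit_l1_poisson(X[:, others], X[:, s], lam)
        B[s, others] = beta
    nz = B != 0
    # Combine the two directed estimates of each edge (AND or OR rule).
    return (nz & nz.T) if rule == "and" else (nz | nz.T)


# Toy counts: nodes 0 and 1 share a latent Poisson count; node 2 is independent.
rng = np.random.default_rng(0)
n = 2000
z = rng.poisson(3.0, n)
X = np.column_stack([
    z + rng.poisson(1.0, n),
    z + rng.poisson(1.0, n),
    rng.poisson(3.0, n),
]).astype(float)
adj = poisson_neighborhoods(X, lam=0.5)
print(adj.astype(int))
```

The toy data are not sampled from the paper's Poisson graphical model. They only show that the penalized node-wise regressions keep the dependent pair (0, 1) and drop the independent node 2. In practice the penalty level would be tuned, for example with a stability criterion such as StARS [15]. Swapping in another exponential family amounts to changing the loss and its gradient.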


reference text

[1] M.J. Wainwright and M.I. Jordan. Graphical models, exponential families, and variational inference. Foundations and Trends® in Machine Learning, 1(1-2):1–305, 2008.

[2] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.

[3] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.

[4] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In International Conference on Artificial Intelligence and Statistics (AISTATS), 14, 2011.

[5] P. McCullagh and J.A. Nelder. Generalized linear models. Monographs on statistics and applied probability 37. Chapman and Hall/CRC, New York, 1989.

[6] S.L. Lauritzen. Graphical models, volume 17. Oxford University Press, USA, 1996.

[7] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192–236, 1974.

[8] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295–2328, 2009.

[9] Y.M.M. Bishop, S.E. Fienberg, and P.W. Holland. Discrete multivariate analysis. Springer Verlag, 2007.

[10] T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer, 2nd edition, 2009.

[11] J.C. Marioni, C.E. Mason, S.M. Mane, M. Stephens, and Y. Gilad. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research, 18(9):1509–1517, 2008.

[12] S. Negahban, P. Ravikumar, M. J. Wainwright, and B. Yu. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, 2010.

[13] Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature, 490(7418):61–70, 2012.

[14] Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, October 2008.

[15] H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (StARS) for high dimensional graphical models. arXiv preprint arXiv:1006.3316, 2010.

[16] K. Abdelmohsen, M.M. Kim, S. Srikantan, E.M. Mercken, S.E. Brennan, G.M. Wilson, R. de Cabo, and M. Gorospe. miR-519 suppresses tumor growth by reducing HuR levels. Cell Cycle, 9(7):1354, 2010.

[17] I. Keklikoglou, C. Koerner, C. Schmidt, J.D. Zhang, D. Heckmann, A. Shavinskaya, H. Allgayer, B. Gückel, T. Fehm, A. Schneeweiss, et al. MicroRNA-520/373 family functions as a tumor suppressor in estrogen receptor negative breast cancer by targeting NF-κB and TGF-β signaling pathways. Oncogene, 2011.

[18] F. Yu, H. Yao, P. Zhu, X. Zhang, Q. Pan, C. Gong, Y. Huang, X. Hu, F. Su, J. Lieberman, et al. let-7 regulates self renewal and tumorigenicity of breast cancer cells. Cell, 131(6):1109–1123, 2007.

[19] R. McLendon, A. Friedman, D. Bigner, E.G. Van Meir, D.J. Brat, G.M. Mastrogianakis, J.J. Olson, T. Mikkelsen, N. Lehman, K. Aldape, et al. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.

[20] J. Zhang. Convert segment data into a region by sample matrix to allow for other high-level computational analyses, version 1.2.0. Bioconductor package.

[21] G.B.W. Wertheim, T.W. Yang, T.-c. Pan, A. Ramne, Z. Liu, H.P. Gardner, K.D. Dugan, P. Kristel, B. Kreike, M.J. van de Vijver, R.D. Cardiff, C. Reynolds, and L.A. Chodosh. The Snf1-related kinase, Hunk, is essential for mammary tumor metastasis. Proceedings of the National Academy of Sciences of the United States of America, 106(37):15855–15860, September 2009.

[22] J.T. Leek, R.B. Scharpf, H.C. Bravo, D. Simcha, B. Langmead, W.E. Johnson, D. Geman, K. Baggerly, and R.A. Irizarry. Tackling the widespread and critical impact of batch effects in high-throughput data. Nature Reviews Genetics, 11(10):733–739, 2010.

[23] J. Li, D.M. Witten, I.M. Johnstone, and R. Tibshirani. Normalization, testing, and false discovery rate estimation for RNA-sequencing data. Biostatistics, 2011.

[24] G. I. Allen and Z. Liu. A Log-Linear Graphical Model for Inferring Genetic Networks from High-Throughput Sequencing Data. IEEE International Conference on Bioinformatics and Biomedicine, 2012.

[25] J. Bullard, E. Purdom, K. Hansen, and S. Dudoit. Evaluation of statistical methods for normalization and differential expression in mRNA-seq experiments. BMC Bioinformatics, 11(1):94, 2010.