Venue: NIPS 2013 (paper nips2013-217); reference knowledge graph by maker-knowledge-mining
Source: pdf
Author: Eunho Yang, Pradeep Ravikumar, Genevera I. Allen, Zhandong Liu
Abstract: Undirected graphical models, such as Gaussian graphical models, Ising models, and multinomial/categorical graphical models, are widely used in a variety of applications to model distributions over a large number of variables. These standard instances, however, are ill-suited to count data, which are increasingly ubiquitous in big-data settings such as genomic sequencing data, user-ratings data, spatial incidence data, climate studies, and site visits. Existing classes of Poisson graphical models, which arise as the joint distributions corresponding to Poisson-distributed node-conditional distributions, have a major drawback: because the Poisson distribution has an infinite domain, normalizability requires that they model only negative conditional dependencies. In this paper, our objective is to modify the Poisson graphical model distribution so that it can capture a rich dependence structure between count-valued variables. We begin by discussing two strategies for truncating the Poisson distribution and show that only one of these leads to a valid joint distribution. While this model can accommodate a wider range of conditional dependencies, some limitations remain. To address them, we investigate two additional novel variants of the Poisson distribution and their corresponding joint graphical model distributions. Our three novel approaches provide classes of Poisson-like graphical models that can capture both positive and negative conditional dependencies between count-valued variables. The graph structure of our models can be learned via penalized neighborhood selection, and we demonstrate the performance of our methods by learning simulated networks as well as a network from microRNA-sequencing data.
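To make the normalizability issue concrete, here is a small numerical sketch. It assumes the standard pairwise form of the Poisson graphical model (Poisson node-conditionals, sufficient statistics x_s and x_s x_t, base measure -log x_s!), as in Besag's auto-Poisson model [17]; the function names are hypothetical and the code is illustrative, not taken from the paper. It evaluates the log-partition function of a two-node model summed over the finite grid {0, ..., R}^2. With theta12 <= 0 the value settles as R grows; with theta12 > 0 it keeps increasing, because along the diagonal x1 = x2 = n the exponent theta12 * n^2 outgrows the 2 log n! penalty, so the infinite sum diverges.

```python
import math


def log_weight(x1, x2, theta1, theta2, theta12):
    """Unnormalized log-density of a two-node pairwise Poisson graphical model
    (assumed form): theta1*x1 + theta2*x2 + theta12*x1*x2 - log(x1!) - log(x2!)."""
    return (theta1 * x1 + theta2 * x2 + theta12 * x1 * x2
            - math.lgamma(x1 + 1) - math.lgamma(x2 + 1))


def log_partition_truncated(R, theta1=0.0, theta2=0.0, theta12=0.0):
    """log Z_R: log of the summed unnormalized weights over {0, ..., R}^2.

    Uses a log-sum-exp (subtract the max exponent before exponentiating)
    so large positive-coupling weights do not overflow.
    """
    exps = [log_weight(a, b, theta1, theta2, theta12)
            for a in range(R + 1) for b in range(R + 1)]
    m = max(exps)
    return m + math.log(sum(math.exp(e - m) for e in exps))


if __name__ == "__main__":
    # Negative or zero coupling: log Z_R stabilizes as R grows.
    # Positive coupling: log Z_R keeps growing, so Z is infinite on the full domain.
    for theta12 in (-0.5, 0.0, 0.5):
        values = [log_partition_truncated(R, theta12=theta12) for R in (10, 20, 40)]
        print(f"theta12={theta12:+.1f}: " + ", ".join(f"{v:.3f}" for v in values))
```

With theta12 = 0 the two counts are independent Poisson(1) variables up to scaling, so log Z is 2 (that is, log(e * e)), which gives a quick sanity check. Restricting the support to a finite set, as the grid does here, makes the sum finite for any theta12, which is the intuition behind truncation; this grid cutoff is only a numerical device and is not necessarily either of the truncation strategies the paper analyzes.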
[1] N. Meinshausen and P. Bühlmann. High-dimensional graphs and variable selection with the Lasso. Annals of Statistics, 34:1436–1462, 2006.
[2] M. Yuan and Y. Lin. Model selection and estimation in the Gaussian graphical model. Biometrika, 94(1):19, 2007.
[3] O. Banerjee, L. El Ghaoui, and A. d’Aspremont. Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data. The Journal of Machine Learning Research, 9:485–516, 2008.
[4] J. Friedman, T. Hastie, and R. Tibshirani. Sparse inverse covariance estimation with the lasso. Biostatistics, 9(3):432–441, 2007.
[5] P. Ravikumar, M. J. Wainwright, and J. Lafferty. High-dimensional Ising model selection using ℓ1-regularized logistic regression. Annals of Statistics, 38(3):1287–1319, 2010.
[6] A. Jalali, P. Ravikumar, V. Vasuki, and S. Sanghavi. On learning discrete graphical models using group-sparse regularization. In Inter. Conf. on AI and Statistics (AISTATS), 14, 2011.
[7] H. Liu, J. Lafferty, and L. Wasserman. The nonparanormal: Semiparametric estimation of high dimensional undirected graphs. The Journal of Machine Learning Research, 10:2295–2328, 2009.
[8] A. Dobra and A. Lenkoski. Copula Gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5(2A):969–993, 2011.
[9] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. High dimensional semiparametric Gaussian copula graphical models. arXiv preprint arXiv:1202.2169, 2012.
[10] H. Liu, F. Han, M. Yuan, J. Lafferty, and L. Wasserman. The nonparanormal skeptic. arXiv preprint arXiv:1206.6488, 2012.
[11] S. L. Lauritzen. Graphical models, volume 17. Oxford University Press, USA, 1996.
[12] I. Yahav and G. Shmueli. An elegant method for generating multivariate Poisson random variable. arXiv preprint arXiv:0710.5670, 2007.
[13] A. S. Krishnamoorthy. Multivariate binomial and Poisson distributions. Sankhyā: The Indian Journal of Statistics (1933-1960), 11(2):117–124, 1951.
[14] P. Holgate. Estimation for the bivariate Poisson distribution. Biometrika, 51(1-2):241–287, 1964.
[15] D. Karlis. An EM algorithm for multivariate Poisson distribution and related models. Journal of Applied Statistics, 30(1):63–77, 2003.
[16] N. A. C. Cressie. Statistics for spatial data. Wiley series in probability and mathematical statistics, 1991.
[17] J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society. Series B (Methodological), 36(2):192–236, 1974.
[18] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. Graphical models via generalized linear models. In Neur. Info. Proc. Sys., 25, 2012.
[19] E. Yang, P. Ravikumar, G. I. Allen, and Z. Liu. On graphical models via univariate exponential family distributions. arXiv preprint arXiv:1301.4183, 2013.
[20] M. S. Kaiser and N. Cressie. Modeling Poisson variables with positive spatial dependence. Statistics & Probability Letters, 35(4):423–432, 1997.
[21] D. A. Griffith. A spatial filtering specification for the auto-Poisson model. Statistics & Probability Letters, 58(3):245–251, 2002.
[22] J. C. Marioni, C. E. Mason, S. M. Mane, M. Stephens, and Y. Gilad. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Research, 18(9):1509–1517, 2008.
[23] Cancer Genome Atlas Research Network. Comprehensive molecular portraits of human breast tumours. Nature, 490(7418):61–70, 2012.
[24] G. I. Allen and Z. Liu. A log-linear graphical model for inferring genetic networks from high-throughput sequencing data. IEEE International Conference on Bioinformatics and Biomedicine, 2012.
[25] H. Liu, K. Roeder, and L. Wasserman. Stability approach to regularization selection (StARS) for high dimensional graphical models. arXiv preprint arXiv:1006.3316, 2010.
[26] L. Ma, F. Reinhardt, E. Pan, J. Soutschek, B. Bhat, E. G. Marcusson, J. Teruya-Feldstein, G. W. Bell, and R. A. Weinberg. Therapeutic silencing of miR-10b inhibits metastasis in a mouse mammary tumor model. Nature Biotechnology, 28(4):341–347, 2010.
[27] P. de Souza Rocha Simonini, A. Breiling, N. Gupta, M. Malekpour, M. Youns, R. Omranipour, F. Malekpour, S. Volinia, C. M. Croce, H. Najmabadi, et al. Epigenetically deregulated microRNA-375 is involved in a positive feedback loop with estrogen receptor α in breast cancer cells. Cancer Research, 70(22):9175–9184, 2010.