
86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling


Source: pdf

Author: Ricardo Henao, Ole Winther

Abstract: In this paper we consider sparse and identifiable linear latent variable (factor) and linear Bayesian network models for parsimonious analysis of multivariate data. We propose a computationally efficient method for joint parameter and model inference, and model comparison. It consists of a fully Bayesian hierarchy for sparse models using slab and spike priors (two-component δ-function and continuous mixtures), non-Gaussian latent factors and a stochastic search over the ordering of the variables. The framework, which we call SLIM (Sparse Linear Identifiable Multivariate modeling), is validated and benchmarked on artificial and real biological data sets. SLIM is closest in spirit to LiNGAM (Shimizu et al., 2006), but differs substantially in inference, Bayesian network structure learning and model comparison. Experimentally, SLIM performs equally well or better than LiNGAM with comparable computational complexity. We attribute this mainly to the stochastic search strategy used, and to parsimony (sparsity and identifiability), which is an explicit part of the model. We propose two extensions to the basic i.i.d. linear framework: non-linear dependence on observed variables, called SNIM (Sparse Non-linear Identifiable Multivariate modeling), and allowing for correlations between latent variables, called CSLIM (Correlated SLIM), for temporal and/or spatial data. The source code and scripts are available from http://cogsys.imm.dtu.dk/slim/. Keywords: parsimony, sparsity, identifiability, factor models, linear Bayesian networks
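To make the generative model in the abstract concrete, below is a minimal sketch in NumPy. It is not the authors' SLIM implementation (that is available from the URL above); the dimensions and hyperparameters (d, m, n, pi, slab_scale) are arbitrary illustrative choices. The sketch draws a mixing matrix from a two-component spike-and-slab prior and simulates the linear factor model x = Af + e with non-Gaussian (Laplace) latent factors, the ingredient that gives identifiability.

    import numpy as np

    rng = np.random.default_rng(0)
    d, m, n = 5, 3, 200   # observed dims, latent factors, samples (illustrative)

    # Spike-and-slab prior on the mixing matrix: each entry is exactly zero
    # with probability 1 - pi (the delta-function "spike"), otherwise drawn
    # from a continuous Gaussian "slab".
    pi, slab_scale = 0.3, 1.0            # hypothetical hyperparameters
    mask = rng.random((d, m)) < pi       # which loadings are active
    A = np.where(mask, rng.normal(scale=slab_scale, size=(d, m)), 0.0)

    # Non-Gaussian factors (here Laplace) are what make the linear model
    # identifiable (Comon, 1994; Shimizu et al., 2006).
    F = rng.laplace(size=(m, n))
    E = rng.normal(scale=0.1, size=(d, n))   # Gaussian observation noise
    X = A @ F + E                            # data from the sparse factor model
    print("active loadings:", int(mask.sum()), "of", d * m)

In SLIM itself the loadings, the spike/slab hyperparameters and the variable ordering are of course inferred from data within the fully Bayesian hierarchy and stochastic ordering search, rather than fixed as here.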


Reference text

D. F. Andrews and C. L. Mallows. Scale mixtures of normal distributions. Journal of the Royal Statistical Society: Series B (Methodology), 36(1):99–102, 1974.

A. Asuncion and D. J. Newman. UCI machine learning repository, 2007.

A. Azzalini and A. W. Bowman. A look at some data on the Old Faithful geyser. Journal of the Royal Statistical Society: Series C (Applied Statistics), 39(3):357–365, 1990.

P. Bekker and J. M. F. ten Berge. Generic global identification in factor analysis. Linear Algebra and its Applications, 264(1–3):255–263, 1997.

M. Branco and D. K. Dey. A general class of multivariate skew-elliptical distributions. Journal of Multivariate Analysis, 79(1):99–113, 2001.

C. M. Carvalho, J. Chang, J. E. Lucas, J. R. Nevins, Q. Wang, and M. West. High-dimensional sparse factor modeling: Applications in gene expression genomics. Journal of the American Statistical Association, 103(484):1438–1456, 2008.

R. S. Chhikara and L. Folks. The Inverse Gaussian Distribution: Theory, Methodology, and Applications. M. Dekker, New York, 1989.

S. Chib. Marginal likelihood from the Gibbs output. Journal of the American Statistical Association, 90(432):1313–1321, 1995.

D. M. Chickering. Learning Bayesian networks is NP-complete. In D. Fisher and H.-J. Lenz, editors, Learning from Data: AI and Statistics, pages 121–130. Springer-Verlag, 1996.

P. Comon. Independent component analysis, a new concept? Signal Processing, 36(3):287–314, 1994.

G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.

P. Daniusis, D. Janzing, J. Mooij, J. Zscheischler, B. Steudel, K. Zhang, and B. Schölkopf. Inferring deterministic causal relations. In Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence, 2010.

A. P. Dawid and S. L. Lauritzen. Hyper Markov laws in the statistical analysis of decomposable graphical models. Annals of Statistics, 21(3):1272–1317, 1993.

A. P. Dempster. Covariance selection. Biometrics, 28:157–175, 1972.

G. Elidan, I. Nachman, and N. Friedman. "Ideal Parent" structure learning for continuous variable Bayesian networks. Journal of Machine Learning Research, 8:1799–1833, 2007.

N. Friedman and D. Koller. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50(1–2):95–125, 2003.

N. Friedman and I. Nachman. Gaussian process networks. In Proceedings of the 16th Conference on Uncertainty in Artificial Intelligence, pages 211–219, 2000.

N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The "sparse candidate" algorithm. In K. B. Laskey and H. Prade, editors, Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 206–215, 1999.

N. Friel and A. N. Pettitt. Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society: Series B (Methodology), 70(3):589–607, 2008.

S. Gama-Castro, V. Jiménez-Jacinto, M. Peralta-Gil, A. Santos-Zavaleta, M. I. Peñaloza-Spinola, B. Contreras-Moreira, J. Segura-Salazar, L. Muñiz-Rascado, I. Martínez-Flores, H. Salgado, C. Bonavides-Martínez, C. Abreu-Goodger, C. Rodríguez-Penagos, J. Miranda-Ríos, E. Morett, E. Merino, A. M. Huerta, L. Treviño-Quintanilla, and J. Collado-Vides. RegulonDB (version 6.0): gene regulation model of Escherichia coli K-12 beyond transcription, active (experimental) annotated promoters and Textpresso navigation. Nucleic Acids Research, 36(Database Issue):120–124, 2008.

E. I. George and R. E. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889, 1993.

J. Geweke. Variable selection and model comparison in regression. In J. Berger, J. Bernardo, A. Dawid, and A. Smith, editors, Bayesian Statistics 5, pages 609–620. Oxford University Press, 1996.

Z. Ghahramani, T. L. Griffiths, and P. Sollich. Bayesian nonparametric latent feature models. In J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, editors, Bayesian Statistics 8, pages 201–226. Oxford University Press, 2006.

P. Giudici and P. J. Green. Decomposable graphical Gaussian model determination. Biometrika, 86(4):785–801, 1999.

A. Gretton, K. Fukumizu, C. H. Teo, L. Song, B. Schölkopf, and A. Smola. A kernel statistical test of independence. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, pages 585–592. MIT Press, 2008.

D. Heckerman, D. M. Chickering, C. Meek, R. Rounthwaite, and C. Kadie. Dependency networks for inference, collaborative filtering, and data visualization. Journal of Machine Learning Research, 1:49–75, 2000.

R. Henao and O. Winther. Bayesian sparse factor models and DAGs inference and comparison. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 736–744. MIT Press, 2009.

P. O. Hoyer, S. Shimizu, A. J. Kerminen, and M. Palviainen. Estimation of causal effects using linear non-Gaussian causal models with hidden variables. International Journal of Approximate Reasoning, 49(2):362–378, 2008.

P. O. Hoyer, D. Janzing, J. M. Mooij, J. Peters, and B. Schölkopf. Nonlinear causal discovery with additive noise models. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 689–696, 2009.

A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley-Interscience, 2001.

H. Ishwaran and J. S. Rao. Spike and slab variable selection: Frequentist and Bayesian strategies. Annals of Statistics, 33(2):730–773, 2005.

I. T. Jolliffe, N. T. Trendafilov, and M. Uddin. A modified principal component technique based on the LASSO. Journal of Computational and Graphical Statistics, 12(3):531–547, 2003.

A. M. Kagan, Yu. V. Linnik, and C. Radhakrishna Rao. Characterization Problems in Mathematical Statistics. Probability and Mathematical Statistics. Wiley, New York, 1973.

K. C. Kao, Y.-L. Yang, R. Boscolo, C. Sabatti, V. Roychowdhury, and J. C. Liao. Transcriptome-based determination of multiple transcription regulator activities in Escherichia coli by using network component analysis. PNAS, 101(2):641–646, 2004.

D. Knowles and Z. Ghahramani. Infinite sparse factor analysis and infinite independent components analysis. In M. E. Davies, C. C. James, S. A. Abdallah, and M. D. Plumbley, editors, 7th International Conference on Independent Component Analysis and Signal Separation, volume 4666 of Lecture Notes in Computer Science, pages 381–388. Springer-Verlag, Berlin, 2007.

F. B. Lempers. Posterior Probabilities of Alternative Linear Models. Rotterdam University Press, 1971.

H. F. Lopes and M. West. Bayesian model assessment in factor analysis. Statistica Sinica, 14(1):41–67, 2004.

J. Lucas, C. Carvalho, Q. Wang, A. Bild, J. R. Nevins, and M. West. Bayesian Inference for Gene Expression and Proteomics, chapter Sparse Statistical Modeling in Gene Expression Genomics, pages 155–176. Cambridge University Press, 2006.

J. K. Martin and R. P. McDonald. Bayesian estimation in unrestricted factor analysis: A treatment for Heywood cases. Psychometrika, 40(4):505–517, 1975.

T. J. Mitchell and J. J. Beauchamp. Bayesian variable selection in linear regression. Journal of the American Statistical Association, 83(404):1023–1032, 1988.

J. Mooij and D. Janzing. Distinguishing between cause and effect. In JMLR Workshop and Conference Proceedings, volume 6, pages 147–156, 2010.

I. Murray. Advances in Markov Chain Monte Carlo Methods. PhD thesis, Gatsby Computational Neuroscience Unit, University College London, 2007.

R. Neal. Annealed importance sampling. Statistics and Computing, 11(2):125–139, 2001.

T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2000.

P. Rai and H. Daumé III. The infinite hierarchical factor regression model. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1321–1328. MIT Press, 2009.

B. Rajaratnam, H. Massam, and C. Carvalho. Flexible covariance estimation in graphical Gaussian models. Annals of Statistics, 36(6):2818–2849, 2008.

J. M. Robins, R. Scheines, P. Spirtes, and L. Wasserman. Uniform consistency in causal inference. Biometrika, 90(3):491–515, 2003.

K. Sachs, O. Perez, D. Pe'er, D. A. Lauffenburger, and G. P. Nolan. Causal protein-signaling networks derived from multiparameter single-cell data. Science, 308(5721):523–529, 2005.

M. W. Schmidt, A. Niculescu-Mizil, and K. P. Murphy. Learning graphical model structure using L1-regularization paths. In Proceedings of the 22nd National Conference on Artificial Intelligence, pages 1278–1283, 2007.

S. Shimizu, P. O. Hoyer, A. Hyvärinen, and A. Kerminen. A linear non-Gaussian acyclic model for causal discovery. Journal of Machine Learning Research, 7:2003–2030, 2006.

R. Silva. Causality in the Sciences, chapter Measuring Latent Causal Structure. Oxford University Press, 2010.

P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction, and Search. The MIT Press, second edition, 2001.

M. Teyssier and D. Koller. Ordering-based search: A simple and effective algorithm for learning Bayesian networks. In Proceedings of the 21st Conference on Uncertainty in Artificial Intelligence, pages 548–549, 2005.

R. Thibaux and M. I. Jordan. Hierarchical beta processes and the Indian buffet process. In M. Meila and X. Shen, editors, Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, pages 564–571, 2007.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodology), 58(1):267–288, 1996.

R. Tillman, A. Gretton, and P. Spirtes. Nonlinear directed acyclic structure learning with weakly additive noise models. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1847–1855, 2009.

I. Tsamardinos, L. E. Brown, and C. F. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.

M. West. On scale mixtures of normal distributions. Biometrika, 74(3):646–648, 1987.

M. West. Bayesian factor regression models in the "large p, small n" paradigm. In J. Bernardo, M. Bayarri, J. Berger, A. Dawid, D. Heckerman, A. Smith, and M. West, editors, Bayesian Statistics 7, pages 723–732. Oxford University Press, 2003.

S. Yu, V. Tresp, and K. Yu. Robust multi-task learning with t-processes. In Proceedings of the 24th International Conference on Machine Learning, volume 227, pages 1103–1110, 2007.

K. Zhang and A. Hyvärinen. On the identifiability of the post-nonlinear causal model. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, pages 647–655. AUAI Press, 2009.

K. Zhang and A. Hyvärinen. Distinguishing causes from effect using nonlinear acyclic causal models. In JMLR Workshop and Conference Proceedings, volume 6, pages 157–164, 2010.

H. Zou, T. Hastie, and R. Tibshirani. Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):262–286, 2006.