
Learning Module Networks (JMLR 2005)

Source: pdf

Author: Eran Segal, Dana Pe'er, Aviv Regev, Daphne Koller, Nir Friedman

Abstract: Methods for learning Bayesian networks can discover dependency structure between observed variables. Although these methods are useful in many applications, they run into computational and statistical problems in domains that involve a large number of variables. In this paper, we consider a solution that is applicable when many variables have similar behavior. We introduce a new class of models, module networks, that explicitly partition the variables into modules, so that the variables in each module share the same parents in the network and the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns the modules’ composition and their dependency structure from data. Evaluation on real data in the domains of gene expression and the stock market shows that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.

Note: A preliminary version of this paper appeared in the Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence, 2003 (UAI ’03). © 2005 Eran Segal, Dana Pe’er, Aviv Regev, Daphne Koller and Nir Friedman.
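The module structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the variable names, module names, binary domain, and probability values are all hypothetical, chosen only to show how variables in one module share a single parent set and a single conditional probability distribution (CPD).

```python
from math import log

# Hypothetical toy example; every name and number here is an illustrative assumption.
# Each variable is assigned to exactly one module.
assignment = {"A": "M1", "B": "M2", "C": "M2", "D": "M2"}

# Each module has one parent set, shared by all of its variables.
module_parents = {"M1": (), "M2": ("A",)}

# Each module has one CPD, P(X = 1 | parent values), shared by all of its variables.
module_cpd = {
    "M1": {(): 0.5},
    "M2": {(0,): 0.1, (1,): 0.9},
}


def log_likelihood(sample):
    """Log-probability of one complete assignment {variable: 0 or 1}."""
    total = 0.0
    for var, module in assignment.items():
        # Look up the parents and CPD through the variable's module.
        parent_values = tuple(sample[p] for p in module_parents[module])
        p_one = module_cpd[module][parent_values]
        total += log(p_one if sample[var] == 1 else 1.0 - p_one)
    return total


def num_parameters():
    """Count CPD rows per module rather than per variable."""
    return sum(len(cpd) for cpd in module_cpd.values())
```

In this toy setting `num_parameters()` counts three CPD rows, whereas giving B, C, and D separate CPDs with the same parent would need seven. That sharing is the kind of reduction the abstract points to when many variables behave similarly.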

reference text

A. Battle, E. Segal, and D. Koller. Probabilistic discovery of overlapping cellular processes and their regulation using gene expression data. In Proceedings Eighth Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2004.

L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth & Brooks, Monterey, CA, 1984.

W. Buntine. Operations for learning with graphical models. Journal of Artificial Intelligence Research, 2:159–225, 1994.

P. Cheeseman, J. Kelly, M. Self, J. Stutz, W. Taylor, and D. Freeman. Autoclass: a Bayesian classification system. In Proceedings Fifth International Conference on Machine Learning (ML), pages 54–64, 1988.

J. M. Cherry, C. Ball, K. Dolinski, S. Dwight, M. Harris, J. C. Matese, G. Sherlock, G. Binkley, H. Jin, S. Weng, and D. Botstein. Saccharomyces genome database. Nucleic Acids Research, 26:73–79, 1998. http://genome-www.stanford.edu/Saccharomyces/.

D. M. Chickering, D. Heckerman, and C. Meek. A Bayesian approach to learning Bayesian networks with local structure. In Proceedings Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 80–89, 1997.

G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9:309–347, 1992.

T. Dean and K. Kanazawa. A model for reasoning about persistence and causation. Computational Intelligence, 5:142–150, 1989.

M. H. DeGroot. Optimal Statistical Decisions. McGraw-Hill, New York, 1970.

A. Doucet, N. de Freitas, and N. Gordon (eds). Sequential Monte Carlo Methods in Practice. Springer-Verlag, 2001.

B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, London, 1993.

G. Elidan and N. Friedman. Learning the dimensionality of hidden variables. In Proceedings Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 144–151, 2001.

A. P. Gasch et al. Genomic expression program in the response of yeast cells to environmental changes. Mol. Bio. Cell, 11:4241–4257, 2000.

N. Friedman and M. Goldszmidt. Learning Bayesian networks with local structure. In M. I. Jordan, editor, Learning in Graphical Models, pages 421–460. Kluwer, Dordrecht, Netherlands, 1998.

N. Friedman and D. Koller. Being Bayesian about Bayesian network structure: A Bayesian approach to structure discovery in Bayesian networks. Machine Learning, 50:95–126, 2003.

N. Friedman, L. Getoor, D. Koller, and A. Pfeffer. Learning probabilistic relational models. In Proceedings Sixteenth International Joint Conference on Artificial Intelligence (IJCAI), pages 1300–1309, 1999.

N. Friedman, M. Goldszmidt, and A. Wyner. Data analysis with Bayesian networks: A bootstrap approach. In Proc. UAI, pages 206–215, 1999.

N. Friedman, M. Linial, I. Nachman, and D. Pe’er. Using Bayesian networks to analyze expression data. Journal of Computational Biology, 7:601–620, 2000.

L. Getoor, D. Koller, and N. Friedman. From instances to classes in probabilistic relational models. In Proceedings of the ICML Workshop on Attribute-Value and Relational Learning, 2000.

D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20:197–243, 1995.

D. Heckerman. A tutorial on learning with Bayesian networks. In M. I. Jordan, editor, Learning in Graphical Models. Kluwer, Dordrecht, Netherlands, 1998.

J. A. Hoeting, D. Madigan, A. Raftery, and C. T. Volinsky. Bayesian model averaging: A tutorial. Statistical Science, 14(4), 1999.

D. Koller and A. Pfeffer. Object-oriented Bayesian networks. In Proceedings Thirteenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 302–313, 1997.

D. Koller and A. Pfeffer. Probabilistic frame-based systems. In Proceedings National Conference on Artificial Intelligence (AAAI), pages 580–587, 1998.

E. Lander. Array of hope. Nature Genetics, 21:3–4, 1999.

H. Langseth and T. D. Nielsen. Fusion of domain knowledge with data for structural learning in object oriented domains. Journal of Machine Learning Research, 4:339–368, 2003.

D. Pe’er, A. Regev, G. Elidan, and N. Friedman. Inferring subnetworks from perturbed expression profiles. Bioinformatics, 17(Suppl 1):S215–24, 2001.

N. E. Savin. The Bonferroni and the Scheffe multiple comparison procedures. Review of Economic Studies, 47(1):255–73, 1980.

E. Segal, B. Taskar, A. Gasch, N. Friedman, and D. Koller. Rich probabilistic models for gene expression. Bioinformatics, 17(Suppl 1):S243–52, 2001.

E. Segal, A. Battle, and D. Koller. Decomposing gene expression into cellular processes. In Proceedings Eighth Pacific Symposium on Biocomputing (PSB), 2003.

E. Segal, M. Shapira, A. Regev, D. Pe’er, D. Botstein, D. Koller, and N. Friedman. Module networks: Discovering regulatory modules and their condition specific regulators from gene expression data. Nature Genetics, 34(2):166–176, 2003.