jmlr jmlr2013 jmlr2013-44 jmlr2013-44-reference knowledge-graph by maker-knowledge-mining

44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

Source: pdf

Author: Pekka Parviainen, Mikko Koivisto

Abstract: We consider the problem of ﬁnding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2n , to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible to networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: ﬁrst, the user may trade space against time; second, the proposed algorithms easily and efﬁciently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P , which can be signiﬁcantly less than 2n . Considering sufﬁciently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders. Keywords: exact algorithm, parallelization, partial order, space-time tradeoff, structure learning

reference text

R. Bellman. Dynamic programming treatment of the travelling salesman problem. Journal of the ACM, 9(1):61–63, 1962. A. Bj¨ rklund and T. Husfeldt. Exact algorithms for exact satisﬁability and number of perfect matcho ings. Algorithmica, 52:226–249, 2008. H. L. Bodlaender, F. V. Fomin, A. M. C. A. Koster, D. Kratsch, and D. M. Thilikos. On exact algorithms for treewidth. In Proceedings of the 14th Annual European Symposium on Algorithms (ESA), pages 672–683, 2006. H. L. Bodlaender, F. V. Fomin, A. M. C. A. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3): 420–432, 2012. G. Brightwell and P. Winkler. Counting linear extensions. Order, 8:225–242, 1991. D. M. Chickering. Learning from Data: Artiﬁcial Intelligence and Statistics V, chapter Learning Bayesian networks is NP-Complete, pages 121–130. Springer-Verlag, 1996. D. M. Chickering, D. Heckerman, and C. Meek. Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5:1287–1330, 2004. H. Chockler and J. Y. Halpern. Responsibility and blame: A structural-model approach. Journal of Artiﬁcial Intelligence Research, 22:93–115, 2004. G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992. 1413 PARVIAINEN AND KOIVISTO J. Cussens. Bayesian network learning with cutting planes. In Proceedings of the 27th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 153–160, 2011. B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2003. C. P. de Campos and Q. Ji. Efﬁcient structure learning of Bayesian networks using constraints. Journal of Machine Learning Research, 12:663–689, 2011. C. P. de Campos, Z. Zeng, and Q. Ji. Structure learning of Bayesian networks using constraints. In Proceedings of the 26th International Conference on Machine Learning (ICML), pages 113–120, 2009. L. M. de Campos. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research, 7:2149–2187, 2006. K. Etminani, M. Naghibzadeh, and A. R. Razavi. Globally optimal structure learning of Bayesian networks from data. In Proceedings of the 20th International Conference on Artiﬁcial Neural Networks (ICANN), pages 101–106, 2010. J. Flum and M. Grohe. Parametrized Complexity Theory. Springer, 2006. Y. Gurevich and S. Shelah. Expected computation time for Hamiltonian path problem. SIAM Journal of Computation, 16(3):486–502, 1987. D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995. M. Held and R. M. Karp. A dynamic programming approach to sequencing problem. Journal of the Society for Industrial and Applied Mathematics, 10(1):196–210, 1962. T. Jaakkola, D. Sontag, A. Globerson, and M. Meila. Learning Bayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on Artiﬁcial Intelligence and Statistics (AISTATS), JMLR: W&CP;, pages 358–365, 2010. M. Koivisto. Parent assignment is hard for the MDL, AIC, and NML costs. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), pages 289–303, 2006. M. Koivisto and P. Parviainen. A space–time tradeoff for permutation problems. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 484–492, 2010. M. Koivisto and K. Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549–573, 2004. B. Malone, C. Yuan, E. A. Hansen, and S. Bridges. Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-ﬁrst branch and bound search. In Proceedings of the 27th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 479–488, 2011. S. Ott and S. Miyano. Finding optimal gene networks using biological constraints. Genome Informatics, 14:124–133, 2003. 1414 O PTIMAL BAYESIAN N ETWORKS U SING P RECEDENCE C ONSTRAINTS P. Parviainen and M. Koivisto. Exact structure discovery in Bayesian networks with less space. In Proceedings of the 25th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 436– 443, 2009. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, 2000. S. Provan and M. O. Ball. On the complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal of Computing, 12:777–788, 1983. W. Savitch. Relationship between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4:177–192, 1970. T. Silander and P. Myllym¨ ki. A simple approach for ﬁnding the globally optimal Bayesian network a structure. In Proceedings of the 22th Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 445–452, 2006. A. P. Singh and A. W. Moore. Finding optimal Bayesian networks by dynamic programming. Technical Report CMU-CALD-05-106, Carnegie Mellon University, June 2005. P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9:62–72, 1991. G. Steiner. On the complexity of dynamic programming with precedence constraints. Annals of Operations Research, 26:103–123, 1990. Y. Tamada, S. Imoto, and S. Miyano. Parallel algorithm for learning optimal Bayesian network structure. Journal of Machine Learning Research, 12:2437–2459, 2011. T. S. Verma and J. Pearl. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artiﬁcial Intelligence (UAI), pages 255–270, 1990. C. Yuan, B. Malone, and X. Wu. Learning optimal Bayesian networks using A* search. In Proceedings of the 22nd International Joint Conference on Artiﬁcial Intelligence (IJCAI), pages 2186– 2191, 2011. 1415