jmlr jmlr2013 jmlr2013-44 jmlr2013-44-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Pekka Parviainen, Mikko Koivisto
Abstract: We consider the problem of finding a directed acyclic graph (DAG) that optimizes a decomposable Bayesian network score. While in a favorable case an optimal DAG can be found in polynomial time, in the worst case the fastest known algorithms rely on dynamic programming across the node subsets, taking time and space 2n , to within a factor polynomial in the number of nodes n. In practice, these algorithms are feasible to networks of at most around 30 nodes, mainly due to the large space requirement. Here, we generalize the dynamic programming approach to enhance its feasibility in three dimensions: first, the user may trade space against time; second, the proposed algorithms easily and efficiently parallelize onto thousands of processors; third, the algorithms can exploit any prior knowledge about the precedence relation on the nodes. Underlying all these results is the key observation that, given a partial order P on the nodes, an optimal DAG compatible with P can be found in time and space roughly proportional to the number of ideals of P , which can be significantly less than 2n . Considering sufficiently many carefully chosen partial orders guarantees that a globally optimal DAG will be found. Aside from the generic scheme, we present and analyze concrete tradeoff schemes based on parallel bucket orders. Keywords: exact algorithm, parallelization, partial order, space-time tradeoff, structure learning
R. Bellman. Dynamic programming treatment of the travelling salesman problem. Journal of the ACM, 9(1):61–63, 1962. A. Bj¨ rklund and T. Husfeldt. Exact algorithms for exact satisfiability and number of perfect matcho ings. Algorithmica, 52:226–249, 2008. H. L. Bodlaender, F. V. Fomin, A. M. C. A. Koster, D. Kratsch, and D. M. Thilikos. On exact algorithms for treewidth. In Proceedings of the 14th Annual European Symposium on Algorithms (ESA), pages 672–683, 2006. H. L. Bodlaender, F. V. Fomin, A. M. C. A. Koster, D. Kratsch, and D. M. Thilikos. A note on exact algorithms for vertex ordering problems on graphs. Theory of Computing Systems, 50(3): 420–432, 2012. G. Brightwell and P. Winkler. Counting linear extensions. Order, 8:225–242, 1991. D. M. Chickering. Learning from Data: Artificial Intelligence and Statistics V, chapter Learning Bayesian networks is NP-Complete, pages 121–130. Springer-Verlag, 1996. D. M. Chickering, D. Heckerman, and C. Meek. Large-sample learning of Bayesian networks is NP-hard. Journal of Machine Learning Research, 5:1287–1330, 2004. H. Chockler and J. Y. Halpern. Responsibility and blame: A structural-model approach. Journal of Artificial Intelligence Research, 22:93–115, 2004. G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992. 1413 PARVIAINEN AND KOIVISTO J. Cussens. Bayesian network learning with cutting planes. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 153–160, 2011. B. A. Davey and H. A. Priestley. Introduction to Lattices and Order. Cambridge University Press, 2003. C. P. de Campos and Q. Ji. Efficient structure learning of Bayesian networks using constraints. Journal of Machine Learning Research, 12:663–689, 2011. C. P. de Campos, Z. Zeng, and Q. Ji. Structure learning of Bayesian networks using constraints. In Proceedings of the 26th International Conference on Machine Learning (ICML), pages 113–120, 2009. L. M. de Campos. A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. Journal of Machine Learning Research, 7:2149–2187, 2006. K. Etminani, M. Naghibzadeh, and A. R. Razavi. Globally optimal structure learning of Bayesian networks from data. In Proceedings of the 20th International Conference on Artificial Neural Networks (ICANN), pages 101–106, 2010. J. Flum and M. Grohe. Parametrized Complexity Theory. Springer, 2006. Y. Gurevich and S. Shelah. Expected computation time for Hamiltonian path problem. SIAM Journal of Computation, 16(3):486–502, 1987. D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995. M. Held and R. M. Karp. A dynamic programming approach to sequencing problem. Journal of the Society for Industrial and Applied Mathematics, 10(1):196–210, 1962. T. Jaakkola, D. Sontag, A. Globerson, and M. Meila. Learning Bayesian network structure using LP relaxations. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS), JMLR: W&CP;, pages 358–365, 2010. M. Koivisto. Parent assignment is hard for the MDL, AIC, and NML costs. In Proceedings of the 19th Annual Conference on Learning Theory (COLT), pages 289–303, 2006. M. Koivisto and P. Parviainen. A space–time tradeoff for permutation problems. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 484–492, 2010. M. Koivisto and K. Sood. Exact Bayesian structure discovery in Bayesian networks. Journal of Machine Learning Research, 5:549–573, 2004. B. Malone, C. Yuan, E. A. Hansen, and S. Bridges. Improving the scalability of optimal Bayesian network learning with external-memory frontier breadth-first branch and bound search. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI), pages 479–488, 2011. S. Ott and S. Miyano. Finding optimal gene networks using biological constraints. Genome Informatics, 14:124–133, 2003. 1414 O PTIMAL BAYESIAN N ETWORKS U SING P RECEDENCE C ONSTRAINTS P. Parviainen and M. Koivisto. Exact structure discovery in Bayesian networks with less space. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence (UAI), pages 436– 443, 2009. J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, 1988. J. Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge, 2000. S. Provan and M. O. Ball. On the complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal of Computing, 12:777–788, 1983. W. Savitch. Relationship between nondeterministic and deterministic tape complexities. Journal of Computer and System Sciences, 4:177–192, 1970. T. Silander and P. Myllym¨ ki. A simple approach for finding the globally optimal Bayesian network a structure. In Proceedings of the 22th Conference on Uncertainty in Artificial Intelligence (UAI), pages 445–452, 2006. A. P. Singh and A. W. Moore. Finding optimal Bayesian networks by dynamic programming. Technical Report CMU-CALD-05-106, Carnegie Mellon University, June 2005. P. Spirtes and C. Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9:62–72, 1991. G. Steiner. On the complexity of dynamic programming with precedence constraints. Annals of Operations Research, 26:103–123, 1990. Y. Tamada, S. Imoto, and S. Miyano. Parallel algorithm for learning optimal Bayesian network structure. Journal of Machine Learning Research, 12:2437–2459, 2011. T. S. Verma and J. Pearl. Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence (UAI), pages 255–270, 1990. C. Yuan, B. Malone, and X. Wu. Learning optimal Bayesian networks using A* search. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI), pages 2186– 2191, 2011. 1415