nips2002-124 — reference knowledge graph (maker-knowledge-mining)
Source: pdf
Author: Francis R. Bach, Michael I. Jordan
Abstract: We present a class of algorithms for learning the structure of graphical models from data. The algorithms are based on a measure known as the kernel generalized variance (KGV), which essentially allows us to treat all variables on an equal footing as Gaussians in a feature space obtained from Mercer kernels. Thus we are able to learn hybrid graphs involving discrete and continuous variables of arbitrary type. We explore the computational properties of our approach, showing how to use the kernel trick to compute the relevant statistics in linear time. We illustrate our framework with experiments involving discrete and continuous data.
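To make the KGV idea concrete, here is a minimal Python sketch (not the paper's implementation) of a KGV-style mutual information between two one-dimensional variables, following the regularized kernel canonical correlation formulation of reference [6]: Gram matrices for an assumed Gaussian (RBF) kernel are centered and regularized, and the score is the negative half log-determinant of a block correlation matrix. The function names (rbf_gram, kgv_mutual_information) and parameter choices (sigma, kappa) are illustrative assumptions; this dense version costs O(N^3), whereas the kernel trick mentioned in the abstract relies on low-rank factorizations to obtain the statistics in linear time.

```python
import numpy as np

def rbf_gram(x, sigma=1.0):
    """Gram matrix for a Gaussian (RBF) kernel on 1-D data (assumed kernel choice)."""
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def center(K):
    """Center a Gram matrix in feature space: K -> H K H with H = I - (1/N) 11^T."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kgv_mutual_information(x, y, sigma=1.0, kappa=2e-2):
    """KGV-style mutual information between two samples (dense O(N^3) sketch).

    Follows the regularized formulation of [6]: with
    R_i = K_i (K_i + (N*kappa/2) I)^{-1}, the score is -1/2 log det of the
    block matrix with identity diagonal blocks and R_i R_j off-diagonal blocks.
    """
    n = x.shape[0]
    K1, K2 = center(rbf_gram(x, sigma)), center(rbf_gram(y, sigma))
    reg = n * kappa / 2.0
    # K_i is symmetric and commutes with (K_i + reg I)^{-1}, so solve() gives R_i.
    R1 = np.linalg.solve(K1 + reg * np.eye(n), K1)
    R2 = np.linalg.solve(K2 + reg * np.eye(n), K2)
    Kk = np.block([[np.eye(n), R1 @ R2],
                   [R2 @ R1,  np.eye(n)]])
    _, logdet = np.linalg.slogdet(Kk)
    return -0.5 * logdet

# Usage: a dependent pair scores well above an independent pair.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
print(kgv_mutual_information(x, x + 0.1 * rng.normal(size=200)))  # clearly positive
print(kgv_mutual_information(x, rng.normal(size=200)))            # near zero
```

Because the eigenvalues of each R_i lie in [0, 1), the block matrix is positive definite and the score is nonnegative, vanishing when the kernel canonical correlations between the two feature spaces are zero.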
[1] D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.
[2] W. Lam and F. Bacchus. Learning Bayesian belief networks: An approach based on the MDL principle. Computational Intelligence, 10(4):269–293, 1994.
[3] D. Geiger and D. Heckerman. Learning Gaussian networks. In Proc. UAI, 1994.
[4] J. Pearl. Causality: Models, Reasoning and Inference. Cambridge University Press, 2000.
[5] S. Della Pietra, V. J. Della Pietra, and J. D. Lafferty. Inducing features of random fields. IEEE Trans. PAMI, 19(4):380–393, 1997.
[6] F. R. Bach and M. I. Jordan. Kernel independent component analysis. Journal of Machine Learning Research, 3:1–48, 2002.
[7] S. L. Lauritzen. Graphical Models. Clarendon Press, 1996.
[8] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, 2001.
[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley & Sons, 1991.
[10] D. M. Chickering. Learning Bayesian networks is NP-complete. In Learning from Data: Artificial Intelligence and Statistics 5. Springer-Verlag, 1996.
[11] T. W. Anderson. An Introduction to Multivariate Statistical Analysis. Wiley & Sons, 1984.
[12] R. G. Cowell. Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models. In Proc. UAI, 2001.
[13] D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In Adv. NIPS 12, 2000.
[14] N. Friedman and M. Goldszmidt. Discretizing continuous attributes while learning Bayesian networks. In Proc. ICML, 1996.
[15] N. Friedman and M. Goldszmidt. Learning Bayesian networks with local structure. In Learning in Graphical Models. MIT Press, 1998.