18 jmlr-2007-Characterizing the Function Space for Bayesian Kernel Models

Source: pdf

Author: Natesh S. Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee, Robert L. Wolpert

Abstract: Kernel methods have been very popular in the machine learning literature in the last ten years, mainly in the context of Tikhonov regularization algorithms. In this paper we study a coherent Bayesian kernel model based on an integral operator defined as the convolution of a kernel with a signed measure. Priors on the random signed measures correspond to prior distributions on the functions mapped by the integral operator. We study several classes of signed measures and their image mapped by the integral operator. In particular, we identify a general class of measures whose image is dense in the reproducing kernel Hilbert space (RKHS) induced by the kernel. A consequence of this result is a function theoretic foundation for using non-parametric prior specifications in Bayesian modeling, such as Gaussian process and Dirichlet process prior distributions. We discuss the construction of priors on spaces of signed measures using Gaussian and Lévy processes, with Dirichlet processes being a special case of the latter. Computational issues involved with sampling from the posterior distribution are outlined for a univariate regression and a high dimensional classification problem.

Keywords: reproducing kernel Hilbert space, non-parametric Bayesian methods, Lévy processes, Dirichlet processes, integral operator, Gaussian processes

© 2007 Natesh S. Pillai, Qiang Wu, Feng Liang, Sayan Mukherjee and Robert L. Wolpert.
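The integral operator mentioned in the abstract maps a signed measure to a function by integrating a kernel against it. The sketch below is only an illustration under stated assumptions, not the authors' implementation: it assumes a discrete signed measure (a finite set of atoms with positive or negative weights), a one-dimensional input, and a Gaussian kernel. The names `gaussian_kernel`, `apply_integral_operator`, `atoms`, `weights`, and `bandwidth` are hypothetical. For a discrete measure the integral reduces to the finite sum f(x) = sum_j w_j K(x, u_j).

```python
import math


def gaussian_kernel(x, u, bandwidth=1.0):
    """Gaussian kernel K(x, u) = exp(-(x - u)^2 / (2 * bandwidth^2)).

    The kernel choice and the bandwidth parameter are assumptions
    made for this illustration.
    """
    return math.exp(-((x - u) ** 2) / (2.0 * bandwidth ** 2))


def apply_integral_operator(atoms, weights, x, kernel=gaussian_kernel):
    """Evaluate f(x) = sum_j w_j * K(x, u_j) for a discrete signed measure.

    atoms   -- locations u_j carrying mass
    weights -- signed masses w_j (may be negative)
    """
    if len(atoms) != len(weights):
        raise ValueError("atoms and weights must have the same length")
    return sum(w * kernel(x, u) for u, w in zip(atoms, weights))


# Hypothetical signed measure: mass +1 at u = 0 and mass -0.5 at u = 2.
atoms = [0.0, 2.0]
weights = [1.0, -0.5]

# Value of the induced function at x = 0.
f_at_zero = apply_integral_operator(atoms, weights, 0.0)
```

Under this reading, placing a prior on the atoms and weights (for example through a Lévy or Dirichlet process, as the abstract discusses) induces a prior on the function f.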

reference text

David Applebaum. Lévy Processes and Stochastic Calculus. Cambridge Studies in Advanced Mathematics. Cambridge Univ. Press, Cambridge, UK, 2004.
Nachman Aronszajn. Theory of reproducing kernels. T. Am. Math. Soc., 68:337–404, 1950.
Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56(1-3):209–239, 2004.
Mikhail Belkin, Partha Niyogi, and Vikas Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J. Mach. Learn. Res., 7:2399–2434, 2006.
David M. Blei and Michael I. Jordan. Variational inference for Dirichlet process mixtures. Bayesian Anal., 1(1):121–143 (electronic), 2006.
Olivier Bousquet and André Elisseeff. Stability and generalization. J. Mach. Learn. Res., 2:499–526, 2002.
Sounak Chakraborty, Malay Ghosh, and Bani K. Mallick. Bayesian non-linear regression for large p small n problems. J. Am. Stat. Assoc., 2005. Under revision.
Corinna Cortes and Vladimir N. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.
Felipe Cucker and Stephen Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39:1–49, 2001.
Carl de Boor and Robert E. Lynch. On splines and their minimum properties. J. Math. Mech., 15:953–969, 1966.
Ronald A. DeVore, Ralph Howard, and Charles A. Micchelli. Optimal nonlinear approximation. Manuscripta Mathematica, 1989.
Persi Diaconis. Bayesian numerical analysis. In Shanti S. Gupta and James O. Berger, editors, Statistical decision theory and related topics, IV, volume 1, pages 163–175. Springer-Verlag, New York, NY, 1988.
Michael D. Escobar and Mike West. Bayesian density estimation and inference using mixtures. J. Am. Stat. Assoc., 90:577–588, 1995.
Theodoros Evgeniou, Massimiliano Pontil, and Tomaso Poggio. Regularization networks and support vector machines. Advances in Computational Mathematics, 13:1–50, 2000.
Thomas S. Ferguson. Prior distributions on spaces of probability measures. Ann. Stat., 2:615–629, 1974.
Thomas S. Ferguson. A Bayesian analysis of some nonparametric problems. Ann. Stat., 1:209–230, 1973.
Erik Ivar Fredholm. Sur une nouvelle méthode pour la résolution du problème de Dirichlet. Œuvres complètes: publiées sous les auspices de la Kungliga svenska vetenskapsakademien par l'Institut Mittag-Leffler, pages 61–68, 1900.
Subhashis Ghosal and Anindya Roy. Posterior consistency of Gaussian process prior for nonparametric binary regression. Ann. Statist., 34(5):2413–2429, 2006.
Jacques Hadamard. Sur les problèmes aux dérivées partielles et leur signification physique. Princeton University Bulletin, pages 49–52, 1902.
Jaroslav Hájek. On linear statistical problems in stochastic processes. Czechoslovak Math. J., 12(87):404–444, 1962.
Jaroslav Hájek. On a property of normal distributions of any stochastic process. Select. Transl. Math. Statist. and Probability, 1:245–252, 1961.
Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer-Verlag, 2001.
Lancelot F. James, Antonio Lijoi, and Igor Prünster. Conjugacy as a distinctive feature of the Dirichlet process. Scand. J. Stat., 33:105–120, 2005.
Iain Johnstone. Function estimation in Gaussian noise: sequence models. Draft of a monograph, 1998.
Gopinath Kallianpur. The role of reproducing kernel Hilbert spaces in the study of Gaussian processes. Advances in Probability and Related Topics, 2:49–83, 1970.
Hermann König. Eigenvalue distribution of compact operators, volume 16 of Operator Theory: Advances and Applications. Birkhäuser, Basel, CH, 1986.
George S. Kimeldorf and Grace Wahba. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist., 41(2):495–502, 1971.
Feng Liang, Sayan Mukherjee, and Mike West. Understanding the use of unlabelled data in predictive modeling. Stat. Sci., 2006. To appear.
Feng Liang, Ming Liao, Kai Mao, Sayan Mukherjee, and Mike West. Non-parametric Bayesian kernel models. Discussion Paper 2007-10, Duke University ISDS, Durham, NC, 2007. URL www.stat.duke.edu/research/papers/.
Milan N. Lukić and Jay H. Beder. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. T. Am. Math. Soc., 353(10):3945–3969, 2001.
Stephen MacEachern and Peter Müller. Estimating mixture of Dirichlet process models. J. Comput. Graph. Stat., pages 223–238, 1998.
Vladimir G. Mazja. Sobolev Spaces. Springer-Verlag, New York, NY, 1985.
Peter Müller, Fernando Quintana, and Gary Rosner. A method for combining inference across related nonparametric Bayesian models. J. Am. Stat. Assoc., pages 735–749, 2004.
James Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philosophical Transactions of the Royal Society, London A, 209:415–446, 1909.
Charles A. Micchelli and Grace Wahba. Design problems for optimal surface interpolation. In Zvi Ziegler, editor, Approximation Theory and Applications, pages 329–348, 1981.
Sayan Mukherjee, Pablo Tamayo, Simon Rogers, Ryan M. Rifkin, Anna Engle, Colin Campbell, Todd R. Golub, and Jill P. Mesirov. Estimating dataset size requirements for classifying DNA Microarray data. Journal of Computational Biology, 10:119–143, 2003.
Radford M. Neal. Bayesian Learning for Neural Networks. Springer, New York, 1996. Lecture Notes in Statistics 118.
Emanuel Parzen. Probability density functionals and reproducing kernel Hilbert spaces. In Murray Rosenblatt, editor, Proceedings of the Symposium on Time Series Analysis, pages 155–169, New York, NY, 1963. John Wiley & Sons.
Tomaso Poggio and Federico Girosi. Regularization algorithms for learning that are equivalent to multilayer networks. Science, 247:978–982, 1990.
Tomaso Poggio, Ryan M. Rifkin, Sayan Mukherjee, and Partha Niyogi. General conditions for predictivity in learning theory. Nature, 428:419–422, 2004.
Sridhar Ramaswamy, Pablo Tamayo, Ryan M. Rifkin, Sayan Mukherjee, Chen-Hsiang Yeang, Michael Angelo, Christine Ladd, Michael Reich, Eva Latulippe, Jill P. Mesirov, Tomaso Poggio, William Gerald, Massimo Loda, Eric S. Lander, and Todd R. Golub. Multiclass cancer diagnosis using tumor gene expression signatures. Proc. Nat. Aca. Sci., 98:149–54, 2001.
Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
L. Chris G. Rogers and David Williams. Diffusions, Markov Processes, and Martingales, volume 2. John Wiley & Sons, New York, NY, 1987. ISBN 0-471-91482-7.
Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, 2001.
Isaac J. Schoenberg. Positive definite functions on spheres. Duke Mathematical Journal, 9:96–108, 1942.
John S. Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge Univ. Press, Cambridge, UK, 2004.
Peter Sollich. Bayesian methods for support vector machines: Evidence and predictive class probabilities. Machine Learning, 46(1-3):21–52, 2002.
Andrei Nikolaevich Tikhonov. Solution of incorrectly formulated problems and the regularization method. Soviet Doklady, 4:1035–1038, 1963.
Michael E. Tipping. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res., 1:211–244, 2001.
Chong Tu, Merlise A. Clyde, and Robert L. Wolpert. Lévy adaptive regression kernels. Discussion Paper 2006-08, Duke University ISDS, Durham, NC, 2006. URL http://www.stat.duke.edu/research/papers/.
Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, NY, 1998.
Grace Wahba. Spline Models for Observational Data, volume 59 of Series in Applied Mathematics. SIAM, Philadelphia, PA, 1990.
Grace Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV. In Bernhard Schölkopf, Alexander J. Smola, Christopher J. C. Burges, and Rosanna Soentpiet, editors, Advances in Kernel Methods: Support Vector Learning, pages 69–88. MIT Press, Cambridge, MA, 1999.
Larry Wasserman. All of Nonparametric Statistics. Springer-Verlag, 2005.
Mike West. Hyperparameter estimation in Dirichlet process mixture models. Discussion Paper 1992-03, Duke University ISDS, Durham, NC, 1992. URL http://www.stat.duke.edu/research/papers/.
Robert L. Wolpert and Katja Ickstadt. Reflecting uncertainty in inverse problems: A Bayesian solution using Lévy processes. Inverse Problems, 20(6):1759–1771, 2004.
Robert L. Wolpert, Katja Ickstadt, and Martin Bøgsted Hansen. A nonparametric Bayesian approach to inverse problems (with discussion). In José Miguel Bernardo, Maria Jesus Bayarri, James O. Berger, A. Phillip Dawid, David Heckerman, Adrian F. M. Smith, and Mike West, editors, Bayesian Statistics 7, pages 403–418, Oxford, UK, 2003. Oxford Univ. Press. ISBN 0-19-852615-6.
Eric P. Xing, Roded Sharan, and Michael I. Jordan. Bayesian haplotype inference via the Dirichlet process. In Carla E. Brodley, editor, Machine Learning, Proceedings of the 21st International Conference (ICML 2004), Banff, Canada, New York, NY, 2004. ACM Press. URL http://www.aicml.cs.ualberta.ca/_banff04/icml/pages/accepted.htm.
Eric P. Xing, Kyung-Ah Sohn, Michael I. Jordan, and Yee-Whye Teh. Bayesian multi-population haplotype inference via a hierarchical Dirichlet process mixture. In William Cohen and Andrew Moore, editors, Machine Learning, Proceedings of the 23rd International Conference (ICML 2006), Pittsburgh, PA, New York, NY, 2006. ACM Press. URL http://www.icml2006.org/icml2006/technical/accepted.html.
Ding-Xuan Zhou. Capacity of reproducing kernel spaces in learning theory. IEEE T. Inform. Theory, 49:1743–1752, 2003.