
Bayesian Generalized Kernel Mixed Models (JMLR, 2011)



Author: Zhihua Zhang, Guang Dai, Michael I. Jordan

Abstract: We propose a fully Bayesian methodology for generalized kernel mixed models (GKMMs), which are extensions of generalized linear mixed models in the feature space induced by a reproducing kernel. We place a mixture of a point-mass distribution and Silverman's g-prior on the regression vector of a generalized kernel model (GKM). This mixture prior allows a fraction of the components of the regression vector to be exactly zero; it thus supports sparse modeling and is convenient for Bayesian computation. In particular, we exploit data augmentation to develop a Markov chain Monte Carlo (MCMC) algorithm in which the reversible jump method is used for model selection and Bayesian model averaging is used for posterior prediction. When the feature basis expansion in the reproducing kernel Hilbert space is treated as a stochastic process, this approach can be related to the Karhunen-Loève expansion of a Gaussian process (GP). Our sparse modeling framework therefore leads to a flexible approximation method for GPs.

Keywords: reproducing kernel Hilbert spaces, generalized kernel models, Silverman's g-prior, Bayesian model averaging, Gaussian processes
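
The prior structure the abstract describes can be made concrete with a small sketch. The Python snippet below is an illustration under stated assumptions, not the authors' implementation: it draws from a spike-and-slab mixture in which each coefficient is zero with probability 1 - pi (the point mass) and the active block follows a Gaussian slab with covariance proportional to the inverse of the active kernel sub-matrix, one common reading of Silverman's g-prior as the distribution induced by the roughness penalty beta' K beta. The RBF kernel, the hyperparameter values, and the helper names rbf_kernel and sample_sparse_gkm_prior are illustrative choices.

    # A minimal sketch (assumed form, not the paper's MCMC) of the
    # point-mass / Silverman g-prior mixture on a kernel regression vector.
    import numpy as np

    def rbf_kernel(X, lengthscale=1.0):
        """Gaussian (RBF) kernel matrix over the rows of X."""
        sq = np.sum(X**2, axis=1)
        d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
        return np.exp(-0.5 * d2 / lengthscale**2)

    def sample_sparse_gkm_prior(K, pi=0.2, g=10.0, sigma2=1.0,
                                jitter=1e-8, rng=None):
        """Draw (gamma, beta) from the spike-and-slab / g-prior mixture.

        gamma[j] = 1 marks an active component; beta is zero elsewhere,
        and the active block is N(0, g * sigma2 * inv(K[gamma, gamma])).
        """
        rng = np.random.default_rng() if rng is None else rng
        n = K.shape[0]
        gamma = rng.random(n) < pi                  # Bernoulli spike indicators
        beta = np.zeros(n)
        idx = np.flatnonzero(gamma)
        if idx.size > 0:
            K_gg = K[np.ix_(idx, idx)] + jitter * np.eye(idx.size)
            cov = g * sigma2 * np.linalg.inv(K_gg)  # Silverman-style slab
            beta[idx] = rng.multivariate_normal(np.zeros(idx.size), cov)
        return gamma, beta

    # Usage: a prior draw induces a sparse kernel expansion f = K @ beta.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 1))
    K = rbf_kernel(X)
    gamma, beta = sample_sparse_gkm_prior(K, rng=rng)
    f = K @ beta
    print(f"active components: {gamma.sum()} of {len(gamma)}")

A draw from this prior yields a function supported on only a fraction of the training points, which is the sense in which the framework gives a sparse, and hence cheaper, approximation to a full Gaussian process; the paper's reversible jump MCMC moves between such sparsity patterns rather than fixing one draw.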


References

J. H. Albert and S. Chib. Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422):669–679, 1993.

F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the Twenty-First International Conference on Machine Learning, 2004.

J. M. Bernardo and A. F. M. Smith. Bayesian Theory. John Wiley and Sons, New York, 1994.

S. P. Brooks. Quantitative convergence diagnosis for MCMC via CUSUMS. Statistics and Computing, 8(3):267–274, 1998.

C. M. Carvalho, N. G. Polson, and J. G. Scott. The horseshoe estimator for sparse signals. Biometrika, 97(2):465–480, 2010.

S. Chakraborty, M. Ghosh, and B. K. Mallick. Bayesian nonlinear regression for large p small n problems. Technical report, Department of Statistics, University of Florida, 2005.

P. J. Diggle, J. A. Tawn, and R. A. Moyeed. Model-based geostatistics (with discussion). Applied Statistics, 47(3):299–350, 1998.

M. Figueiredo. Adaptive sparseness for supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(9):1150–1159, 2003.

E. I. George and R. E. McCulloch. Approaches for Bayesian variable selection. Statistica Sinica, 7:339–374, 1997.

M. A. Girolami and S. Rogers. Variational Bayesian multinomial probit regression with Gaussian process priors. Neural Computation, 18:1790–1817, 2006.

P. J. Green. Discussion of Dr. Silverman's paper. Journal of the Royal Statistical Society, Series B, 47(1):29, 1985.

P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82:711–732, 1995.

A. K. Gupta and D. K. Nagar. Matrix Variate Distributions. Chapman & Hall/CRC, 2000.

C. Hans. Bayesian lasso regression. Biometrika, 96(4):835–845, 2009.

D. A. Harville. Maximum likelihood approaches to variance component estimation and to related problems. Journal of the American Statistical Association, 72(358):320–338, 1977.

T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, New York, 2001.

C. C. Holmes and L. Held. Bayesian auxiliary variable models for binary and multinomial regression. Bayesian Analysis, 1(1):145–168, 2006.

R. Kohn, M. Smith, and D. Chan. Nonparametric regression using linear combinations of basis functions. Statistics and Computing, 11:313–322, 2001.

N. D. Lawrence, M. Seeger, and R. Herbrich. Fast sparse Gaussian process methods: the informative vector machine. In Advances in Neural Information Processing Systems 15, 2003.

Y. Lee, Y. Lin, and G. Wahba. Multicategory support vector machines: theory and application to the classification of microarray data and satellite radiance data. Journal of the American Statistical Association, 99(465):67–81, 2004.

Q. Li and N. Lin. The Bayesian elastic net. Bayesian Analysis, 5(1):151–170, 2010.

F. Liang, R. Paulo, G. Molina, M. A. Clyde, and J. O. Berger. Mixtures of g-priors for Bayesian variable selection. Journal of the American Statistical Association, 103(481):410–423, 2008.

F. Liang, K. Mao, M. Liao, R. F. MacLehose, and D. B. Dunson. Nonparametric Bayesian kernel models. Discussion Paper 2005-09, ISDS, Duke University, 2009.

R. F. MacLehose and D. B. Dunson. Nonparametric Bayes kernel-based priors for functional data analysis. Statistica Sinica, 19:611–629, 2009.

B. K. Mallick, D. Ghosh, and M. Ghosh. Bayesian classification of tumours by using gene expression data. Journal of the Royal Statistical Society, Series B, 67:219–234, 2005.

K. V. Mardia, J. T. Kent, and J. M. Bibby. Multivariate Analysis. Academic Press, New York, 1979.

P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, New York, 1989.

T. P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence (UAI), pages 362–369, 2001.

R. M. Neal. Regression and classification using Gaussian process priors (with discussion). In J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics, volume 6, pages 475–501. Oxford University Press, 1999.

D. J. Nott and P. J. Green. Bayesian variable selection and the Swendsen-Wang algorithm. Journal of Computational and Graphical Statistics, 13(1):1–17, 2004.

T. Park and G. Casella. The Bayesian lasso. Journal of the American Statistical Association, 103(482):681–686, 2008.

N. S. Pillai, Q. Wu, F. Liang, S. Mukherjee, and R. L. Wolpert. Characterizing the function space for Bayesian kernel models. Journal of Machine Learning Research, 8:1769–1797, 2007.

J. Quiñonero-Candela and C. E. Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6:1939–1959, 2005.

A. E. Raftery, D. Madigan, and J. A. Hoeting. Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92:179–191, 1997.

C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.

G. Rätsch, T. Onoda, and K. Müller. Soft margins for AdaBoost. Machine Learning, 42:287–320, 2001.

N. Sha, M. Vannucci, M. G. Tadesse, P. J. Brown, I. Dragoni, N. Davies, T. C. Roberts, A. Contestabile, M. Salmon, C. Buckley, and F. Falciani. Bayesian variable selection in multinomial probit models to identify molecular signatures of disease stage. Biometrics, 60:812–819, 2004.

B. W. Silverman. Some aspects of the spline smoothing approach to non-parametric regression curve fitting (with discussion). Journal of the Royal Statistical Society, Series B, 47(1):1–52, 1985.

M. Smith and R. Kohn. Nonparametric regression using Bayesian variable selection. Journal of Econometrics, 75:317–344, 1996.

A. J. Smola and P. Bartlett. Sparse greedy Gaussian process regression. In Advances in Neural Information Processing Systems 13, 2001.

E. Snelson and Z. Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems 18, 2006.

E. L. Snelson. Flexible and Efficient Gaussian Process Models for Machine Learning. PhD thesis, University College London, 2007.

P. Sollich. Bayesian methods for support vector machines: evidence and predictive class probabilities. Machine Learning, 46:21–52, 2001.

R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B, 58:267–288, 1996.

M. E. Tipping. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1:211–244, 2001.

V. Vapnik. Statistical Learning Theory. John Wiley and Sons, New York, 1998.

G. Wahba. Spline Models for Observational Data. SIAM, Philadelphia, 1990.

M. West. Bayesian factor regression models in the "large p, small n" paradigm. In J. M. Bernardo, M. J. Bayarri, J. O. Berger, A. P. Dawid, D. Heckerman, A. F. M. Smith, and M. West, editors, Bayesian Statistics 7, pages 723–732. Oxford University Press, 2003.

C. K. I. Williams and D. Barber. Bayesian classification with Gaussian processes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(12):1342–1351, 1998.

C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems 13, 2001.

A. Zellner. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In P. K. Goel and A. Zellner, editors, Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages 233–243. North-Holland, Amsterdam, 1986.

Z. Zhang and M. I. Jordan. Bayesian multicategory support vector machines. In Proceedings of the Twenty-Second Conference on Uncertainty in Artificial Intelligence (UAI), 2006.

Z. Zhang, G. Wu, and E. Y. Chang. Semiparametric regression using Student-t processes. IEEE Transactions on Neural Networks, 18(6):1572–1588, 2007.

Z. Zhang, M. I. Jordan, and D.-Y. Yeung. Posterior consistency of the Silverman g-prior in Bayesian model choice. In Advances in Neural Information Processing Systems 22, 2008.

J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. Journal of Computational and Graphical Statistics, 14(1):185–205, 2005.