jmlr2007-13: reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Santosh Srivastava, Maya R. Gupta, Béla A. Frigyik
Abstract: Quadratic discriminant analysis is a common tool for classification, but estimation of the Gaussian parameters can be ill-posed. This paper contains theoretical and algorithmic contributions to Bayesian estimation for quadratic discriminant analysis. A distribution-based Bayesian classifier is derived using information geometry. A calculus-of-variations approach is used to define a functional Bregman divergence for distributions, and it is shown that the Bayesian distribution-based classifier that minimizes the expected Bregman divergence of each class-conditional distribution also minimizes the expected misclassification cost. A series approximation is used to relate regularized discriminant analysis to Bayesian discriminant analysis. A new Bayesian quadratic discriminant analysis classifier is proposed in which the prior is defined using a coarse estimate of the covariance based on the training data; this classifier is termed BDA7. Results on benchmark data sets and simulations show that BDA7's performance is competitive with, and in some cases significantly better than, regularized quadratic discriminant analysis and the cross-validated Bayesian quadratic discriminant analysis classifier Quadratic Bayes.
Keywords: quadratic discriminant analysis, regularized quadratic discriminant analysis, Bregman divergence, data-dependent prior, eigenvalue decomposition, Wishart, functional analysis
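The abstract's central theoretical claim rests on a Bregman divergence defined for distributions rather than for vectors. As a hedged illustration only (the notation below is the standard calculus-of-variations form and is assumed here, not quoted from the paper), the functional Bregman divergence of a strictly convex functional \phi evaluated at distributions f and g can be written

    d_\phi(f, g) = \phi(f) - \phi(g) - \delta\phi(g;\, f - g),

where \delta\phi(g; \cdot) denotes the first variation (Fréchet differential) of \phi at g. The property the abstract invokes is the functional analogue of the familiar fact that conditional expectation minimizes expected vector Bregman divergence: the expected divergence to a candidate distribution is minimized by the mean distribution.

The abstract also says BDA7 builds its prior from a coarse, data-dependent estimate of the covariance. The sketch below illustrates that idea only in spirit and is not the BDA7 estimator: it shrinks each class's sample covariance toward a coarse seed derived from the pooled covariance trace. The function names, the shrinkage weight gamma, and the choice of seed are all assumptions made for this example.

```python
import numpy as np

def fit_shrunken_qda(X, y, gamma=0.5):
    """Fit per-class Gaussians, shrinking each sample covariance toward
    a coarse data-based seed: (trace(pooled covariance) / d) * I.
    Hypothetical sketch; NOT the paper's BDA7 estimator."""
    d = X.shape[1]
    pooled = np.cov(X, rowvar=False)               # pooled sample covariance
    seed = (np.trace(pooled) / d) * np.eye(d)      # coarse covariance seed
    params = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        S = np.cov(Xc, rowvar=False) if len(Xc) > 1 else np.zeros((d, d))
        cov = (1.0 - gamma) * S + gamma * seed     # convex shrinkage
        params[c] = (mu, cov, len(Xc) / len(X))    # mean, covariance, class prior
    return params

def predict(params, X):
    """Label each row of X by the largest Gaussian log-posterior
    (the quadratic discriminant score)."""
    labels = sorted(params)
    scores = []
    for c in labels:
        mu, cov, prior = params[c]
        _, logdet = np.linalg.slogdet(cov)
        diff = X - mu
        maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
        scores.append(np.log(prior) - 0.5 * (logdet + maha))
    return np.asarray(labels)[np.argmax(scores, axis=0)]

# Minimal usage on two synthetic Gaussian classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (20, 5)), rng.normal(2.0, 1.0, (20, 5))])
y = np.array([0] * 20 + [1] * 20)
print(predict(fit_shrunken_qda(X, y), X[:3]))     # expect mostly class 0
```

The gamma weight plays a role loosely analogous to the regularization parameters in regularized discriminant analysis: gamma = 0 recovers plain QDA on the sample covariances, while gamma = 1 forces every class toward the same coarse seed.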