
Ensemble weighted kernel estimators for multivariate entropy estimation (NIPS 2012, paper 117)


Source: pdf

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimating entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow, of order O(T^(-γ/d)), where T is the number of samples and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T^(-1)). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffers from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates.
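The offline weight computation described in the abstract can be illustrated with a minimal sketch. The abstract does not state the optimization problem, so the form below is an assumption made for illustration: the weights are taken to be the minimum-norm vector that sums to one and cancels hypothetical bias terms proportional to l^(i/d) for i = 1, ..., d-1, where l ranges over the ensemble's bandwidth parameters. The names `ensemble_weights` and `combine` are hypothetical. The sketch does not enforce nonnegative weights, so the result is not guaranteed to be a convex combination.

```python
import numpy as np


def ensemble_weights(params, d):
    """Minimum-norm weights for an ensemble of estimators (illustrative).

    Assumed constraints (not taken from the abstract):
      sum_l w_l = 1
      sum_l w_l * params_l**(i/d) = 0   for i = 1, ..., d-1
    Requires at least d distinct positive parameters.
    """
    params = np.asarray(params, dtype=float)
    # Row 0 enforces that the weights sum to one; rows 1..d-1 cancel
    # the assumed bias terms of order params**(i/d).
    rows = [np.ones_like(params)]
    rows += [params ** (i / d) for i in range(1, d)]
    A = np.vstack(rows)
    b = np.zeros(A.shape[0])
    b[0] = 1.0
    # The pseudoinverse gives the minimum Euclidean norm solution of A w = b.
    # This is a convex problem that uses no samples from the density.
    return np.linalg.pinv(A) @ b


def combine(estimates, weights):
    """Weighted sum of the individual estimates."""
    return float(np.dot(weights, estimates))


if __name__ == "__main__":
    d = 3
    params = np.linspace(1.0, 3.0, 10)  # hypothetical bandwidth parameters
    w = ensemble_weights(params, d)
    print("sum of weights:", w.sum())
    print("residual bias terms:", [float(w @ params ** (i / d)) for i in range(1, d)])
```

Because the weights depend only on the chosen parameters and the dimension d, they can be computed once and reused for any data set of that dimension, which matches the abstract's remark that the optimization can be performed offline.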


reference text

[1] I. Ahmad and Pi-Erh Lin. A nonparametric estimation of the entropy for absolutely continuous distributions (corresp.). Information Theory, IEEE Trans. on, 22(3):372–375, May 1976.

[2] J. Beirlant, EJ Dudewicz, L. Györfi, and EC Van der Meulen. Nonparametric entropy estimation: An overview. Intl. Journal of Mathematical and Statistical Sciences, 6:17–40, 1997.

[3] L. Birge and P. Massart. Estimation of integral functions of a density. The Annals of Statistics, 23(1):11–29, 1995.

[4] D. Chauveau and P. Vandekerkhove. Selection of a MCMC simulation strategy via an entropy convergence criterion. ArXiv Mathematics e-prints, May 2006.

[5] J.A. Costa and A.O. Hero. Geodesic entropic graphs for dimension and entropy estimation in manifold learning. Signal Processing, IEEE Transactions on, 52(8):2210–2221, 2004.

[6] P. B. Eggermont and V. N. LaRiccia. Best asymptotic normality of the kernel density entropy estimator for smooth densities. Information Theory, IEEE Trans. on, 45(4):1321–1326, May 1999.

[7] E. Giné and D.M. Mason. Uniform in bandwidth estimation of integral functionals of the density function. Scandinavian Journal of Statistics, 35:739–761, 2008.

[8] M. Goria, N. Leonenko, V. Mergel, and P. L. Novi Inverardi. A new class of random vector entropy estimators and its applications in testing statistical hypotheses. Nonparametric Statistics, 2004.

[9] R. Gupta. Quantization Strategies for Low-Power Communications. PhD thesis, University of Michigan, Ann Arbor, 2001.

[10] L. Györfi and E. C. van der Meulen. Density-free convergence properties of various estimators of entropy. Comput. Statist. Data Anal., pages 425–436, 1987.

[11] L. Györfi and E. C. van der Meulen. An entropy estimate based on a kernel density estimation. Limit Theorems in Probability and Statistics, pages 229–240, 1989.

[12] P. Hall and S. C. Morton. On the estimation of the entropy. Ann. Inst. Statist. Math., 45:69–88, 1993.

[13] K. Hlaváčková-Schindler, M. Paluš, M. Vejmelka, and J. Bhattacharya. Causality detection based on information-theoretic approaches in time series analysis. Physics Reports, 441(1):1–46, 2007.

[14] A.T. Ihler, J.W. Fisher III, and A.S. Willsky. Nonparametric estimators for online signature authentication. In Acoustics, Speech, and Signal Processing, 2001. Proceedings.(ICASSP’01). 2001 IEEE International Conference on, volume 6, pages 3473–3476. IEEE, 2001.

[15] H. Joe. Estimation of entropy and other functionals of a multivariate density. Annals of the Institute of Statistical Mathematics, 41(4):683–697, 1989.

[16] G. Lanckriet, N. Cristianini, P. Bartlett, and L. El Ghaoui. Learning the kernel matrix with semi-definite programming. Journal of Machine Learning Research, 5:2004, 2002.

[17] B. Laurent. Efficient estimation of integral functionals of a density. The Annals of Statistics, 24(2):659–681, 1996.

[18] N. Leonenko, L. Pronzato, and V. Savani. A class of Rényi information estimators for multidimensional densities. Annals of Statistics, 36:2153–2182, 2008.

[19] E. Liitiäinen, A. Lendasse, and F. Corona. On the statistical estimation of Rényi entropies. In Proceedings of IEEE/MLSP 2009 International Workshop on Machine Learning for Signal Processing, Grenoble (France), September 2-4 2009.

[20] D. Pal, B. Poczos, and C. Szepesvari. Estimation of Rényi entropy and mutual information based on generalized nearest-neighbor graphs. In Proc. Advances in Neural Information Processing Systems (NIPS). MIT Press, 2010.

[21] Robert E. Schapire. The strength of weak learnability. Machine Learning, 5(2):197–227, June 1990.

[22] K. Sricharan and A. O. Hero, III. Ensemble estimators for multivariate entropy estimation. ArXiv e-prints, March 2012.

[23] C. Studholme, C. Drapaca, B. Iordanova, and V. Cardenas. Deformation-based mapping of volume change from serial brain mri in the presence of local tissue contrast change. Medical Imaging, IEEE Transactions on, 25(5):626–639, 2006.

[24] B. van Es. Estimating functionals related to a density by class of statistics based on spacing. Scandinavian Journal of Statistics, 1992.