NIPS 2004, Paper 204
Source: pdf
Author: Liam Paninski
Abstract: We develop a family of upper and lower bounds on the worst-case expected KL loss for estimating a discrete distribution on a finite number m of points, given N i.i.d. samples. Our upper bounds are approximation-theoretic, similar to recent bounds for estimating discrete entropy; the lower bounds are Bayesian, based on averages of the KL loss under Dirichlet distributions. The upper bounds are convex in their parameters and thus can be minimized by descent methods to provide estimators with low worst-case error; the lower bounds are indexed by a one-dimensional parameter and are thus easily maximized. Asymptotic analysis of the bounds demonstrates the uniform KL-consistency of a wide class of estimators as c = N/m → ∞ (no matter how slowly), and shows that no estimator is consistent for c bounded (in contrast to entropy estimation). Moreover, the bounds are asymptotically tight as c → 0 or ∞, and are shown numerically to be tight within a factor of two for all c. Finally, in the sparse-data limit c → 0, we find that the Dirichlet-Bayes (add-constant) estimator with parameter scaling like −c log(c) optimizes both the upper and lower bounds, suggesting an optimal choice of the “add-constant” parameter in this regime.
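The Dirichlet-Bayes (add-constant) estimator discussed in the abstract has a simple closed form: each bin's probability is estimated as (nᵢ + a)/(N + m·a). The sketch below illustrates it, with the sparse-regime parameter choice a ≈ −c log(c) suggested by the abstract; the function names and the example sample sizes are illustrative assumptions, not from the paper.

```python
import numpy as np

def add_constant_estimator(counts, a):
    """Dirichlet-Bayes ("add-constant") estimate of a discrete
    distribution: p_i = (n_i + a) / (N + m * a)."""
    counts = np.asarray(counts, dtype=float)
    N, m = counts.sum(), counts.size
    return (counts + a) / (N + m * a)

def sparse_limit_a(N, m):
    """Parameter scaling suggested for the sparse-data regime
    c = N/m -> 0: a ~ -c * log(c). (Positive only for c < 1,
    which is exactly the sparse regime where it applies.)"""
    c = N / m
    return -c * np.log(c)

# Illustrative example: m = 1000 bins, N = 50 samples (c = 0.05, sparse).
rng = np.random.default_rng(0)
samples = rng.integers(0, 1000, size=50)
counts = np.bincount(samples, minlength=1000)
p_hat = add_constant_estimator(counts, sparse_limit_a(50, 1000))
# p_hat is a valid distribution: nonnegative and summing to 1.
```

Because a > 0 in this regime, every bin, including unobserved ones, receives positive mass, which is what keeps the expected KL loss finite.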
References:
1. D. MacKay, L. Peto, Natural Language Engineering 1, 289 (1995).
2. N. Friedman, Y. Singer, NIPS (1998).
3. A. Orlitsky, N. Santhanam, J. Zhang, Science 302, 427 (2003).
4. T. Cover, IEEE Transactions on Information Theory 18, 216 (1972).
5. R. Krichevsky, V. Trofimov, IEEE Transactions on Information Theory 27, 199 (1981).
6. D. Braess, H. Dette, Sankhya 66, 707 (2004).
7. R. Krichevsky, IEEE Transactions on Information Theory 44, 296 (1998).
8. D. Braess, T. Sauer, Journal of Approximation Theory 128, 187 (2004).
9. T. Schürmann, P. Grassberger, Chaos 6, 414 (1996).
10. I. Nemenman, F. Shafee, W. Bialek, NIPS 14 (2002).
11. L. Paninski, Neural Computation 15, 1191 (2003).
12. L. Paninski, IEEE Transactions on Information Theory 50, 2200 (2004).
13. G. Watson, Approximation Theory and Numerical Methods (Wiley, Boston, 1980).
14. D. Braess, J. Forster, T. Sauer, H. Simon, Algorithmic Learning Theory 13, 380 (2002).