nips nips2005 nips2005-58 nips2005-58-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan
Abstract: In this paper, we provide a general theorem that establishes a correspondence between surrogate loss functions in classification and the family of f-divergences. Moreover, we provide constructive procedures for determining the f-divergence induced by a given surrogate loss, and conversely for finding all surrogate loss functions that realize a given f-divergence. Next we introduce the notion of universal equivalence among loss functions and corresponding f-divergences, and provide necessary and sufficient conditions for universal equivalence to hold. These ideas have applications to classification problems that also involve a component of experiment design; in particular, we leverage our results to prove consistency of a procedure for learning a classifier under decentralization requirements.
[1] S. M. Ali and S. D. Silvey. A general class of coefficients of divergence of one distribution from another. J. Royal Stat. Soc. Series B, 28:131–142, 1966.
[2] P. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 2005. To appear.
[3] D. Blackwell. Equivalent comparisons of experiments. Annals of Mathematical Statistics, 24(2):265–272, 1953.
[4] I. Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Sci. Math. Hungar., 2:299–318, 1967.
[5] T. Kailath. The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. on Communication Technology, 15(1):52–60, 1967.
[6] X. Nguyen, M. J. Wainwright, and M. I. Jordan. Nonparametric decentralized detection using kernel methods. IEEE Transactions on Signal Processing, 53(11):4053–4066, 2005.
[7] X. Nguyen, M. J. Wainwright, and M. I. Jordan. On divergences, surrogate loss functions and decentralized detection. Technical Report 695, Department of Statistics, University of California at Berkeley, September 2005.
[8] H. V. Poor and J. B. Thomas. Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems. IEEE Trans. on Communications, 25:893–900, 1977.
[9] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.
[10] I. Steinwart. Consistency of support vector machines and other regularized kernel machines. IEEE Trans. Info. Theory, 51:128–142, 2005.
[11] F. Topsoe. Some inequalities for information divergence and related measures of discrimination. IEEE Transactions on Information Theory, 46:1602–1609, 2000.
[12] J. Tsitsiklis. Extremal properties of likelihood-ratio quantizers. IEEE Trans. on Communications, 41(4):550–558, 1993.
[13] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–134, 2004.