
58 nips-2005-Divergences, surrogate loss functions and experimental design



Author: Xuanlong Nguyen, Martin J. Wainwright, Michael I. Jordan

Abstract: In this paper, we provide a general theorem that establishes a correspondence between surrogate loss functions in classification and the family of f-divergences. Moreover, we provide constructive procedures for determining the f-divergence induced by a given surrogate loss, and conversely for finding all surrogate loss functions that realize a given f-divergence. Next we introduce the notion of universal equivalence among loss functions and corresponding f-divergences, and provide necessary and sufficient conditions for universal equivalence to hold. These ideas have applications to classification problems that also involve a component of experimental design; in particular, we leverage our results to prove consistency of a procedure for learning a classifier under decentralization requirements.
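As a minimal illustration of the f-divergence family the abstract refers to (in the sense of Ali and Silvey [1] and Csiszár [4]), the sketch below evaluates a discrete f-divergence D_f(P, Q) = Σ_x q(x) f(p(x)/q(x)) for two classical choices of f. The function and variable names (`f_divergence`, `kl`, `tv`) are my own, not from the paper.

```python
import math

def f_divergence(p, q, f):
    """Discrete f-divergence D_f(P, Q) = sum_x q(x) * f(p(x) / q(x)),
    for a convex f with f(1) = 0 and distributions p, q on a finite alphabet."""
    assert abs(sum(p) - 1.0) < 1e-9 and abs(sum(q) - 1.0) < 1e-9
    return sum(qx * f(px / qx) for px, qx in zip(p, q) if qx > 0)

# Two classical members of the family:
kl = lambda t: t * math.log(t)   # Kullback-Leibler divergence
tv = lambda t: 0.5 * abs(t - 1)  # variational (total variation) distance

p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
print(f_divergence(p, q, kl))  # equals sum_x p(x) log(p(x)/q(x))
print(f_divergence(p, q, tv))  # equals 0.5 * sum_x |p(x) - q(x)|
```

With f(t) = t log t this recovers the KL divergence, and with f(t) = |t − 1|/2 the variational distance; the paper's correspondence theorem associates divergences of this form with surrogate classification losses.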


reference text

[1] S. M. Ali and S. D. Silvey. A general class of coefficients of divergence of one distribution from another. J. Royal Stat. Soc. Series B, 28:131–142, 1966.

[2] P. Bartlett, M. I. Jordan, and J. D. McAuliffe. Convexity, classification and risk bounds. Journal of the American Statistical Association, 2005. To appear.

[3] D. Blackwell. Equivalent comparisons of experiments. Annals of Mathematical Statistics, 24(2):265–272, 1953.

[4] I. Csiszár. Information-type measures of difference of probability distributions and indirect observation. Studia Sci. Math. Hungar., 2:299–318, 1967.

[5] T. Kailath. The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. on Communication Technology, 15(1):52–60, 1967.

[6] X. Nguyen, M. J. Wainwright, and M. I. Jordan. Nonparametric decentralized detection using kernel methods. IEEE Transactions on Signal Processing, 53(11):4053–4066, 2005.

[7] X. Nguyen, M. J. Wainwright, and M. I. Jordan. On divergences, surrogate loss functions and decentralized detection. Technical Report 695, Department of Statistics, University of California at Berkeley, September 2005.

[8] H. V. Poor and J. B. Thomas. Applications of Ali-Silvey distance measures in the design of generalized quantizers for binary decision systems. IEEE Trans. on Communications, 25:893–900, 1977.

[9] R. T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, 1970.

[10] I. Steinwart. Consistency of support vector machines and other regularized kernel machines. IEEE Trans. Info. Theory, 51:128–142, 2005.

[11] F. Topsoe. Some inequalities for information divergence and related measures of discrimination. IEEE Transactions on Information Theory, 46:1602–1609, 2000.

[12] J. Tsitsiklis. Extremal properties of likelihood-ratio quantizers. IEEE Trans. on Communications, 41(4):550–558, 1993.

[13] T. Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. Annals of Statistics, 32:56–134, 2004.