
Discriminative Densities from Maximum Contrast Estimation (NIPS 2002, paper 68)



Authors: Peter Meinicke, Thorsten Twellmann, Helge Ritter

Abstract: We propose a framework for classifier design based on discriminative densities, which represent the differences between the class-conditional distributions in a way that is optimal for classification. The densities are selected from a parametrized set by constrained maximization of an objective function that measures the average (bounded) difference, i.e. the contrast, between the discriminative densities. We show that maximizing the contrast is equivalent to minimizing an approximation of the Bayes risk; therefore, with suitable classes of probability density functions, the resulting maximum contrast classifiers (MCCs) can approximate the Bayes rule in the general multiclass case. In particular, for a certain parametrization of the density functions we obtain MCCs with the same functional form as the well-known Support Vector Machines (SVMs). We show that MCC training in general requires nonlinear optimization, but that under certain conditions the problem is concave and can be solved by a single linear program. We point out the close relation between SVM and MCC training, and in particular we show that Linear Programming Machines can be viewed as an approximate realization of MCCs. In experiments on benchmark data sets, the MCC shows competitive classification performance.
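The abstract's concavity claim has a concrete reading: if each discriminative density is linear in its parameters (e.g. a kernel mixture with weights on the probability simplex), then the clipped per-sample contrast min(B, g_y(x) - g_other(x)) is a minimum of affine functions and hence concave, so maximizing its average over a polytope can be written as a single linear program with one epigraph variable per sample. The sketch below illustrates this for two classes only; the kernel mixture parametrization, the clipping bound B, and all names (rbf, train_mcc, predict) are illustrative assumptions, not the authors' exact formulation.

import numpy as np
from scipy.optimize import linprog

# Hypothetical sketch of two-class maximum-contrast-style training (not the
# paper's exact method). Each class density is a kernel mixture over its own
# training points, g_c(x) = sum_i alpha_c[i] * k(x, x_i), with alpha_c on the
# probability simplex. We maximize the average contrast clipped at B, which
# is concave in (alpha_pos, alpha_neg) and solvable as one LP.

def rbf(X, Z, gamma=1.0):
    """Gaussian kernel matrix k(x, z) = exp(-gamma * ||x - z||^2)."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def train_mcc(X, y, gamma=1.0, B=0.1):
    """LP over v = [alpha_pos, alpha_neg, t]: maximize mean(t) subject to
    t_n <= y_n * (g_pos(x_n) - g_neg(x_n)) and t_n <= B (the clipping)."""
    pos, neg = X[y == 1], X[y == -1]
    n, npos, nneg = len(X), len(pos), len(neg)
    Kp, Kn = rbf(X, pos, gamma), rbf(X, neg, gamma)

    nv = npos + nneg + n
    c = np.zeros(nv)
    c[npos + nneg:] = -1.0 / n          # linprog minimizes, so use -mean(t)

    # Epigraph constraints: t_n - y_n * (Kp[n] @ a_pos - Kn[n] @ a_neg) <= 0
    A = np.zeros((n, nv))
    A[:, :npos] = -y[:, None] * Kp
    A[:, npos:npos + nneg] = y[:, None] * Kn
    A[:, npos + nneg:] = np.eye(n)
    b = np.zeros(n)

    # Simplex constraints: each weight vector is non-negative and sums to one.
    Aeq = np.zeros((2, nv))
    Aeq[0, :npos] = 1.0
    Aeq[1, npos:npos + nneg] = 1.0
    beq = np.ones(2)

    bounds = [(0, None)] * (npos + nneg) + [(None, B)] * n
    res = linprog(c, A_ub=A, b_ub=b, A_eq=Aeq, b_eq=beq, bounds=bounds)
    assert res.success, res.message
    return pos, neg, res.x[:npos], res.x[npos:npos + nneg]

def predict(Xtest, pos, neg, a_pos, a_neg, gamma=1.0):
    """Classify by the sign of the discriminative-density difference."""
    return np.sign(rbf(Xtest, pos, gamma) @ a_pos
                   - rbf(Xtest, neg, gamma) @ a_neg)

Given a NumPy array X of shape (n, d) and labels y in {-1, +1}, train_mcc returns the two mixture weight vectors and predict classifies by the sign of the density difference. At the optimum each t_n equals min(B, y_n * (g_pos(x_n) - g_neg(x_n))), so the objective is exactly the average clipped contrast; the bound B realizes the "bounded" part of the abstract's average bounded difference by capping each sample's contribution to the objective.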


References

[1] C. M. Bishop. Neural Networks for Pattern Recognition. Clarendon Press, Oxford, 1995.

[2] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273–297, 1995.

[3] R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.

[4] T. Graepel, R. Herbrich, B. Schölkopf, A. Smola, P. Bartlett, K.-R. Müller, K. Obermayer, and R. C. Williamson. Classification on proximity data with LP-machines. In Proc. of the Int. Conf. on Artificial Neural Networks, 1999.

[5] S. S. Keerthi, S. K. Shevade, C. Bhattacharyya, and K. R. K. Murthy. Improvements to Platt's SMO algorithm for SVM classifier design. Technical report, Department of CSA, Indian Institute of Science, Bangalore, India, 1999.

[6] P. Meinicke, T. Twellmann, and H. Ritter. Maximum contrast classifiers. In Proc. of the Int. Conf. on Artificial Neural Networks, Berlin, 2002. Springer. In press.

[7] J. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods: Support Vector Learning, pages 185–208, Cambridge, MA, 1999. MIT Press.

[8] G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Technical Report NC-TR-1998-021, Department of Computer Science, Royal Holloway, University of London, Egham, UK, August 1998. Submitted to Machine Learning.

[9] B. D. Ripley. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge, 1996.

[10] B. Schölkopf and A. J. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

[11] D. W. Scott. Multivariate Density Estimation. Wiley, New York, 1992.

[12] V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, 1995.