jmlr jmlr2011 jmlr2011-100 jmlr2011-100-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever.
Keywords: classification, large margin, maximum likelihood
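To make the abstract's claim concrete, the sketch below illustrates one way such an estimator can work, under the simplifying assumption that the margin theta^T x is approximately Gaussian within each class (the kind of high-dimensional behavior the consistency result rests on). It fits a two-component mixture to the unlabeled margin values with the mixing weights held fixed at the known marginal p(y), then integrates the logistic loss under the fitted model by Monte Carlo. The function names and EM details below are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import norm

    def fit_fixed_weight_gmm(z, p_pos, n_iter=200):
        # EM for a 1-D two-component Gaussian mixture whose mixing weights
        # are FIXED at the known label marginal (p_pos, 1 - p_pos); only the
        # component means and variances are estimated from unlabeled margins.
        mu = np.array([z.mean() + z.std(), z.mean() - z.std()])
        sigma = np.array([z.std(), z.std()])
        w = np.array([p_pos, 1.0 - p_pos])
        for _ in range(n_iter):
            # E-step: posterior responsibility of each component per margin
            dens = np.stack([w[k] * norm.pdf(z, mu[k], sigma[k])
                             for k in range(2)])
            resp = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)
            # M-step: update means/variances; weights stay fixed at p(y)
            for k in range(2):
                r = resp[k]
                mu[k] = (r * z).sum() / r.sum()
                sigma[k] = np.sqrt((r * (z - mu[k]) ** 2).sum() / r.sum())
        return mu, sigma

    def estimated_logistic_risk(theta, X_unlabeled, p_pos, n_mc=100000, seed=0):
        # Plug-in estimate of E[log(1 + exp(-Y * theta^T X))] computed from
        # the unlabeled margins theta^T X and the known marginal p(Y = +1).
        z = np.asarray(X_unlabeled @ theta, dtype=float)
        mu, sigma = fit_fixed_weight_gmm(z, p_pos)
        rng = np.random.default_rng(seed)
        risk = 0.0
        for y, k, w in [(+1, 0, p_pos), (-1, 1, 1.0 - p_pos)]:
            m = rng.normal(mu[k], sigma[k], n_mc)         # margins given Y = y
            risk += w * np.logaddexp(0.0, -y * m).mean()  # stable log(1+e^{-ym})
        return risk

    # Toy check: two overlapping clouds; labels are never shown to the estimator.
    rng = np.random.default_rng(1)
    d = 50
    theta = np.ones(d) / np.sqrt(d)
    X = np.vstack([rng.normal(+0.2, 1.0, (700, d)),
                   rng.normal(-0.2, 1.0, (300, d))])
    print(estimated_logistic_risk(theta, X, p_pos=0.7))

Fixing the mixture weights at the known p(y) both injects the only label information the method is allowed to use and, when p(y) differs from 1/2, resolves which fitted component corresponds to which class.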
J. Behboodian. Information matrix for a mixture of two normal distributions. Journal of Statistical Computation and Simulation, 1(4):295–314, 1972.
K. N. Berk. A central limit theorem for m-dependent random variables with unbounded m. The Annals of Probability, 1(2):352–354, 1973.
V. Castelli and T. M. Cover. On the exponential value of labeled samples. Pattern Recognition Letters, 16(1):105–111, 1995.
V. Castelli and T. M. Cover. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Transactions on Information Theory, 42(6):2102–2117, 1996.
W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Boosting for transfer learning. In Proc. of the International Conference on Machine Learning, 2007.
J. Davidson. Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press, 1994.
P. Donmez, G. Lebanon, and K. Balasubramanian. Unsupervised supervised learning I: Estimating classification and regression error rates without labels. Journal of Machine Learning Research, 11(April):1323–1351, 2010.
T. S. Ferguson. A Course in Large Sample Theory. Chapman & Hall, 1996.
A. Frank and A. Asuncion. UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. Available at http://archive.ics.uci.edu/ml/, 2010.
R. Gomes, A. Krause, and P. Perona. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems 23, 2010.
W. Hoeffding and H. Robbins. The central limit theorem for dependent random variables. Duke Mathematical Journal, 15:773–780, 1948.
D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.
T. V. Pham, M. Worring, and A. W. M. Smeulders. Face detection by aggregated Bayesian network classifiers. Pattern Recognition Letters, 23(4):451–461, 2002.
N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le. Estimating labels from label proportions. Journal of Machine Learning Research, 10:2349–2374, 2009.
J. Ratsaby and S. S. Venkatesh. Learning from a mixture of labeled and unlabeled examples with parametric side information. In Proc. of the Annual Conference on Computational Learning Theory, 1995.
Y. Rinott. On normal approximation rates for certain sums of dependent random variables. Journal of Computational and Applied Mathematics, 55(2):135–143, 1994.
H. Teicher. Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34(4):1265–1269, 1963.
A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.
X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Morgan & Claypool Publishers, 2009.