jmlr jmlr2011 jmlr2011-100 jmlr2011-100-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Krishnakumar Balasubramanian, Pinar Donmez, Guy Lebanon
Abstract: Many popular linear classifiers, such as logistic regression, boosting, or SVM, are trained by optimizing a margin-based risk function. Traditionally, these risk functions are computed based on a labeled data set. We develop a novel technique for estimating such risks using only unlabeled data and the marginal label distribution. We prove that the proposed risk estimator is consistent on high-dimensional data sets and demonstrate it on synthetic and real-world data. In particular, we show how the estimate is used for evaluating classifiers in transfer learning, and for training classifiers with no labeled data whatsoever.
Keywords: classification, large margin, maximum likelihood
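To make the abstract's claim concrete, the sketch below illustrates one way such an estimator can work, under the simplifying assumption that the margin theta^T x is approximately Gaussian within each class (the kind of high-dimensional behavior the consistency result rests on). It fits a two-component mixture to the unlabeled margin values with the mixing weights held fixed at the known marginal p(y), then integrates the logistic loss under the fitted model by Monte Carlo. The function names and EM details below are illustrative assumptions, not the authors' implementation.

    import numpy as np
    from scipy.stats import norm

    def fit_fixed_weight_gmm(z, p_pos, n_iter=200):
        # EM for a 1-D two-component Gaussian mixture whose mixing weights
        # are FIXED at the known label marginal (p_pos, 1 - p_pos); only the
        # component means and variances are estimated from unlabeled margins.
        mu = np.array([z.mean() + z.std(), z.mean() - z.std()])
        sigma = np.array([z.std(), z.std()])
        w = np.array([p_pos, 1.0 - p_pos])
        for _ in range(n_iter):
            # E-step: posterior responsibility of each component per margin
            dens = np.stack([w[k] * norm.pdf(z, mu[k], sigma[k])
                             for k in range(2)])
            resp = dens / np.maximum(dens.sum(axis=0, keepdims=True), 1e-300)
            # M-step: update means/variances; weights stay fixed at p(y)
            for k in range(2):
                r = resp[k]
                mu[k] = (r * z).sum() / r.sum()
                sigma[k] = np.sqrt((r * (z - mu[k]) ** 2).sum() / r.sum())
        return mu, sigma

    def estimated_logistic_risk(theta, X_unlabeled, p_pos, n_mc=100000, seed=0):
        # Plug-in estimate of E[log(1 + exp(-Y * theta^T X))] computed from
        # the unlabeled margins theta^T X and the known marginal p(Y = +1).
        z = np.asarray(X_unlabeled @ theta, dtype=float)
        mu, sigma = fit_fixed_weight_gmm(z, p_pos)
        rng = np.random.default_rng(seed)
        risk = 0.0
        for y, k, w in [(+1, 0, p_pos), (-1, 1, 1.0 - p_pos)]:
            m = rng.normal(mu[k], sigma[k], n_mc)         # margins given Y = y
            risk += w * np.logaddexp(0.0, -y * m).mean()  # stable log(1+e^{-ym})
        return risk

    # Toy check: two overlapping clouds; labels are never shown to the estimator.
    rng = np.random.default_rng(1)
    d = 50
    theta = np.ones(d) / np.sqrt(d)
    X = np.vstack([rng.normal(+0.2, 1.0, (700, d)),
                   rng.normal(-0.2, 1.0, (300, d))])
    print(estimated_logistic_risk(theta, X, p_pos=0.7))

Fixing the mixture weights at the known p(y) both injects the only label information the method is allowed to use and, when p(y) differs from 1/2, resolves which fitted component corresponds to which class.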
J. Behboodian. Information matrix for a mixture of two normal distributions. Journal of Statistical Computation and Simulation, 1(4):295–314, 1972.
K. N. Berk. A central limit theorem for m-dependent random variables with unbounded m. The Annals of Probability, 1(2):352–354, 1973.
V. Castelli and T. M. Cover. On the exponential value of labeled samples. Pattern Recognition Letters, 16(1):105–111, 1995.
V. Castelli and T. M. Cover. The relative value of labeled and unlabeled samples in pattern recognition with an unknown mixing parameter. IEEE Transactions on Information Theory, 42(6):2102–2117, 1996.
W. Dai, Q. Yang, G.-R. Xue, and Y. Yu. Boosting for transfer learning. In Proc. of the International Conference on Machine Learning, 2007.
J. Davidson. Stochastic Limit Theory: An Introduction for Econometricians. Oxford University Press, 1994.
P. Donmez, G. Lebanon, and K. Balasubramanian. Unsupervised supervised learning I: Estimating classification and regression error rates without labels. Journal of Machine Learning Research, 11(April):1323–1351, 2010.
T. S. Ferguson. A Course in Large Sample Theory. Chapman & Hall, 1996.
A. Frank and A. Asuncion. UCI machine learning repository. University of California, School of Information and Computer Science, Irvine, CA. Available at http://archive.ics.uci.edu/ml/, 2010.
R. Gomes, A. Krause, and P. Perona. Discriminative clustering by regularized information maximization. In Advances in Neural Information Processing Systems 23, 2010.
W. Hoeffding and H. Robbins. The central limit theorem for dependent random variables. Duke Mathematical Journal, 15:773–780, 1948.
D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004.
T. V. Pham, M. Worring, and A. W. M. Smeulders. Face detection by aggregated Bayesian network classifiers. Pattern Recognition Letters, 23(4):451–461, 2002.
N. Quadrianto, A. J. Smola, T. S. Caetano, and Q. V. Le. Estimating labels from label proportions. Journal of Machine Learning Research, 10:2349–2374, 2009.
J. Ratsaby and S. S. Venkatesh. Learning from a mixture of labeled and unlabeled examples with parametric side information. In Proc. of the Annual Conference on Computational Learning Theory, 1995.
Y. Rinott. On normal approximation rates for certain sums of dependent random variables. Journal of Computational and Applied Mathematics, 55(2):135–143, 1994.
H. Teicher. Identifiability of finite mixtures. The Annals of Mathematical Statistics, 34(4):1265–1269, 1963.
A. Tewari and P. L. Bartlett. On the consistency of multiclass classification methods. Journal of Machine Learning Research, 8:1007–1025, 2007.
X. Zhu and A. B. Goldberg. Introduction to Semi-Supervised Learning. Morgan & Claypool Publishers, 2009.