
44 jmlr-2007-Large Margin Semi-supervised Learning


Source: pdf

Author: Junhui Wang, Xiaotong Shen

Abstract: In classification, semi-supervised learning occurs when a large amount of unlabeled data is available together with only a small amount of labeled data. In such a situation, the focus is on how to enhance the predictability of classification through the unlabeled data. In this article, we introduce a novel large margin semi-supervised learning methodology that uses grouping information from unlabeled data, together with the concept of margins, in the form of regularization controlling the interplay between labeled and unlabeled data. Based on this methodology, we develop two specific machines involving support vector machines and ψ-learning, denoted SSVM and SPSI, through difference convex programming. In addition, we estimate the generalization error using both labeled and unlabeled data, for tuning regularizers. Finally, our theoretical and numerical analyses indicate that the proposed methodology achieves the desired objective of delivering high generalization performance, particularly against some strong competitors. Keywords: generalization, grouping, sequential quadratic programming, support vectors
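The abstract's recipe can be illustrated with a minimal sketch: a hinge loss on labeled points, a ridge penalty, and a symmetric-hinge penalty that pushes unlabeled points away from the decision boundary, with the non-convex unlabeled term handled by a DC-style iteration that fixes the signs of the current unlabeled margins (yielding a convex surrogate at each outer step). Everything below — the linear model, the function names, the subgradient solver, and all parameter values — is an illustrative assumption, not the authors' SSVM or SPSI implementation.

```python
import numpy as np

def hinge(t):
    return np.maximum(0.0, 1.0 - t)

def ssvm_objective(w, Xl, yl, Xu, lam1, lam2):
    """Labeled hinge loss + ridge penalty + symmetric hinge on unlabeled margins."""
    return (hinge(yl * (Xl @ w)).mean()
            + lam1 * np.dot(w, w)
            + lam2 * hinge(np.abs(Xu @ w)).mean())

def fit_ssvm(Xl, yl, Xu, lam1=0.1, lam2=0.1, outer=10, inner=200, lr=0.05):
    """DC-style iteration: at each outer step, freeze the signs of the
    unlabeled margins (pseudo-labels), which majorizes the symmetric
    hinge by an ordinary hinge; solve the resulting convex surrogate
    approximately by subgradient descent."""
    w = np.zeros(Xl.shape[1])
    for _ in range(outer):
        su = np.sign(Xu @ w)
        su[su == 0] = 1.0                       # break ties arbitrarily
        for _ in range(inner):
            # subgradient of the labeled hinge term
            ml = (yl * (Xl @ w) < 1.0).astype(float)
            gl = -(yl * ml) @ Xl / len(yl)
            # subgradient of the surrogate unlabeled hinge term
            mu = (su * (Xu @ w) < 1.0).astype(float)
            gu = -(su * mu) @ Xu / len(su)
            w -= lr * (gl + 2.0 * lam1 * w + lam2 * gu)
    return w
```

On a toy two-cluster problem with one labeled point per class, the fitted linear rule separates the clusters while keeping unlabeled points outside the margin; this mirrors, in miniature, how the unlabeled regularizer steers the boundary through low-density regions.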


References

R. A. Adams. Sobolev Spaces. Academic Press, New York, 1975.

M. Amini and P. Gallinari. Semi-supervised learning with an explicit label-error model for misclassified data. In IJCAI 2003.

L. An and P. Tao. Solving a class of linearly constrained indefinite quadratic problems by D.C. algorithms. J. of Global Optimization, 11:253-285, 1997.

R. Ando and T. Zhang. A framework for learning predictive structures from multiple tasks and unlabeled data. Technical Report RC23462, IBM T.J. Watson Research Center, 2004.

M. Balcan, A. Blum, P. Choi, J. Lafferty, B. Pantano, M. Rwebangira and X. Zhu. Person identification in webcam images: an application of semi-supervised learning. In ICML 2005.

P. L. Bartlett, M. I. Jordan and J. D. McAuliffe. Convexity, classification, and risk bounds. J. Amer. Statist. Assoc., 101:138-156, 2006.

M. Belkin, P. Niyogi and V. Sindhwani. Manifold regularization: a geometric framework for learning from examples. Technical Report TR-2004-06, Univ. of Chicago, Department of Computer Science, 2004.

C. L. Blake and C. J. Merz. UCI repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. University of California, Irvine, Department of Information and Computer Science, 1998.

A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998.

M. Collins and Y. Singer. Unsupervised models for named entity classification. In Empirical Methods in Natural Language Processing and Very Large Corpora, pages 100-110, 1999.

C. Cortes and V. Vapnik. Support vector networks. Machine Learning, 20:273-297, 1995.

F. G. Cozman, I. Cohen and M. C. Cirelo. Semi-supervised learning of mixture models and Bayesian networks. In ICML 2003.

B. Efron. The estimation of prediction error: covariance penalties and cross-validation. J. Amer. Statist. Assoc., 99:619-632, 2004.

C. Gu. Multidimensional smoothing with splines. In M. G. Schimek, editor, Smoothing and Regression: Approaches, Computation and Application, 2000.

T. Hastie, S. Rosset, R. Tibshirani and J. Zhu. The entire regularization path for the support vector machine. J. of Machine Learning Research, 5:1391-1415, 2004.

T. Joachims. Transductive inference for text classification using support vector machines. In ICML 1999.

Y. Lin. Support vector machines and the Bayes rule in classification. Data Mining and Knowledge Discovery, 6:259-275, 2002.

Y. Lin and L. D. Brown. Statistical properties of the method of regularization with periodic Gaussian reproducing kernel. Ann. Statist., 32:1723-1743, 2004.

S. Liu, X. Shen and W. Wong. Computational development of ψ-learning. In SIAM 2005 International Data Mining Conference, pages 1-12, 2005.

Y. Liu and X. Shen. Multicategory ψ-learning. J. Amer. Statist. Assoc., 101:500-509, 2006.

L. Mason, J. Baxter, P. Bartlett and M. Frean. Boosting algorithms as gradient descent. In Advances in Neural Information Processing Systems 12, pages 512-518. The MIT Press, 2000.

K. Nigam, A. McCallum, S. Thrun and T. Mitchell. Text classification from labeled and unlabeled documents using EM. In AAAI 1998.

B. Schölkopf, A. Smola, R. Williamson and P. Bartlett. New support vector algorithms. Neural Computation, 12:1207-1245, 2000.

X. Shen and W. Wong. Convergence rate of sieve estimates. Ann. Statist., 22:580-615, 1994.

X. Shen. On the method of penalization. Statist. Sinica, 8:337-357, 1998.

X. Shen and H. C. Huang. Optimal model assessment, selection and combination. J. Amer. Statist. Assoc., 101:554-568, 2006.

X. Shen, G. C. Tseng, X. Zhang and W. Wong. On ψ-learning. J. Amer. Statist. Assoc., 98:724-734, 2003.

X. Shen and L. Wang. Discussion of 2004 IMS Medallion Lecture: "Local Rademacher complexities and oracle inequalities in risk minimization". Ann. Statist., in press.

I. Steinwart. On the influence of the kernel on the consistency of support vector machines. J. Machine Learning Research, 2:67-93, 2001.

M. Szummer and T. Jaakkola. Information regularization with partially labeled data. In NIPS 2003.

S. Van De Geer. Hellinger-consistency of certain nonparametric maximum likelihood estimators. Ann. Statist., 21:14-44, 1993.

V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.

G. Wahba. Spline Models for Observational Data. CBMS-NSF Series in Applied Mathematics, Vol. 59, SIAM, Philadelphia, 1990.

J. Wang and X. Shen. Estimation of generalization error: random and fixed inputs. Statist. Sinica, 16:569-588, 2006.

J. Wang, X. Shen and W. Pan. On transductive support vector machines. In Proc. of the Snowbird Machine Learning Conference, in press.

T. Zhang and F. Oles. A probability analysis on the value of unlabeled data for classification problems. In ICML 2000.

D. Zhou. The covering number in learning theory. J. of Complexity, 18:739-767, 2002.

J. Zhu and T. Hastie. Kernel logistic regression and the import vector machine. J. Comp. Graph. Statist., 14:185-205, 2005.

X. Zhu, Z. Ghahramani and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In ICML 2003.

X. Zhu and J. Lafferty. Harmonic mixtures: combining mixture models and graph-based methods for inductive and scalable semi-supervised learning. In ICML 2005.