
Sparse Semi-supervised Learning Using Conjugate Functions (JMLR 2010, paper 103)


Source: pdf

Author: Shiliang Sun, John Shawe-Taylor

Abstract: In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of the unlabeled data and a few labeled examples to represent target functions, and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs, which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem, and the optimal solutions arguably have saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their potential for use in large-scale problems, and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity in characterizing the roles of unlabeled data in semi-supervised learning.

Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multi-view regularization, support vector machine, statistical learning theory
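
The conjugate rewriting mentioned in the abstract rests on the standard Fenchel-Legendre biconjugation identity. As a minimal illustration, assuming the squared ε-insensitive loss named above (the derivation and notation here are illustrative, not quoted from the paper): for a closed convex loss V, the conjugate is

    V^*(u) = \sup_t \, ( ut - V(t) ),

and biconjugation recovers the loss as

    V(t) = \sup_u \, ( ut - V^*(u) ).

For V(t) = \max(|t| - ε, 0)^2, taking u ≥ 0 and maximizing ut - (t - ε)^2 at t = ε + u/2 (the case u < 0 follows by symmetry) gives

    V^*(u) = ε|u| + u^2/4,

so V(t) = \sup_u ( ut - ε|u| - u^2/4 ): the loss becomes linear in t at the price of a supremum over a dual variable u, which is how the inf-sup problem described in the abstract arises.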


References

M. Balcan and A. Blum. A PAC-style model for learning from labeled and unlabeled data. In Proceedings of the 18th Annual Conference on Computational Learning Theory, pages 111–126, 2005.

M. Balcan, A. Blum, and K. Yang. Co-training and expansion: Towards bridging theory and practice. Advances in Neural Information Processing Systems, 17:89–96, 2005.

P. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 3:463–482, 2002.

M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399–2434, 2006.

A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the 11th Annual Conference on Computational Learning Theory, pages 92–100, 1998.

S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.

U. Brefeld, T. Gärtner, T. Scheffer, and S. Wrobel. Efficient co-regularized least squares regression. In Proceedings of the 23rd International Conference on Machine Learning, pages 137–144, 2006.

O. Chapelle, B. Schölkopf, and A. Zien. Semi-supervised Learning. MIT Press, Cambridge, MA, 2006.

J. Farquhar, D. Hardoon, H. Meng, J. Shawe-Taylor, and S. Szedmak. Two view learning: SVM-2K, theory and practice. Advances in Neural Information Processing Systems, 18:355–362, 2006.

G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, USA, 1996.

G. Kimeldorf and G. Wahba. Some results on Tchebycheffian spline functions. Journal of Mathematical Analysis and Applications, 33(1):82–95, 1971.

R. Latala and K. Oleszkiewicz. On the best constant in the Khintchine-Kahane inequality. Studia Mathematica, 109(1):101–104, 1994.

K. Nigam and R. Ghani. Analyzing the effectiveness and applicability of co-training. In Proceedings of the 9th International Conference on Information and Knowledge Management, pages 86–93, 2000.

M. F. Porter. An algorithm for suffix stripping. Program, 14:130–137, 1980.

R. Rifkin and R. Lippert. Value regularization and Fenchel duality. Journal of Machine Learning Research, 8:441–479, 2007.

D. Rosenberg and P. Bartlett. The Rademacher complexity of co-regularized kernel classes. Journal of Machine Learning Research Workshop and Conference Proceedings, 2:396–403, 2007.

G. Salton and C. Buckley. Term-weighting approaches in automatic text retrieval. Information Processing and Management, 24:513–523, 1988.

J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, England, 2004.

V. Sindhwani and D. Rosenberg. An RKHS for multi-view learning and manifold co-regularization. In Proceedings of the 25th International Conference on Machine Learning, pages 976–983, 2008.

V. Sindhwani, P. Niyogi, and M. Belkin. A co-regularization approach to semi-supervised learning with multiple views. In Proceedings of the Workshop on Learning with Multiple Views, 22nd ICML, 2005.

S. Sun. Semantic features for multi-view semi-supervised and active learning of text classification. In Proceedings of the IEEE International Conference on Data Mining Workshops, pages 731–735, 2008.

S. Szedmak and J. Shawe-Taylor. Synthesis of maximum margin and multiview learning using unlabeled data. Neurocomputing, 70(7–9):1254–1264, 2007.

I. Tsang and J. Kwok. Large-scale sparsified manifold regularization. Advances in Neural Information Processing Systems, 19:1401–1408, 2007.

X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Department of Computer Sciences, University of Wisconsin-Madison, 2008.