jmlr jmlr2013 jmlr2013-99 jmlr2013-99-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jun Wang, Tony Jebara, Shih-Fu Chang
Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally, an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied to both artificial and standard benchmark data sets, where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes.
Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy Max-Cut
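To make the abstract's description of the greedy gradient Max-Cut step concrete, below is a minimal illustrative sketch in Python/NumPy. It is not the paper's exact GGMC algorithm: the function name greedy_maxcut_ssl, the parameter mu, the propagation matrix P, and the coupling matrix G are simplifying assumptions loosely modeled on the bivariate quadratic-cost formulation, and the paper's class normalization and refinement details are omitted.

import numpy as np

def greedy_maxcut_ssl(W, labeled_idx, labels, num_classes, mu=0.01):
    # W: (n, n) symmetric nonnegative affinity matrix of the graph.
    # labeled_idx, labels: indices of the labeled vertices and their class ids.
    n = W.shape[0]
    L = np.diag(W.sum(axis=1)) - W                  # unnormalized graph Laplacian
    # Assumed smoothing/propagation matrix; mu trades label fit against smoothness.
    P = np.linalg.inv(np.eye(n) + L / mu)
    # Assumed coupling matrix: its entries play the role of edge weights in the
    # equivalent cut problem on a transformed graph.
    G = P.T @ L @ P + mu * (P - np.eye(n)).T @ (P - np.eye(n))

    Y = np.zeros((n, num_classes))                  # binary label matrix
    Y[np.asarray(labeled_idx), np.asarray(labels)] = 1.0
    unassigned = sorted(set(range(n)) - set(labeled_idx))

    while unassigned:
        grad = G @ Y                                # connectivity of every vertex to each class
        rows = np.array(unassigned)
        sub = grad[rows]                            # restrict to still-unassigned vertices
        i_local, c = np.unravel_index(np.argmin(sub), sub.shape)
        vertex = int(rows[i_local])
        Y[vertex, c] = 1.0                          # greedily commit the cheapest (vertex, class) pair
        unassigned.remove(vertex)

    return Y.argmax(axis=1)                         # predicted class for every vertex

# Toy usage (hypothetical data): two triangle clusters joined by one weak edge,
# with a single labeled vertex in each cluster.
W = np.array([[0, 1, 1, 0,   0, 0],
              [1, 0, 1, 0,   0, 0],
              [1, 1, 0, 0.1, 0, 0],
              [0, 0, 0.1, 0, 1, 1],
              [0, 0, 0,   1, 0, 1],
              [0, 0, 0,   1, 1, 0]], dtype=float)
print(greedy_maxcut_ssl(W, labeled_idx=[0, 5], labels=[0, 1], num_classes=2))
# Should group vertices 0-2 into class 0 and vertices 3-5 into class 1.

The point the sketch tries to convey is the greedy schedule: at every step the single unlabeled (vertex, class) pair with the smallest current connectivity under the transformed graph is committed, so the unlabeled set shrinks one vertex at a time until all vertices are assigned.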
References:
A. Azran. The rendezvous algorithm: Multiclass semi-supervised learning with Markov random walks. In Proceedings of the International Conference on Machine Learning, pages 49-56, Corvallis, Oregon, 2007. ACM.
M.-F. Balcan, A. Blum, and K. Yang. Co-training and expansion: Towards bridging theory and practice. In Advances in Neural Information Processing Systems, volume 17, pages 89-96, 2005.
F. Barahona, M. Grötschel, M. Jünger, and G. Reinelt. An application of combinatorial optimization to statistical physics and circuit layout design. Operations Research, 36(3):493-513, 1988.
M. Belkin, P. Niyogi, and V. Sindhwani. On manifold regularization. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Barbados, January 2005.
M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399-2434, 2006.
T. D. Bie and N. Cristianini. Convex methods for transduction. In Advances in Neural Information Processing Systems, volume 16, 2004.
A. Blum and S. Chawla. Learning from labeled and unlabeled data using graph mincuts. In Proceedings of the International Conference on Machine Learning, pages 19-26, San Francisco, CA, USA, 2001.
A. Blum and T. Mitchell. Combining labeled and unlabeled data with co-training. In Proceedings of the Eleventh Annual Conference on Computational Learning Theory, pages 92-100, Madison, Wisconsin, United States, 1998. ACM.
A. Blum, J. Lafferty, M. R. Rwebangira, and R. Reddy. Semi-supervised learning using randomized mincuts. In Proceedings of the Twenty-first International Conference on Machine Learning, pages 13-20, Banff, Alberta, Canada, 2004.
M. Carreira-Perpiñán and R. S. Zemel. Proximity graphs for clustering and manifold learning. In Advances in Neural Information Processing Systems, volume 17, pages 225-232, 2005.
O. Chapelle and A. Zien. Semi-supervised classification by low density separation. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Barbados, January 2005.
O. Chapelle, B. Schölkopf, and A. Zien, editors. Semi-Supervised Learning. MIT Press, Cambridge, MA, 2006. URL http://www.kyb.tuebingen.mpg.de/ssl-book.
O. Chapelle, V. Sindhwani, and S. S. Keerthi. Branch and bound for semi-supervised support vector machines. In Advances in Neural Information Processing Systems, volume 19, pages 217-224, Cambridge, MA, 2007.
O. Chapelle, V. Sindhwani, and S. S. Keerthi. Optimization techniques for semi-supervised support vector machines. Journal of Machine Learning Research, 9:203-233, 2008.
N. V. Chawla and G. Karakoulas. Learning from labeled and unlabeled data: An empirical study across techniques and domains. Journal of Artificial Intelligence Research, 23(1):331-366, 2005.
F. R. K. Chung and N. Biggs. Spectral Graph Theory. American Mathematical Society, Providence, RI, 1997.
S. A. Cook. The complexity of theorem-proving procedures. In Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151-158, Shaker Heights, Ohio, United States, 1971. ACM.
M. M. Deza and M. Laurent. Geometry of Cuts and Metrics. Springer Verlag, 2009.
J. Edmonds and E. Johnson. Matching: A well-solved class of integer linear programs. In Combinatorial Optimization -- Eureka, You Shrink!, pages 27-30, 2003.
A. Frank and A. Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
M. X. Goemans and D. P. Williamson. .879-approximation algorithms for MAX CUT and MAX 2SAT. In Proceedings of the Twenty-sixth Annual ACM Symposium on Theory of Computing, pages 422-431, Montreal, Quebec, Canada, 1994.
M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. Journal of the ACM, 42(6):1115-1145, 1995.
A. B. Goldberg, X. Zhu, A. Furger, and J. M. Xu. OASIS: Online active semi-supervised learning. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 2011.
S. Goldman and Y. Zhou. Enhancing supervised learning with unlabeled data. In Proceedings of the 17th International Conference on Machine Learning, pages 327-334, 2000.
B. Huang and T. Jebara. Loopy belief propagation for bipartite maximum weight b-matching. In International Workshop on Artificial Intelligence and Statistics, 2007.
B. Huang and T. Jebara. Collaborative filtering via rating concentration. In Y. W. Teh and M. Titterington, editors, Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, volume 9 of JMLR: W&CP, pages 334-341, May 2010.
T. Jebara, J. Wang, and S.-F. Chang. Graph construction and b-matching for semi-supervised learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 441-448, 2009.
T. Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the International Conference on Machine Learning, pages 200-209, 1999.
T. Joachims. Transductive learning via spectral graph partitioning. In Proceedings of the International Conference on Machine Learning, pages 290-297, 2003.
R. M. Karp. Reducibility among combinatorial problems. Complexity of Computer Computations, 43:85-103, 1972.
M.-A. Krogel and T. Scheffer. Multi-relational learning, text mining, and semi-supervised learning for functional genomics. Machine Learning, 57(1):61-81, 2004.
B. Kveton, M. Valko, A. Rahimi, and L. Huang. Semi-supervised learning with max-margin graph cuts. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, pages 421-428, 2010.
W. Liu, J. Wang, and S.-F. Chang. Robust and scalable graph-based semisupervised learning. Proceedings of the IEEE, 100(9):2624-2638, 2012.
M. Maier, U. von Luxburg, and M. Hein. Influence of graph construction on graph-based clustering measures. In Advances in Neural Information Processing Systems, volume 22, pages 1025-1032, 2009.
G. S. Mann and A. McCallum. Simple, robust, scalable semi-supervised learning via expectation regularization. In Proceedings of the International Conference on Machine Learning, pages 593-600, Corvallis, Oregon, 2007.
C. Mathieu and W. Schudy. Yet another algorithm for dense max cut: Go greedy. In Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 176-182, 2008.
S. Melacci and M. Belkin. Laplacian support vector machines trained in the primal. Journal of Machine Learning Research, 12:1149-1184, 2011.
H. Mobahi, R. Collobert, and J. Weston. Deep learning from temporal coherence in video. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 737-744, 2009.
S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-20). Dept. of Computer Science, Columbia University, New York, 1996. URL http://www.cs.columbia.edu/CAVE/coil-20.html.
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888-905, 2000.
V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: From transductive to semi-supervised learning. In Proceedings of the 22nd International Conference on Machine Learning, pages 824-831, Bonn, Germany, 2005.
V. Sindhwani, J. Hu, and A. Mojsilovic. Regularized co-clustering with dual supervision. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems, pages 976-983. MIT Press, 2008.
M. Szummer and T. Jaakkola. Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems, volume 14, pages 945-952, 2002.
V. Vapnik. Statistical Learning Theory. Wiley, New York, 1998.
J. Wang, S.-F. Chang, X. Zhou, and T. C. S. Wong. Active microscopic cellular image annotation by superposable graph transduction with imbalanced labels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Alaska, USA, June 2008a.
J. Wang, T. Jebara, and S.-F. Chang. Graph transduction via alternating minimization. In Proceedings of the International Conference on Machine Learning, pages 1144-1151, Helsinki, Finland, 2008b.
J. Wang, Y.-G. Jiang, and S.-F. Chang. Label diagnosis through self tuning for web image search. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Miami Beach, Florida, USA, June 2009.
W. Wang and Z.-H. Zhou. A new analysis of co-training. In Proceedings of the 27th International Conference on Machine Learning, pages 1135-1142, Haifa, Israel, June 2010.
Z. Xu, R. Jin, J. Zhu, I. King, and M. Lyu. Efficient convex relaxation for transductive support vector machine. In Advances in Neural Information Processing Systems, volume 21, pages 1641-1648, 2008.
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems, volume 16, pages 321-328, 2004.
X. Zhu. Semi-supervised learning literature survey. Technical Report 1530, Computer Sciences, University of Wisconsin-Madison, 2005.
X. Zhu and A. B. Goldberg. Introduction to Semi-supervised Learning. Morgan & Claypool Publishers, 2009.
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In Proceedings of the International Conference on Machine Learning, pages 912-919, 2003.