jmlr jmlr2013 jmlr2013-29 jmlr2013-29-reference knowledge-graph by maker-knowledge-mining

29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs


Source: pdf

Author: Yu-Feng Li, Ivor W. Tsang, James T. Kwok, Zhi-Hua Zhou

Abstract: In this paper, we study the problem of learning from weakly labeled data, where labels of the training examples are incomplete. This includes, for example, (i) semi-supervised learning where labels are partially known; (ii) multi-instance learning where labels are implicitly known; and (iii) clustering where labels are completely unknown. Unlike supervised learning, learning with weak labels involves a difficult Mixed-Integer Programming (MIP) problem. Therefore, it can suffer from poor scalability and may also get stuck in local minimum. In this paper, we focus on SVMs and propose the W ELL SVM via a novel label generation strategy. This leads to a convex relaxation of the original MIP, which is at least as tight as existing convex Semi-Definite Programming (SDP) relaxations. Moreover, the W ELL SVM can be solved via a sequence of SVM subproblems that are much more scalable than previous convex SDP relaxations. Experiments on three weakly labeled learning tasks, namely, (i) semi-supervised learning; (ii) multi-instance learning for locating regions of interest in content-based information retrieval; and (iii) clustering, clearly demonstrate improved performance, and W ELL SVM is also readily applicable on large data sets. Keywords: weakly labeled data, semi-supervised learning, multi-instance learning, clustering, cutting plane, convex relaxation


reference text

S. Andrews, I. Tsochantaridis, and T. Hofmann. Support vector machines for multiple-instance learning. In S. Becker, S. Thrun, and K. Obermayer, editors, Advances in Neural Information Processing Systems 15, pages 577–584. MIT Press, Cambridge, MA, 2003. D. Angluin and P. Laird. Learning from noisy examples. Machine Learning, 2(4):343–370, 1988. F. R. Bach, G. R. G. Lanckriet, and M. I. Jordan. Multiple kernel learning, conic duality, and the SMO algorithm. In Proceedings of the 21st International Conference on Machine Learning, pages 41–48, Banff, Canada, 2004. 2183 L I , T SANG , K WOK AND Z HOU M. Belkin, P. Niyogi, and V. Sindhwani. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. Journal of Machine Learning Research, 7:2399– 2434, 2006. S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, UK, 2004. S. S. Bucak, R. Jin, and A. K. Jain. Multi-label learning with incomplete class assignments. In Proceedings of International Conference on Computer Vision and Pattern Recognition, pages 2801–2808, Colorado Springs, CO, 2011. O. Chapelle. Training a support vector machine in the primal. Neural Computation, 19(5):1155– 1178, 2007. O. Chapelle and A. Zien. Semi-supervised classification by low density separation. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, The Savannah Hotel, Barbados, 2005. O. Chapelle, M. Chi, and A. Zien. A continuation method for semi-supervised SVMs. In Proceedings of the 23rd International Conference on Machine Learning, pages 185–192, Pittsburgh, PA, 2006a. O. Chapelle, B. Sch¨ lkopf, and A. Zien. Semi-Supervised Learning. MIT Press, Cambridge, MA, o USA, 2006b. O. Chapelle, V. Sindhwani, and S. S. Keerthi. Branch and bound for semi-supervised support vector machines. In B. Sch¨ lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information o Processing Systems 19, pages 217–224. MIT Press, Cambridge, MA, 2007. O. Chapelle, V. Sindhwani, and S. S. Keerthi. Optimization techniques for semi-supervised support vector machines. Journal of Machine Learning Research, 9:203–233, 2008. P. M. Cheung and J. T. Kwok. A regularization framework for multiple-instance learning. In Proceedings of the 23th International Conference on Machine Learning, pages 193–200, Pittsburgh, PA, USA, 2006. R. Collobert, F. Sinz, J. Weston, and L. Bottou. Large scale transductive SVMs. Journal of Machine Learning Research, 7:1687–1712, 2006. K. Crammer and Y. Singer. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research, 2:265–292, 2002. N. Cristianini, J. Shawe-Taylor, A. Elisseeff, and J. Kandola. On kernel-target alignment. In T. G. Dietterich, Z. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 367–373. MIT Press, Cambridge, MA, 2002. T. De Bie and N. Cristianini. Semi-supervised learning using semi-definite programming. In O. Chapelle, B. Sch¨ lkopf, and A. Zien, editors, Semi-Supervised Learning. MIT Press, Camo bridge, MA, 2006. 2184 C ONVEX AND S CALABLE W EAKLY L ABELED SVM S J. Demsar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006. T. G. Dietterich, R. H. Lathrop, and T. Lozano-P´ rez. Solving the multiple instance problem with e axis-parallel rectangles. Artificial Intelligence, 89(1-2):31–71, 1997. R.E. Fan, P.H. Chen, and C.J. Lin. Working set selection using second order information for training support vector machines. Journal of Machine Learning Research, 6:1889–1918, 2005. T. G¨ rtner, P. A. Flach, A. Kowalczyk, and A. J. Smola. Multi-instance kernels. In Proceedings a of the 19th International Conference on Machine Learning, pages 179–186, Sydney, Australia, 2002. Y. Guo. Max-margin multiple-instance learning via semidefinite programming. In Proceedings of the 1st Asian Conference on Machine Learning, pages 98–108, Nanjing, China, 2009. R. Horst and N.V. Thoai. DC programming: Overview. Journal of Optimization Theory and Applications, 103(1):1–43, 1999. C. J. Hsieh, K. W. Chang, C. J. Lin, S. S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In Proceedings of the 25th International Conference on Machine Learning, pages 408–415, Helsinki, Finland, 2008. A. K. Jain and R. C. Dubes. Algorithms for Clustering Data. Prentice Hall, Upper Saddle River, NJ, USA, 1988. T. Joachims. Transductive inference for text classification using support vector machines. In Proceedings of the 16th International Conference on Machine Learning, pages 200–209, Bled, Slovenia, 1999. T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th International Conference on Knowledge Discovery and Data mining, pages 217–226, Philadelphia, PA, 2006. J. E. Kelley. The cutting plane method for solving convex programs. Journal of the Society for Industrial and Applied Mathematics, 8(4):703–712, 1960. S.-J. Kim and S. Boyd. A minimax theorem with applications to machine learning, signal processing, and finance. SIAM Journal on Optimization, 19(3):1344–1367, 2008. M. Kloft, U. Brefeld, S. Sonnenburg, P. Laskov, K.-R. M¨ ller, and A. Zien. Efficient and accurate u lp-norm multiple kernel learning. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 997–1005. MIT Press, Cambridge, MA, 2009. G. R. G. Lanckriet, N. Cristianini, P. Bartlett, L. El Ghaoui, and M. I. Jordan. Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5:27–72, 2004. Y.-F. Li and Z.-H. Zhou. Towards making unlabeled data never hurt. In Proceedings of the 28th International Conference on Machine Learning, pages 1081–1088, Bellevue, WA, 2011. 2185 L I , T SANG , K WOK AND Z HOU Y.-F. Li, J.T. Kwok, I.W. Tsang, and Z.-H. Zhou. A convex method for locating regions of interest with multi-instance learning. In Proceedings of the 20th European Conference on Machine Learning and Knowledge Discovery in Databases, pages 15–30, Bled, Slovenia, 2009a. Y.-F. Li, J.T. Kwok, and Z.-H. Zhou. Semi-supervised learning using label mean. In Proceedings of the 26th International Conference on Machine Learning, pages 633–640, Montreal, Canada, 2009b. Y.-F. Li, I.W. Tsang, J.T. Kwok, and Z.-H. Zhou. Tighter and convex maximum margin clustering. In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, pages 344–351, Clearwater Beach, FL, 2009c. Y.-F. Li, J.-H. Hu, Y. Jiang, and Z.-H. Zhou. Towards discovering what patterns trigger what labels. In Proceedings of the 26th AAAI Conference on Artificial Intelligence, pages 1012–1018, Toronto, Canada, 2012. M.S. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear algebra and its applications, 284(1):193–228, 1998. O. Maron and A. L. Ratan. Multiple-instance learning for natural scene classification. In Proceedings of the 15th International Conference on Machine Learning, pages 341–349, Madison, WI, 1998. T.M. Mitchell. The discipline of machine learning. Technical report, Machine Learning Department, Carnegie Mellon University, 2006. Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming, volume 13. Society for Industrial Mathematics, 1987. J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In Advances in Kernel Methods - Support Vector Learning, pages 185–208. MIT Press, Cambridge, MA, USA, 1999. A. Rakotomamonjy, F. R. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008. B. Sch¨ lkopf and A.J. Smola. Learning with Kernels. MIT Press, 2002. o S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for svm. In Proceedings of the 24th International Conference on Machine Learning, pages 807–814, Corvallis, OR, 2007. V.S. Sheng, F. Provost, and P.G. Ipeirotis. Get another label? improving data quality and data mining using multiple, noisy labelers. In Proceedings of the 14th International Conference on Knowledge Discovery and Data mining, pages 614–622, Las Vegas, NV, 2008. J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000. 2186 C ONVEX AND S CALABLE W EAKLY L ABELED SVM S V. Sindhwani and S.S. Keerthi. Large scale semi-supervised linear SVMs. In Proceedings of the 29th annual International Conference on Research and Development in Information Retrieval, pages 477–484, Seattle, WA, 2006. V. Sindhwani, S.S. Keerthi, and O. Chapelle. Deterministic annealing for semi-supervised kernel machines. In Proceedings of the 23rd International Conference on Machine Learning, pages 841–848, Pittsburgh, PA, 2006. S. Sonnenburg, G. R¨ tsch, C. Sch¨ fer, and B. Sch¨ lkopf. Large scale multiple kernel learning. a a o Journal of Machine Learning Research, 7:1531–1565, 2006. A. Subramanya and J. Bilmes. Entropic graph regularization in non-parametric semi-supervised classification. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta, editors, Advances in Neural Information Processing Systems 22, pages 1803–1811. MIT Press, Cambridge, MA, 2009. Y.-Y. Sun, Y. Zhang, and Z.-H. Zhou. Multi-label learning with weak label. In Proceedings of the 24th AAAI Conference on Artificial Intelligence, pages 593–598, Atlanta, GA, 2010. I. W. Tsang, J. T. Kwok, and P. Cheung. Core vector machines: Fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2006. I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2):1453, 2006. H. Valizadegan and R. Jin. Generalized maximum margin clustering and unsupervised kernel learning. In B. Sch¨ lkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processo ing Systems 19, pages 1417–1424. MIT Press, Cambridge, MA, 2007. V.N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998. H. Y. Wang, Q. Yang, and H. Zha. Adaptive p-posterior mixture-model kernels for multiple instance learning. In Proceedings of the 25th International Conference on Machine Learning, pages 1136– 1143, Helsinki, Finland, 2008. L. Xu and D. Schuurmans. Unsupervised and semi-supervised multi-class support vector machines. In Proceedings of the 20th National Conference on Artificial Intelligence, pages 904–910, Pittsburgh, PA, 2005. L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1537–1544. MIT Press, Cambridge, MA, 2005. X. Xu and E. Frank. Logistic regression and boosting for labeled bags of instances. In Proceedings of the 8th Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 272–281, Sydney, Australia, 2004. Z. Xu, R. Jin, I. King, and M. R. Lyu. An extended level method for efficient multiple kernel learning. In D. Koller, D. Schuurmans, Y. Bengio, and L. Bottou, editors, Advances in Neural Information Processing Systems 21, pages 1825–1832. MIT Press, Cambridge, MA, 2009. 2187 L I , T SANG , K WOK AND Z HOU Z. Xu, R. Jin, H. Yang, I. King, and M. Lyu. Simple and efficient multiple kernel learning by group lasso. In Proceedings of 27th International Conference on Machine Learning, pages 1–8, Haifa, Israel, 2010. S.-J. Yang, Y. Jiang, and Z.-H. Zhou. Multi-instance multi-label learning with weak label. In Proceedings of 23rd International Joint Conference on Artificial Intelligence, Beijing, China, 2013. K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. In Proceedings of the 24th International Conference on Machine Learning, pages 1119–1126, Corvallis, OR, 2007. K. Zhang, J.T. Kwok, and B. Parvin. Prototype vector machine for large scale semi-supervised learning. In Proceedings of the 26th International Conference on Machine Learning, pages 1233– 1240, Montreal, Canada, 2009a. K. Zhang, I. W. Tsang, and J. T. Kwok. Maximum margin clustering made practical. IEEE Transactions on Neural Networks, 20(4):583–596, 2009b. Q. Zhang and S. A. Goldman. EM-DD: An improved multiple-instance learning technique. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, pages 1073–1080. MIT Press, Cambridge, MA, 2002. B. Zhao, F. Wang, and C. Zhang. Efficient maximum margin clustering via cutting plane algorithm. In Proceedings of the 8th International Conference on Data Mining, pages 751–762, Atlanta, GA, 2008. Z.-H. Zhou and M. Li. Semi-supervised learning by disagreement. Knowledge and Information Systems, 24(3):415–439, 2010. Z.-H. Zhou and J.-M. Xu. On the relation between multi-instance learning and semi-supervised learning. In Proceedings of the 24th International Conference on Machine Learning, pages 1167– 1174, Corvallis, OR, 2007. Z.-H. Zhou and M.-L. Zhang. Multi-instance multi-label learning with application to scene classification. In B. Sch¨ lkopf, J. Platt, and T. Hofmann, editors, Advances in Neural Information o Processing Systems 19, pages 1609–1616. MIT Press, Cambridge, MA, 2007. Z.-H. Zhou, X.-B. Xue, and Y. Jiang. Locating regions of interest in CBIR with multi-instance learning techniques. In Proceedings of the 18th Australian Joint Conference on Artificial Intelligence, pages 92–101, Sydney, Australia, 2005. Z.-H. Zhou, M.-L. Zhang, S.-J. Huang, and Y.-F. Li. Multi-instance multi-label learning. Artificial Intelligence, 176(1):2291–2320, 2012. X. Zhu. Semi-supervised learning literature survey. Technical report, Computer Science, University of Wisconsin-Madison, 2006. 2188