jmlr jmlr2010 jmlr2010-113 jmlr2010-113-reference knowledge-graph by maker-knowledge-mining

113 jmlr-2010-Tree Decomposition for Large-Scale SVM Problems

Source: pdf

Author: Fu Chang, Chien-Yang Guo, Xiao-Rong Lin, Chi-Jen Lu

Abstract: To handle problems created by large data sets, we propose a method that uses a decision tree to decompose a given data space and train SVMs on the decomposed regions. Although there are other means of decomposing a data space, we show that the decision tree has several merits for large-scale SVM training. First, it can classify some data points by its own means, thereby reducing the cost of SVM training for the remaining data points. Second, it is efﬁcient in determining the parameter values that maximize the validation accuracy, which helps maintain good test accuracy. Third, the tree decomposition method can derive a generalization error bound for the classiﬁer. For data sets whose size can be handled by current non-linear, or kernel-based, SVM training techniques, the proposed method can speed up the training by a factor of thousands, and still achieve comparable test accuracy. Keywords: binary tree, generalization error ¨bound, margin-based theory, pattern classiﬁcation, ı tree decomposition, support vector machine, VC theory

reference text

N. Alon, S. Ben-David, N. Cesa-Bianchi, and D. Haussler. Scale-sensitive dimensions, uniform convergence, and learnability. Journal of ACM, 44(4):615–631, 1997. P. Bartlett and J. Shawe-Taylor. Generalization performance of support vector machines and other pattern classiﬁers. In B. Sch¨ lkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel o Methods - Support Vector Learning. MIT Press, 1998. K. P. Bennett and J. A. Blue. A support vector machine approach to decision trees. In Proceedings IEEE International Joint Conference on Neural Networks, 1998. K. P. Bennett, N. Cristianini, J. Shawe-Taylor, and D. Wu. Enlarging the margins in perceptron decision trees. Machine Learning, 41:295–313, 2000. A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classiﬁers with online and active learning. Journal of Machine Learning Research, 6:1579–1619, 2005. A. Bordes, L. Bottou, P. Gallinari, and J. Weston. Solving multiclass support vector machines with LaRank. In Proceedings of the 24th International Machine Learning Conference, pages 89–96, 2007. A. Bordes, L. Bottou, and P. Gallinari. SGD-QN: careful quasi-newton stochastic gradient descent. Journal of Machine Learning Research, 10:1737–1754, 2009. L. Bottou, C. Cortes, J. Denker, H. Drucker, I. Guyon, L. Jackel, Y. LeCun, U. M¨ ller, E. Sackinger, u P. Simard, and V. Vapnik. Comparison of classiﬁer methods: A case study in handwriting digit recognition. In Proc. Int. Conf. Pattern Recognition, pages 77–87, 1994. 2969 C HANG , G UO , L IN AND L U L. Breiman. Bagging predictors. Machine Learning, 262:123–140, 1996. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classiﬁcation and Regression Trees. Chapman and Hall, 1984. C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Journal of Data Mining and Knowledge Discovery, 2(2):1–47, 1998. R. Collobert, S. Bengio, and Y. Bengio. A parallel mixture of SVMs for very large scale problems. Neural Computation, 14(5):1105–1114, 2002. C. Cortes and V. Vapnik. Support vector machines. Machine Learning, 20:1–25, 1995. N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other KernelBased Learning Methods. Cambridge University Press, 2000. G. R. Dattatreya and L. N. Kanal. Decision trees in pattern recognition. In L. N. Kanal and A. Rossenfeld, editors, Progress in Pattern Recognition, volume 2. Elsevier, 1985. T. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40:139–157, 2000. R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research, 6:1889–1918, 2005. R.-E. Fan, K.-W. Chang, C.-J. Hsieh, X.-R. Wang, and C.-J. Lin. LIBLINEAR: A library for large linear classiﬁcation. Journal of Machine Learning Research, 9:1871–1874, 2008. V. Franc and S. Sonnenburg. Optimized cutting plane algorithm for support vector machines. In Proceedings of the 25th International Machine Learning Conference, pages 320–327, 2008. T. Glasmachers and C. Igle. Maximum-gain working set selection for SVMs. Journal of Machine Learning Research, 7:1437–1466, 2006. H. P. Graf, E. Cosatto, L. Bottou, I. Dourdanovic, and V. Vapnik. Parallel support vector machines: the cascade SVM. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Neural Information Processing Systems. MIT Press, 2004. T. Joachims. Making large-scale support vector machine learning practical. In B. Sch¨ lkopf, C. J. C. o Burges, and A. J. Smola, editors, Kernel Methods – Support Vector Learning. MIT Press, 1998. T. Joachims. Training linear SVMs in linear time. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 217–226, 2006. J. Kivinen, A. J. Smola, and R. C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8), 2004. S. Knerr, L. Personnaz, and G. Dreyfus. Single-layer learning revisited: A stepwise procedure for building and training a neural network. In J. Fogelman, editor, Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag, 1990. 2970 T REE D ECOMPOSITION FOR L ARGE -S CALE SVM P ROBLEMS N. D. Lawrence, M. Seeger, and R. Herbrich. Fast sparse gaussian process methods: the informative vector machine. In S. Becker, S. Thrun, and K. Obermayer, editors, Neural Information Processing Systems. MIT Press, 2003. A. K. Menon. Large-scale support vector machines: algorithms and theory. Research Exam, University of California, San Diego, 2009. S. K. Murthy, S. Kasif, and S. Salzberg. A system for induction of oblique decision trees. Journal of Machine Learning Research, 2:1–32, 1994. D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases. Irvine, CA: University of California, Department of Information and Computer Science, 1998. E. Osuna, R. Freud, and F. Girosi. An improved training algorithm for support vector machines. In Neural Networks for Signal Processing, volume VII, pages 276–285, 1997. N. Panda, E. Y. Chang, and G. Wu. Concept boundary detection for speeding up SVMs. In Proceedings of International Conference on Machine learning, pages 681–688, 2006. D. Pavlov, D. Chudova, and P. Smyth. Towards scalable support vector machines using squashing. In ACM SIGKDD, pages 295–299, 2000. J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Sch¨ lkopf, C. Burges, and A. J. Smola, editors, Kernel Methods: Support Vector Learning. o MIT Press, 1998. J. C. Platt, N. Cristianini, and J. Shawe-Taylor. Large margin dags for multiclass classiﬁcation. In S. A. Solla, T. K. Leen, and K.-R. M¨ ller, editors, Advances in Neural Information Processing u Systems. MIT Press, 2000. J. R. Quinlan. Induction of decision tree. Machine Learning, 1:81–106, 1986. S. Ramaswamy. Multiclass text classiﬁcation a decision tree based SVM approach. CS294 Practical Machine Learning Project, 2006. A. Rida, A. Labbi, and C. Pellegrini. Local experts combination through density decomposition. In Proceedings of International Workshop on AI and Statistics, 1999. H. Sahbi and D. Geman. A hierarchy of support vector machines for pattern detection. Journal of Machine Learning Research, 7:2087–2123, 2006. R. E. Schapire. The strength of weak learnability. Machine Learning, 5:197–227, 1990. R. E. Schapire and Y. Singer. Boostexter: A boosting-based system for text categorization. Journal of Machine Learning Research, 39(2/3):135–168, 2000. S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In Proceedings of International Conference on Machine learning, pages 807–814, 2007. 2971 C HANG , G UO , L IN AND L U J. Shawe-Taylor and N. Cristianini. Margin distribution bounds on generalization. In Proceedings of the European Conference on Computational Learning Theory, pages 263–273, 1999. J. Shawe-Taylor and N. Cristianini. On the generalization of soft margin algorithms. IEEE Transactions on Information Theory, 48(10):2721–2735, 2002. A. J. Smola and B. Sch¨ lkopf. Sparse greedy matrix approximation for machine learning. In o Proceedings of the International Conference on Machine Learning, pages 911–918, 2000. A. J. Smola, S. V. N. Vishwanathan, and Q. V. Le. Bundle methods for machine learning. In J. C. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems, volume 20. MIT Press, 2007. J.-Q. Sun, G.-G. Wang, Q. Hu, and S.-Y. Li. A novel SVM detection tree and its application to face detection. In Proceedings of the 8th ACIS International Conference on Software Engineering, Artiﬁcial Intelligence, Networking, and Parallel/Distributed Computing, pages 385–389, 2007. R. Tibshirani and T. Hastie. Margin trees for high-dimensional classiﬁcation. Journal of Machine Learning Research, 8:637–652, 2007. V. Tresp. A bayesian committee machines. Neural Computation, 12:2719–2741, 2000. V. Tresp. Scaling kernel-based systems to large data sets. Data Mining and Knowledge Discovery, 5:197–211, 2001. I. W. Tsang, J. T. Kwok, and P.-M. Cheung. Core vector machines: fast SVM training on very large data sets. Journal of Machine Learning Research, 6:363–392, 2005. T.-S. Tseng, C.-Y. Guo, Wen-Lian Hsu, and F. Chang. Protein-protein interface prediction based on a novel SVM speedup. Technical Report, Number TR-IIS-10-001, Institute of Information Science, Academia Sinica, 2010. V. Vapnik. The Nature of Statistical Learning Theory. Springer Verlag, 1995. D. Wu, K. P. Bennett, N. Cristianini, and J. Shawe-Taylor. Large margin decision trees for induction and transduction. In Proceedings of International Conference on Machine Learning, pages 474– 483, 1999. H. Yu, J. Han J. Yang, and X. Li. Making SVMs scalable to large data sets using hierarchical cluster indexing. Data Mining and Knowledge Discovery, 11:295–321, 2003. 2972