jmlr jmlr2009 jmlr2009-70 jmlr2009-70-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hugo Jair Escalante, Manuel Montes, Luis Enrique Sucar
Abstract: This paper proposes the application of particle swarm optimization (PSO) to the problem of full model selection (FMS) for classification tasks. FMS is defined as follows: given a pool of preprocessing methods, feature selection methods, and learning algorithms, select the combination of these that obtains the lowest classification error for a given data set; the task also includes the selection of hyperparameters for the considered methods. This problem generates a vast search space, well suited to stochastic optimization techniques. FMS can be applied to any classification domain because it requires no domain knowledge; different model types and a variety of algorithms can be considered under this formulation, and competitive yet simple models can be obtained. We adopt PSO for the search because of its proven performance on a range of problems and because of its simplicity: it requires neither expensive computations nor complicated operations. Interestingly, the way the search is guided allows PSO to avoid overfitting to some extent. Experimental results on benchmark data sets give evidence that the proposed approach is very effective, despite its simplicity. Furthermore, results obtained in the framework of a model selection challenge show the competitiveness of the models selected with PSO compared with models selected by other techniques that focus on a single algorithm and use domain knowledge.
Keywords: full model selection, machine learning challenge, particle swarm optimization, experimentation, cross validation
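The abstract describes the search only at a high level. The following minimal sketch illustrates the general idea of PSO-driven model selection under stated assumptions: each particle encodes just two SVM hyperparameters (log10 C, log10 gamma), fitness is 5-fold cross-validation error, and a standard inertia-weight PSO update is used. The encoding, the swarm settings, and the use of scikit-learn components are illustrative assumptions, not the paper's PSMS implementation, which additionally searches over preprocessing methods, feature selection methods, and classifier types.

```python
# Minimal PSO-for-model-selection sketch (illustrative; NOT the paper's PSMS
# implementation). Assumes scikit-learn and a toy search space of two SVM
# hyperparameters encoded in log10 space.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

def cv_error(position):
    """Fitness of a particle: 5-fold cross-validation error of the decoded model."""
    C, gamma = 10.0 ** position  # decode log10-encoded hyperparameters
    return 1.0 - cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=5).mean()

# Box constraints: log10(C) in [-2, 3], log10(gamma) in [-4, 1].
lo, hi = np.array([-2.0, -4.0]), np.array([3.0, 1.0])
n_particles, n_iterations = 10, 20
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive and social weights

pos = rng.uniform(lo, hi, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest = pos.copy()                                # personal best positions
pbest_err = np.array([cv_error(p) for p in pos])
gbest = pbest[pbest_err.argmin()]                 # global best position

for _ in range(n_iterations):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # Canonical PSO update: inertia + attraction to personal and global bests.
    vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, lo, hi)
    err = np.array([cv_error(p) for p in pos])
    improved = err < pbest_err
    pbest[improved], pbest_err[improved] = pos[improved], err[improved]
    gbest = pbest[pbest_err.argmin()]

print("best (log10 C, log10 gamma):", gbest, "CV error:", pbest_err.min())
```

Because the fitness is itself a cross-validation estimate, limiting the number of iterations, as done here, is one simple way such a search can avoid overfitting the validation signal, in line with the abstract's remark.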
References:
P. J. Angeline. Evolutionary optimization versus particle swarm optimization: Philosophy and performance differences. In Proceedings of the 7th Conference on Evolutionary Programming, volume 1447 of LNCS, pages 601–610, San Diego, CA, March 1998. Springer.
Y. Bengio and N. Chapados. Extensions to metric-based model selection. Journal of Machine Learning Research, 3:1209–1227, 2003.
J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman, and M. Song. Dimensionality reduction via sparse support vector machines. Journal of Machine Learning Research, 3:1229–1243, 2003.
C. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
M. Boullé. Report on preliminary experiments with data grid models in the agnostic learning vs. prior knowledge challenge. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1802–1808, 2007.
L. Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
G. Cawley. Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2006), pages 2970–2977, Vancouver, Canada, July 2006.
G. Cawley and N. L. C. Talbot. Agnostic learning vs. prior knowledge in the design of kernel machines. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1444–1450, Orlando, Florida, 2007a.
G. Cawley, G. Janacek, and N. L. C. Talbot. Generalised kernel machines. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1439–1445, Orlando, Florida, 2007.
G. C. Cawley and N. L. C. Talbot. Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters. Journal of Machine Learning Research, 8:841–861, April 2007b.
M. Clerc and J. Kennedy. The particle swarm: Explosion, stability and convergence in a multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6(1):58–73, February 2002.
J. Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, January 2006.
J. E. Dennis and V. J. Torczon. Derivative-free pattern search methods for multidisciplinary design problems. In Proceedings of the AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, pages 922–932, 1994.
T. Dietterich. Overfitting and undercomputing in machine learning. ACM Computing Surveys, 27(3):326–327, 1995.
A. P. Engelbrecht. Fundamentals of Computational Swarm Intelligence. Wiley, 2006.
H. J. Escalante. Particle swarm optimization for classifier selection: A practical guide to PSMS. http://ccc.inaoep.mx/~hugojair/psms/psms_doc.pdf, in preparation, 2009.
H. J. Escalante, M. Montes, and E. Sucar. PSMS for neural networks on the IJCNN 2007 agnostic vs. prior knowledge challenge. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1191–1197, Orlando, FL, USA, 2007.
V. Franc and V. Hlavac. The statistical pattern recognition toolbox. http://cmp.felk.cvut.cz/cmp/software/stprtool/index.html, 2004.
J. Friedman, T. Hastie, and R. Tibshirani. Additive logistic regression: A statistical view of boosting. Annals of Statistics, 28(2):337–407, 2000.
P. Geurts, D. Ernst, and L. Wehenkel. Extremely randomized trees. Machine Learning, 63(1):3–42, 2006.
T. R. Golub, D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286:531–537, October 1999.
D. Gorissen. Heterogeneous evolution of surrogate models. Master's thesis, Katholieke Universiteit Leuven, Belgium, June 2007.
D. Gorissen, L. De Tommasi, J. Croon, and T. Dhaene. Automatic model type selection with heterogeneous evolution: An application to RF circuit block modeling. In IEEE Proceedings of WCCI 2008, pages 989–996, 2008.
V. G. Gudise and G. K. Venayagamoorthy. Comparison of particle swarm optimization and backpropagation as training algorithms for neural networks. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS03), pages 110–117, 2003.
I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3:1157–1182, 2003.
I. Guyon, S. Gunn, A. Ben-Hur, and G. Dror. Result analysis of the NIPS 2003 feature selection challenge. In Advances in Neural Information Processing Systems 17, pages 545–552. MIT Press, Cambridge, MA, 2005.
I. Guyon, S. Gunn, M. Nikravesh, and L. Zadeh, editors. Feature Extraction, Foundations and Applications. Series Studies in Fuzziness and Soft Computing. Springer, 2006a.
I. Guyon, A. Saffari, G. Dror, and J. M. Buhmann. Performance prediction challenge. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2006), pages 2958–2965, Vancouver, Canada, July 2006b.
I. Guyon, A. Saffari, G. Dror, G. Cawley, and O. Guyon. Benchmark datasets and game result summary. In NIPS Workshop on Multi-level Inference and the Model Selection Game, Whistler, Canada, December 2006c.
I. Guyon, A. Saffari, G. Dror, and G. Cawley. Agnostic learning vs. prior knowledge challenge. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1232–1238, Orlando, Florida, 2007.
I. Guyon, A. Saffari, G. Dror, and G. Cawley. Analysis of the IJCNN 2007 competition agnostic learning vs. prior knowledge. Neural Networks, 21(2–3):544–550, 2008.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Verlag, New York, 2001.
E. Hernández, C. Coello, and A. Hernández. On the use of a population-based particle swarm optimizer to design combinational logic circuits. In Evolvable Hardware, pages 183–190, 2004.
G. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554, 2006.
C. W. Hsu, C. C. Chang, and C. J. Lin. A practical guide to support vector classification. Technical report, National Taiwan University, Taipei, 2003. URL http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf.
C. Hue and M. Boullé. A new probabilistic approach in rank regression with optimal Bayesian partitioning. Journal of Machine Learning Research, 8:2727–2754, December 2007.
D. Jensen and P. Cohen. Multiple comparisons in induction algorithms. Machine Learning, 38(3):309–338, 2000.
J. Kennedy. How it works: Collaborative trial and error. International Journal of Computational Intelligence Research, 4(2):71–78, 2008.
J. Kennedy and R. Eberhart. Particle swarm optimization. In Proceedings of the International Conference on Neural Networks, volume IV, pages 1942–1948. IEEE, 1995.
J. Kennedy and R. Eberhart. Swarm Intelligence. Morgan Kaufmann, 2001.
J. Kennedy and R. Mendes. Population structure and particle swarm performance. In Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2002), volume 2, pages 1671–1676, 2002.
Y. Kim, N. Street, and F. Menczer. Evolutionary model selection in unsupervised learning. Intelligent Data Analysis, 6:531–556, 2002.
S. Kirkpatrick, C. Gelatt, and M. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
J. Loughrey and P. Cunningham. Overfitting in wrapper-based feature subset selection: The harder you try the worse it gets. In M. Bramer, F. Coenen, and T. Allen, editors, Proceedings of AI-2004, the Twenty-fourth SGAI International Conference on Innovative Techniques and Applications of Artificial Intelligence, Research and Development in Intelligent Systems XXI, pages 33–43, 2005.
R. Lutz. LogitBoost with trees applied to the WCCI 2006 performance prediction challenge datasets. In Proceedings of the International Joint Conference on Neural Networks (IJCNN 2006), pages 1657–1660, Vancouver, Canada, July 2006.
S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. J. Smola, and K.-R. Müller. Invariant feature extraction and classification in kernel spaces. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 526–532, Cambridge, MA, 2000. MIT Press.
M. Momma and K. Bennett. A pattern search method for model selection of support vector regression. In Proceedings of the SIAM Conference on Data Mining, 2002.
O. Nelles. Nonlinear System Identification: From Classical Approaches to Neural Networks and Fuzzy Models. Springer, 2001.
E. Ozcan and C. K. Mohan. Analysis of a simple particle swarm optimization system. In Intelligent Engineering Systems Through Artificial Neural Networks, pages 253–258, 1998.
E. Pranckeviciene, R. Somorjai, and M. N. Tran. Feature/model selection by the linear programming SVM combined with state-of-the-art classifiers: What can we learn about the data. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1422–1428, 2007.
N. Qian. On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1):145–151, 1999.
J. R. Quinlan and R. M. Cameron-Jones. Oversearching and layered search in empirical learning. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 1019–1024, 1995.
G. Rätsch, T. Onoda, and K.-R. Müller. Soft margins for AdaBoost. Machine Learning, 42(3):287–320, 2001.
J. Reunanen. Model selection and assessment using cross-indexing. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1674–1679, 2007.
M. Reyes and C. Coello. Multi-objective particle swarm optimizers: A survey of the state-of-the-art. International Journal of Computational Intelligence Research, 3(2):287–308, 2006.
J. Robinson and Y. Rahmat-Samii. Particle swarm optimization in electromagnetics. IEEE Transactions on Antennas and Propagation, 52(2):397–407, February 2004.
A. Saffari and I. Guyon. Quickstart guide for CLOP. Technical report, Graz University of Technology and Clopinet, May 2006. http://www.ymer.org/research/files/clop/QuickStartV1.0.pdf.
J. Salerno. Using the particle swarm optimization technique to train a recurrent neural model. In Proceedings of the Ninth International Conference on Tools with Artificial Intelligence, pages 45–49, 1997.
C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables. In J. W. Shavlik, editor, Proceedings of the 15th International Conference on Machine Learning, pages 515–521, Madison, WI, USA, 1998.
Y. Shi and R. C. Eberhart. Parameter selection in particle swarm optimization. In Evolutionary Programming VII, pages 591–600, New York, 1998. Springer-Verlag.
Y. Shi and R. C. Eberhart. Empirical study of particle swarm optimization. In Proceedings of the Congress on Evolutionary Computation, pages 1945–1949, Piscataway, NJ, USA, 1999. IEEE.
S. Sonnenburg. NIPS workshop on machine learning open source software. http://www2.fml.tuebingen.mpg.de/raetsch/workshops/MLOSS06/, December 2006.
J. A. K. Suykens and J. Vandewalle. Least squares support vector machine classifiers. Neural Processing Letters, 9(1):293–300, 1999.
F. van den Bergh. An Analysis of Particle Swarm Optimizers. PhD thesis, University of Pretoria, South Africa, November 2001.
F. van der Heijden, R. P. W. Duin, D. de Ridder, and D. M. J. Tax. PRTools: A Matlab based toolbox for pattern recognition. http://www.prtools.org/, 2004.
M. Voss and X. Feng. ARMA model selection using particle swarm optimization and AIC criteria. In Proceedings of the 15th IFAC World Congress on Automatic Control, 2002.
J. Weston, A. Elisseeff, G. BakIr, and F. Sinz. The Spider machine learning toolbox. http://www.kyb.tuebingen.mpg.de/bs/people/spider/, 2005.
J. Wichard. Agnostic learning with ensembles of classifiers. In Proceedings of the 20th International Joint Conference on Neural Networks, pages 1753–1759, 2007.
J. Wichard and C. Merkwirth. ENTOOL: A Matlab toolbox for ensemble modeling. http://www.j-wichard.de/entool/, 2007.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, San Francisco, 2nd edition, 2005.
D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Transactions on Evolutionary Computation, 1(1):67–82, 1997.
H. Xiaohui, R. Eberhart, and Y. Shi. Engineering optimization with particle swarm. In Proceedings of the 2003 IEEE Swarm Intelligence Symposium (SIS03), pages 53–57, 2003.
H. Yoshida, K. Kawata, Y. Fukuyama, S. Takayama, and Y. Nakanishi. A particle swarm optimization for reactive power and voltage control considering voltage security assessment. IEEE Transactions on Power Systems, 15(4):1232–1239, November 2000.