Author: Marc Boullé
Abstract: With the rapid growth of computer storage capacities, available data and the demand for scoring models both follow an increasing trend, sharper than that of processing power. However, the main limitation to the widespread adoption of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes-optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and model averaging using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practicable computation time. Keywords: large scale learning, naive Bayes, Bayesianism, model selection, model averaging
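To make the pipeline in the abstract concrete, here is a minimal sketch of its two central ingredients: a naive Bayes classifier over discretized features, and an average over variable-selection subsets whose weights are computed in the log domain (a stand-in for the paper's logarithmic smoothing of the posterior). This is an illustrative toy, not the author's MODL-based implementation; all function names, the subset enumeration, and the likelihood-based weighting scheme are assumptions made for the example.

```python
# Illustrative sketch only: discretized naive Bayes averaged over
# variable subsets, with model weights computed in the log domain.
import math
from collections import defaultdict
from itertools import combinations

def train_nb(X, y, subset):
    """Fit class priors and per-feature value counts on the chosen variable subset."""
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    counts = {c: {j: defaultdict(int) for j in subset} for c in classes}
    for xi, yi in zip(X, y):
        for j in subset:
            counts[yi][j][xi[j]] += 1
    return classes, prior, counts

def log_prob(model, subset, xi, c, y_count, n_values=2):
    """Joint log-probability log p(c) + sum_j log p(x_j | c), with Laplace smoothing."""
    classes, prior, counts = model
    lp = math.log(prior[c])
    for j in subset:
        num = counts[c][j][xi[j]] + 1          # Laplace smoothing
        den = y_count[c] + n_values            # assumed number of discrete values
        lp += math.log(num / den)
    return lp

def predict_averaged(X, y, x_new):
    """Average class posteriors over all 1- and 2-variable subsets,
    weighting each model by its per-example training log-likelihood."""
    n_feat = len(X[0])
    y_count = {c: y.count(c) for c in set(y)}
    subsets = [s for r in (1, 2) for s in combinations(range(n_feat), r)]
    votes = defaultdict(float)
    for subset in subsets:
        model = train_nb(X, y, subset)
        classes = model[0]
        # Weight stays in the log domain until the last moment (the
        # "logarithmic smoothing" stand-in), then is exponentiated.
        ll = sum(max(log_prob(model, subset, xi, c, y_count) for c in classes)
                 for xi in X)
        w = math.exp(ll / len(X))
        for c in classes:
            votes[c] += w * math.exp(log_prob(model, subset, x_new, c, y_count))
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}

# Toy data: two binary features; the second one determines the class.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 0, 1, 1]
posterior = predict_averaged(X, y, (0, 1))
```

On the toy data above, subsets containing the informative second feature receive higher weight and pull the averaged posterior towards class 1 for the query point (0, 1); the real method replaces this toy weighting with a Bayesian posterior over selective naive Bayes models.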
A. Bordes and L. Bottou. SGD-QN, LaRank: fast optimizers for linear SVMs. In ICML 2008 Workshop for PASCAL Large Scale Learning Challenge, 2008. http://largescale.first.fraunhofer.de/workshop/.
M. Boullé. A Bayes optimal approach for partitioning the values of categorical attributes. Journal of Machine Learning Research, 6:1431–1452, 2005.
M. Boullé. MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131–165, 2006.
M. Boullé. Compression-based averaging of selective naive Bayes classifiers. Journal of Machine Learning Research, 8:1659–1685, 2007.
M. Boullé. An efficient parameter-free method for large scale offline learning. In ICML 2008 Workshop for PASCAL Large Scale Learning Challenge, 2008. http://largescale.first.fraunhofer.de/workshop/.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
O. Chapelle. Training a support vector machine in the primal. Neural Computation, 19:1155–1178, 2007.
P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth. CRISP-DM 1.0: Step-by-step Data Mining Guide, 2000.
J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of the 12th International Conference on Machine Learning, pages 194–202. Morgan Kaufmann, San Francisco, CA, 1995.
T. Fawcett. ROC graphs: notes and practical considerations for researchers. Technical Report HPL-2003-4, HP Laboratories, 2003.
U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Knowledge discovery and data mining: towards a unifying framework. In KDD, pages 82–88, 1996.
D.J. Hand and K. Yu. Idiot's Bayes: not so stupid after all? International Statistical Review, 69(3):385–399, 2001.
A Parameter-Free Classification Method for Large Scale Learning
J.A. Hoeting, D. Madigan, A.E. Raftery, and C.T. Volinsky. Bayesian model averaging: a tutorial. Statistical Science, 14(4):382–417, 1999.
C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 408–415, New York, NY, USA, 2008. ACM.
T. Joachims. Text categorization with support vector machines: learning with many relevant features. In European Conference on Machine Learning (ECML), pages 137–142, Berlin, 1998. Springer.
R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
P. Langley and S. Sage. Induction of selective Bayesian classifiers. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 399–406. Morgan Kaufmann, 1994.
P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In 10th National Conference on Artificial Intelligence, pages 223–228. AAAI Press, 1992.
H. Liu, F. Hussain, C.L. Tan, and M. Dash. Discretization: an enabling technique. Data Mining and Knowledge Discovery, 6(4):393–423, 2002.
D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann Publishers, San Francisco, CA, USA, 1999.
M. Refaat. Data Preparation for Data Mining Using SAS. Morgan Kaufmann Publishers, 2006.
S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. PASCAL large scale learning challenge, 2008. http://largescale.first.fraunhofer.de/about/.
S. Vempala. The Random Projection Method, volume 65 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, 2004.