Author: Marc Boullé
Abstract: With the rapid growth of computer storage capacities, available data and the demand for scoring models both follow an increasing trend, sharper than that of processing power. However, the main limitation to the widespread adoption of data mining solutions is the non-increasing availability of skilled data analysts, who play a key role in data preparation and model selection. In this paper, we present a parameter-free scalable classification method, which is a step towards fully automatic data mining. The method is based on Bayes-optimal univariate conditional density estimators, naive Bayes classification enhanced with a Bayesian variable selection scheme, and model averaging using a logarithmic smoothing of the posterior distribution. We focus on the complexity of the algorithms and show how they can cope with data sets that are far larger than the available central memory. We finally report results on the Large Scale Learning challenge, where our method obtains state-of-the-art performance within practicable computation time. Keywords: large scale learning, naive Bayes, Bayesianism, model selection, model averaging
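To make the pipeline in the abstract concrete, here is a minimal sketch of its two central ingredients: a naive Bayes classifier over discretized features, and an average over variable-selection subsets whose weights are computed in the log domain (a stand-in for the paper's logarithmic smoothing of the posterior). This is an illustrative toy, not the author's MODL-based implementation; all function names, the subset enumeration, and the likelihood-based weighting scheme are assumptions made for the example.

```python
# Illustrative sketch only: discretized naive Bayes averaged over
# variable subsets, with model weights computed in the log domain.
import math
from collections import defaultdict
from itertools import combinations

def train_nb(X, y, subset):
    """Fit class priors and per-feature value counts on the chosen variable subset."""
    classes = sorted(set(y))
    prior = {c: y.count(c) / len(y) for c in classes}
    counts = {c: {j: defaultdict(int) for j in subset} for c in classes}
    for xi, yi in zip(X, y):
        for j in subset:
            counts[yi][j][xi[j]] += 1
    return classes, prior, counts

def log_prob(model, subset, xi, c, y_count, n_values=2):
    """Joint log-probability log p(c) + sum_j log p(x_j | c), with Laplace smoothing."""
    classes, prior, counts = model
    lp = math.log(prior[c])
    for j in subset:
        num = counts[c][j][xi[j]] + 1          # Laplace smoothing
        den = y_count[c] + n_values            # assumed number of discrete values
        lp += math.log(num / den)
    return lp

def predict_averaged(X, y, x_new):
    """Average class posteriors over all 1- and 2-variable subsets,
    weighting each model by its per-example training log-likelihood."""
    n_feat = len(X[0])
    y_count = {c: y.count(c) for c in set(y)}
    subsets = [s for r in (1, 2) for s in combinations(range(n_feat), r)]
    votes = defaultdict(float)
    for subset in subsets:
        model = train_nb(X, y, subset)
        classes = model[0]
        # Weight stays in the log domain until the last moment (the
        # "logarithmic smoothing" stand-in), then is exponentiated.
        ll = sum(max(log_prob(model, subset, xi, c, y_count) for c in classes)
                 for xi in X)
        w = math.exp(ll / len(X))
        for c in classes:
            votes[c] += w * math.exp(log_prob(model, subset, x_new, c, y_count))
    total = sum(votes.values())
    return {c: v / total for c, v in votes.items()}

# Toy data: two binary features; the second one determines the class.
X = [(0, 0), (1, 0), (0, 1), (1, 1)]
y = [0, 0, 1, 1]
posterior = predict_averaged(X, y, (0, 1))
```

On the toy data above, subsets containing the informative second feature receive higher weight and pull the averaged posterior towards class 1 for the query point (0, 1); the real method replaces this toy weighting with a Bayesian posterior over selective naive Bayes models.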
A. Bordes and L. Bottou. SGD-QN, LaRank: fast optimizers for linear SVMs. In ICML 2008 Workshop for PASCAL Large Scale Learning Challenge, 2008. http://largescale.first.fraunhofer.de/workshop/.
M. Boullé. A Bayes optimal approach for partitioning the values of categorical attributes. Journal of Machine Learning Research, 6:1431–1452, 2005.
M. Boullé. MODL: a Bayes optimal discretization method for continuous attributes. Machine Learning, 65(1):131–165, 2006.
M. Boullé. Compression-based averaging of selective naive Bayes classifiers. Journal of Machine Learning Research, 8:1659–1685, 2007.
M. Boullé. An efficient parameter-free method for large scale offline learning. In ICML 2008 Workshop for PASCAL Large Scale Learning Challenge, 2008. http://largescale.first.fraunhofer.de/workshop/.
L. Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
O. Chapelle. Training a support vector machine in the primal. Neural Computation, 19:1155–1178, 2007.
P. Chapman, J. Clinton, R. Kerber, T. Khabaza, T. Reinartz, C. Shearer, and R. Wirth. CRISP-DM 1.0: Step-by-step Data Mining Guide, 2000.
J. Dougherty, R. Kohavi, and M. Sahami. Supervised and unsupervised discretization of continuous features. In Proceedings of the 12th International Conference on Machine Learning, pages 194–202. Morgan Kaufmann, San Francisco, CA, 1995.
T. Fawcett. ROC graphs: notes and practical considerations for researchers. Technical Report HPL-2003-4, HP Laboratories, 2003.
U.M. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. Knowledge discovery and data mining: towards a unifying framework. In KDD, pages 82–88, 1996.
D.J. Hand and K. Yu. Idiot's Bayes: not so stupid after all? International Statistical Review, 69(3):385–399, 2001.
A Parameter-Free Classification Method for Large Scale Learning
J.A. Hoeting, D. Madigan, A.E. Raftery, and C.T. Volinsky. Bayesian model averaging: a tutorial. Statistical Science, 14(4):382–417, 1999.
C.-J. Hsieh, K.-W. Chang, C.-J. Lin, S. Keerthi, and S. Sundararajan. A dual coordinate descent method for large-scale linear SVM. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 408–415, New York, NY, USA, 2008. ACM.
T. Joachims. Text categorization with support vector machines: learning with many relevant features. In European Conference on Machine Learning (ECML), pages 137–142, Berlin, 1998. Springer.
R. Kohavi and G. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
P. Langley and S. Sage. Induction of selective Bayesian classifiers. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, pages 399–406. Morgan Kaufmann, 1994.
P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In 10th National Conference on Artificial Intelligence, pages 223–228. AAAI Press, 1992.
H. Liu, F. Hussain, C.L. Tan, and M. Dash. Discretization: an enabling technique. Data Mining and Knowledge Discovery, 6(4):393–423, 2002.
D. Pyle. Data Preparation for Data Mining. Morgan Kaufmann Publishers, San Francisco, CA, USA, 1999.
M. Refaat. Data Preparation for Data Mining Using SAS. Morgan Kaufmann Publishers, 2006.
S. Sonnenburg, V. Franc, E. Yom-Tov, and M. Sebag. PASCAL large scale learning challenge, 2008. http://largescale.first.fraunhofer.de/about/.
S. Vempala. The Random Projection Method, volume 65 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science. American Mathematical Society, 2004.