
72 jmlr-2012-Multi-Target Regression with Rule Ensembles


Source: pdf

Author: Timo Aho, Bernard Ženko, Sašo Džeroski, Tapio Elomaa

Abstract: Methods for learning decision rules are being successfully applied to many problem domains, in particular when understanding and interpretation of the learned model is necessary. In many real-life problems, we would like to predict multiple related (nominal or numeric) target attributes simultaneously. While several methods for learning rules that predict multiple targets at once exist, they are all based on the covering algorithm, which does not work well for regression problems. A better solution for regression is the rule ensemble approach, which transcribes an ensemble of decision trees into a large collection of rules. An optimization procedure is then used to select the best (and much smaller) subset of these rules and to determine their respective weights. We introduce the FIRE algorithm for solving multi-target regression problems, which employs the rule ensemble approach. We improve the accuracy of the algorithm by adding simple linear functions to the ensemble. We also extensively evaluate the algorithm with and without linear functions. The results show that the accuracy of multi-target regression rule ensembles is high. They are more accurate than, for instance, multi-target regression trees, but not quite as accurate as multi-target random forests. The rule ensembles are significantly more concise than random forests, and it is also possible to create compact rule sets that are smaller than a single regression tree but still comparable in accuracy.

Keywords: multi-target prediction, rule learning, rule ensembles, regression

∗. Also in Microtask, Tampere, Finland.
†. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia.
‡. Also in the Centre of Excellence for Integrated Approaches in Chemistry and Biology of Proteins, Ljubljana, Slovenia, and the Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.

© 2012 Timo Aho, Bernard Ženko, Sašo Džeroski and Tapio Elomaa.
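
The core recipe in the abstract (transcribe a tree ensemble into rules, add simple linear terms, then optimize a sparse set of rule weights) can be illustrated with a short sketch. The Python code below is only a rough approximation of that recipe, not the authors' FIRE implementation: it uses scikit-learn's RandomForestRegressor for the multi-target ensemble and MultiTaskLasso as a stand-in for the gradient-directed weight optimization of Friedman and Popescu (2004) that FIRE adapts, and all helper names and parameter values are illustrative.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import MultiTaskLasso

def extract_rules(tree):
    """List every root-to-node path of a fitted tree as a rule:
    a list of (feature_index, threshold, go_left) conditions."""
    t = tree.tree_
    rules = []
    def walk(node, path):
        if path:                            # every non-root node yields one rule
            rules.append(list(path))
        if t.children_left[node] == -1:     # leaf: nothing below to extend
            return
        f, thr = t.feature[node], t.threshold[node]
        walk(t.children_left[node], path + [(f, thr, True)])
        walk(t.children_right[node], path + [(f, thr, False)])
    walk(0, [])
    return rules

def rule_matrix(X, rules):
    """Binary design matrix: entry (i, j) is 1 iff rule j covers example i."""
    R = np.ones((X.shape[0], len(rules)))
    for j, rule in enumerate(rules):
        for f, thr, go_left in rule:
            R[:, j] *= (X[:, f] <= thr) if go_left else (X[:, f] > thr)
    return R

# Toy multi-target data: 300 examples, 10 descriptive attributes, 3 targets.
X, Y = make_regression(n_samples=300, n_features=10, n_targets=3,
                       noise=1.0, random_state=0)

# 1) Grow a small multi-target tree ensemble and transcribe it into rules.
forest = RandomForestRegressor(n_estimators=20, max_depth=3,
                               random_state=0).fit(X, Y)
rules = [r for est in forest.estimators_ for r in extract_rules(est)]

# 2) Candidate design matrix: one 0/1 column per rule, plus the standardized
#    original attributes as the "simple linear functions".
design = np.hstack([rule_matrix(X, rules), (X - X.mean(0)) / X.std(0)])

# 3) Sparse multi-target optimization selects the rules/terms and their
#    weights; a column whose weights are all zero is dropped entirely.
model = MultiTaskLasso(alpha=1.0, max_iter=5000).fit(design, Y)
kept = int(np.any(model.coef_ != 0, axis=0).sum())
print(f"kept {kept} of {design.shape[1]} candidate rules and linear terms")

In this sketch, sweeping the regularization strength alpha trades accuracy against the size of the surviving rule set, mirroring the accuracy/compactness trade-off evaluated in the paper.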


reference text

Timo Aho. Steps on Multi-Target Prediction and Adaptability to Dynamic Input. PhD thesis, Tampere University of Technology, Department of Software Systems, Tampere, Finland, 2012.
Timo Aho, Bernard Ženko, and Sašo Džeroski. Rule ensembles for multi-target regression. In Wei Wang, Hillol Kargupta, Sanjay Ranka, Philip S. Yu, and Xindong Wu, editors, Proceedings of the Ninth IEEE International Conference on Data Mining (ICDM 2009), pages 21–30. IEEE Computer Society, 2009.
Annalisa Appice and Sašo Džeroski. Stepwise induction of multi-target model trees. In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenić, and Andrzej Skowron, editors, Proceedings of the 18th European Conference on Machine Learning (ECML 2007), LNCS, pages 502–509. Springer, 2007.
Andreas Argyriou, Theodoros Evgeniou, and Massimiliano Pontil. Convex multi-task feature learning. Machine Learning, 73(3):243–272, 2008.
Arthur Asuncion and David J. Newman. UCI machine learning repository, 2011. URL http://archive.ics.uci.edu/ml/.
Steffen Bickel, Jasmina Bogojeska, Thomas Lengauer, and Tobias Scheffer. Multi-task learning for HIV therapy screening. In William W. Cohen, Andrew McCallum, and Sam T. Roweis, editors, Proceedings of the 25th International Conference on Machine Learning (ICML 2008), AICPS, pages 56–63. ACM, 2008.
Hendrik Blockeel. Top-down Induction of First Order Logical Decision Trees. PhD thesis, Katholieke Universiteit Leuven, Department of Computer Science, Leuven, Belgium, 1998.
Hendrik Blockeel and Jan Struyf. Efficient algorithms for decision tree cross-validation. Journal of Machine Learning Research, 3:621–650, 2002.
Hendrik Blockeel, Luc De Raedt, and Jan Ramon. Top-down induction of clustering trees. In Jude W. Shavlik, editor, Proceedings of the 15th International Conference on Machine Learning (ICML 1998), pages 55–63. Morgan Kaufmann, 1998.
Leo Breiman. Bagging predictors. Machine Learning, 24(2):123–140, 1996.
Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
Leo Breiman, Jerome H. Friedman, Richard A. Olshen, and Charles J. Stone. Classification and Regression Trees. Wadsworth International Group, Belmont, CA, 1984.
Rich Caruana. Multitask learning. Machine Learning, 28(1):41–75, 1997.
Olivier Chapelle, Pannagadatta Shivaswamy, Srinivas Vadrevu, Kilian Weinberger, Ya Zhang, and Belle Tseng. Boosted multi-task learning. Machine Learning, 85(1):1–25, 2010.
Delve. Data for evaluating learning in valid experiments, 2011. URL http://www.cs.toronto.edu/~delve/index.html.
Krzysztof Dembczyński, Wojciech Kotłowski, and Roman Słowiński. Solving regression by learning an ensemble of decision rules. In Leszek Rutkowski, Ryszard Tadeusiewicz, Lotfi A. Zadeh, and Jacek M. Zurada, editors, Proceedings of the Ninth International Conference on Artificial Intelligence and Soft Computing (ICAISC 2008), LNCS, pages 533–544. Springer, 2008.
Krzysztof Dembczyński, Wojciech Kotłowski, and Roman Słowiński. Maximum likelihood rule ensembles. In William W. Cohen, Andrew McCallum, and Sam T. Roweis, editors, Proceedings of the 25th International Conference on Machine Learning (ICML 2008), AICPS, pages 224–231. ACM, 2008.
Damjan Demšar, Marko Debeljak, Claire Lavigne, and Sašo Džeroski. Modelling pollen dispersal of genetically modified oilseed rape within the field. In Abstracts, the 90th ESA Annual Meeting, page 152. The Ecological Society of America, 2005.
Damjan Demšar, Sašo Džeroski, Thomas Larsen, Jan Struyf, Jørgen Axelsen, Marianne Bruus Pedersen, and Paul Henning Krogh. Using multi-objective classification to model communities of soil microarthropods. Ecological Modelling, 191(1):131–143, 2006.
Janez Demšar. Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7:1–30, 2006.
Thomas Dietterich. Ensemble methods in machine learning. In Josef Kittler and Fabio Roli, editors, Proceedings of the First International Workshop on Multiple Classifier Systems (MCS 2000), LNCS, pages 1–15. Springer, 2000.
Sašo Džeroski, Ljupčo Todorovski, and Hendrik Blockeel. Relational ranking with predictive clustering trees. In Proceedings of the Workshop on Active Mining (in ICDM 2002), pages 9–15. IEEE Computer Society, 2002.
Sašo Džeroski, Andrej Kobler, Valentin Gjorgjioski, and Panče Panov. Using decision trees to predict forest stand height and canopy cover from LANDSAT and LIDAR data. In Klaus Tochtermann and Arno Scharl, editors, Proceedings of the 20th International Conference on Informatics for Environmental Protection (EnviroInfo 2006), pages 125–133. Shaker, 2006.
Sašo Džeroski, Damjan Demšar, and Jasna Grbović. Predicting chemical parameters of river water quality from bioindicator data. Applied Intelligence, 13(1):7–17, 2000.
Sašo Džeroski, Nathalie Colbach, and Antoine Messéan. Analysing the effect of field character on gene flow between oilseed rape varieties and volunteers with regression trees. In Antoine Messéan, editor, Proceedings of the Second International Conference on Co-existence between GM and non-GM based Agricultural Supply Chains, pages 207–211. Agropolis Productions, 2005.
Peter Flach and Nada Lavrač. Rule induction. In Michael R. Berthold and David J. Hand, editors, Intelligent Data Analysis, pages 229–267. Springer, 2003. Second edition.
Jerome H. Friedman and Bogdan E. Popescu. Importance sampled learning ensembles. Technical report, Stanford University, Department of Statistics, Stanford, CA, 2003.
Jerome H. Friedman and Bogdan E. Popescu. Gradient directed regularization for linear regression and classification. Technical report, Stanford University, Department of Statistics, Stanford, CA, 2004.
Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. Technical report, Stanford University, Department of Statistics, Stanford, CA, 2005.
Jerome H. Friedman and Bogdan E. Popescu. Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3):916–954, 2008.
Valentin Gjorgjioski, Sašo Džeroski, and Matt White. Clustering analysis of vegetation data. Technical Report 10065, Jožef Stefan Institute, Ljubljana, Slovenia, 2008.
Nitin Indurkhya and Sholom M. Weiss. Solving regression problems with rule-based ensemble classifiers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2001), pages 287–292. ACM, 2001.
Ali Jalali, Pradeep Ravikumar, Sujay Sanghavi, and Chao Ruan. A dirty model for multi-task learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS 2010), pages 964–972. MIT Press, 2011.
Minwoo Jeong and Gary Geunbae Lee. Multi-domain spoken language understanding with transfer learning. Speech Communication, 51(5):412–424, 2009.
Christian Kampichler, Sašo Džeroski, and Ralf Wieland. Application of machine learning techniques to the analysis of soil ecological data bases: relationships between habitat features and collembolan community characteristics. Soil Biology and Biochemistry, 32(2):197–209, 2000.
Aram Karalič. Employing linear regression in regression tree leaves. In B. Neumann, editor, Proceedings of the Tenth European Conference on Artificial Intelligence (ECAI 1992), pages 440–441. John Wiley & Sons, 1992.
Aram Karalič and Ivan Bratko. First order regression. Machine Learning, 26(2-3):147–176, 1997.
Dragi Kocev, Celine Vens, Jan Struyf, and Sašo Džeroski. Ensembles of multi-objective decision trees. In Joost N. Kok, Jacek Koronacki, Ramon Lopez de Mantaras, Stan Matwin, Dunja Mladenić, and Andrzej Skowron, editors, Proceedings of the 18th European Conference on Machine Learning (ECML 2007), LNCS, pages 624–631. Springer, 2007.
Qi Liu, Qian Xu, Vincent W. Zheng, Hong Xue, Zhiwei Cao, and Qiang Yang. Multi-task learning for cross-platform siRNA efficacy prediction: an in-silico study. BMC Bioinformatics, 11(1):181–196, 2010.
Karim Lounici, Massimiliano Pontil, Alexandre B. Tsybakov, and Sara A. van de Geer. Taking advantage of sparsity in multi-task learning. In Proceedings of the 22nd Conference on Learning Theory (COLT 2009), 2009.
Ryszard S. Michalski. On the quasi-minimal solution of the general covering problem. In Proceedings of the Fifth International Symposium on Information Processing (FCIP 1969), volume A3, Switching Circuits, pages 125–128, 1969.
Shibin Parameswaran and Kilian Weinberger. Large margin multi-task metric learning. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, editors, Advances in Neural Information Processing Systems (NIPS 2010), pages 1867–1875. MIT Press, 2011.
Beau Piccart, Jan Struyf, and Hendrik Blockeel. Empirical asymmetric selective transfer in multi-objective decision trees. In T. Horvath, J.-F. Boulicaut, and M. Berthold, editors, Proceedings of the Eleventh International Conference on Discovery Science (DS 2008), LNAI, pages 64–75. Springer, 2008.
Ross J. Quinlan. Learning with continuous classes. In A. Adams and L. Sterling, editors, Proceedings of the Fifth Australian Joint Conference on Artificial Intelligence (AI 1992), pages 343–348. World Scientific, 1992.
Alain Rakotomamonjy, Rémi Flamary, Gilles Gasso, and Stéphane Canu. ℓp–ℓq penalty for sparse linear and sparse multiple kernel multitask learning. IEEE Transactions on Neural Networks, 22(8):1307–1320, 2011.
StatLib. Data sets archive, 2011. URL http://stat.cmu.edu/datasets/.
Daniela Stojanova. Estimating forest properties from remotely sensed data by using machine learning. Master's thesis, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia, 2009.
Jan Struyf and Sašo Džeroski. Constraint based induction of multi-objective regression trees. In F. Bonchi and J. Boulicaut, editors, Proceedings of the Fourth International Workshop on Knowledge Discovery in Inductive Databases (KDID 2005), LNCS, pages 222–233. Springer, 2006.
Einoshin Suzuki, Masafumi Gotoh, and Yuta Choki. Bloomy decision tree for multi-objective classification. In Luc De Raedt and Arno Siebes, editors, Proceedings of the Fifth European Conference on Principles of Data Mining and Knowledge Discovery (PKDD 2001), LNCS, pages 436–447. Springer, 2001.
Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267–288, 1996.
Luís Torgo. Regression DataSets, 2011. URL http://www.liaad.up.pt/~ltorgo/Regression/DataSets.html.
Vladimir N. Vapnik. The Nature of Statistical Learning Theory. Springer, New York, NY, 1995.
Yong Wang and Ian H. Witten. Inducing model trees for continuous classes. In Proceedings of the Ninth European Conference on Machine Learning (ECML 1997) Poster Papers, 1997.
Weka. Collections of datasets, 2011. URL http://www.cs.waikato.ac.nz/~ml/weka/index_datasets.html.
Guo-Xun Yuan, Kai-Wei Chang, Cho-Jui Hsieh, and Chih-Jen Lin. A comparison of optimization methods and software for large-scale L1-regularized linear classification. Journal of Machine Learning Research, 11:3183–3234, 2010.
Bernard Ženko. Learning predictive clustering rules. PhD thesis, University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia, 2007.
Bernard Ženko and Sašo Džeroski. Learning classification rules for multiple target attributes. In Takashi Washio, Einoshin Suzuki, Kai Ming Ting, and Akihiro Inokuchi, editors, Proceedings of the 12th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2008), LNCS, pages 454–465. Springer, 2008.