jmlr jmlr2010 jmlr2010-63 jmlr2010-63-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shyam Visweswaran, Gregory F. Cooper
Abstract: This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict the target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures, and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Averaged over all the data sets, the ISMB algorithm performed better than every comparison algorithm on every performance measure. Keywords: instance-specific, Bayesian network, Markov blanket, Bayesian model averaging
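The abstract describes prediction by Bayesian model averaging over a set of Markov blanket models. The Python sketch below illustrates only the generic model-averaging step, in which each model's class distribution for the instance is weighted by its normalized posterior probability; it is not the authors' ISMB implementation, and the model objects, the predict_proba callable, and the log-posterior weights are illustrative assumptions. The instance-specific heuristic search that selects which Markov blanket models to average over is omitted.

    # Minimal sketch of Bayesian model averaging for one instance (illustrative
    # only; not the authors' ISMB code). Each candidate model supplies a class
    # distribution P(Z | x, M); the predictions are combined with weights
    # proportional to the model posteriors P(M | D).
    from typing import Callable, Sequence
    import numpy as np

    def bma_predict(models: Sequence[object],
                    log_posteriors: Sequence[float],
                    predict_proba: Callable[[object, np.ndarray], np.ndarray],
                    x: np.ndarray) -> np.ndarray:
        """Return the model-averaged class distribution for instance x."""
        log_post = np.asarray(log_posteriors, dtype=float)
        weights = np.exp(log_post - log_post.max())   # exponentiate stably
        weights /= weights.sum()                      # normalize to P(M | D)
        per_model = np.stack([predict_proba(m, x) for m in models])
        return weights @ per_model                    # sum_M P(Z|x,M) * P(M|D)

    # Toy usage: two hypothetical models that return fixed class distributions.
    if __name__ == "__main__":
        tables = {"m1": np.array([0.9, 0.1]), "m2": np.array([0.4, 0.6])}
        print(bma_predict(["m1", "m2"], [-1.0, -2.0],
                          lambda m, x: tables[m], x=np.zeros(3)))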
References:
D. W. Aha. Feature weighting for lazy learning algorithms. In H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective, pages 13–32. Kluwer Academic Publishers, Norwell, MA, 1998.
C. F. Aliferis, I. Tsamardinos, and A. Statnikov. HITON: A novel Markov blanket algorithm for optimal variable selection. In Proceedings of the American Medical Informatics Association (AMIA) Annual Symposium, pages 21–25, 2003.
C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local causal and Markov blanket induction for causal discovery and feature selection for classification, part I: Algorithms and empirical evaluation. Journal of Machine Learning Research, 11(Jan):171–234, 2010a.
C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local causal and Markov blanket induction for causal discovery and feature selection for classification, part II: Analysis and extensions. Journal of Machine Learning Research, 11(Jan):235–284, 2010b.
C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11–73, 1997.
R. Caruana and A. Niculescu-Mizil. Data mining in metric space: An empirical analysis of supervised learning performance criteria. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 69–78, Seattle, WA, 2004. ACM Press.
J. Cerquides and R. López de Mántaras. Robust Bayesian linear classifier ensembles. In Machine Learning: ECML 2005, volume 3720 of Lecture Notes in Computer Science, pages 72–83. Springer, Berlin/Heidelberg, 2005.
G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.
T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
B. Dasarathy. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, California, 1991.
D. Dash and G. F. Cooper. Exact model averaging with naive Bayesian classifiers. In C. Sammut and A. Hoffmann, editors, Proceedings of the Nineteenth International Conference on Machine Learning, pages 91–98, Sydney, Australia, 2002. Morgan Kaufmann.
D. Dash and G. F. Cooper. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 5(Sep):1177–1203, 2004.
U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1022–1027, Chambéry, France, 1993. Morgan Kaufmann.
A. Frank and A. Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
J. H. Friedman, R. Kohavi, and Y. Yun. Lazy decision trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 717–724, Portland, Oregon, 1996. AAAI Press.
N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2-3):131–163, 1997.
N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The 'sparse candidate' algorithm. In K. B. Laskey and H. Prade, editors, Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 206–215, Stockholm, Sweden, 1999. Morgan Kaufmann.
S. Fu and M. Desmarais. Tradeoff analysis of different Markov blanket local learning approaches. In PAKDD '08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pages 562–571, Berlin, Heidelberg, 2008. Springer-Verlag.
C. Gottrup, K. Thomsen, P. Locht, O. Wu, A. G. Sorensen, W. J. Koroshetz, and L. Ostergaard. Applying instance-based techniques to prediction of final outcome in acute stroke. Artificial Intelligence in Medicine, 33(3):223–236, 2005.
D. J. Hand and R. J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2):171–186, 2001.
D. Heckerman. A tutorial on learning with Bayesian networks. In M. Jordan, editor, Learning in Graphical Models. MIT Press, Cambridge, MA, 1999.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.
J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: A tutorial. Statistical Science, 14(4):382–401, 1999.
K. B. Hwang and B. T. Zhang. Bayesian model averaging of Bayesian network classifiers over multiple node-orders: Application to sparse datasets. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 35(6):1302–1310, 2005.
R. Kohavi. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In E. Simoudis, J. Han, and U. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202–207, Portland, Oregon, 1996. AAAI Press.
D. Koller and M. Sahami. Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 284–292, 1996.
M. G. Madden. A new Bayesian network structure for classification tasks. In AICS '02: Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science, pages 203–208, London, UK, 2002a. Springer-Verlag.
M. G. Madden. Evaluation of the performance of the Markov blanket Bayesian classifier algorithm. CoRR, cs.LG/0211003, 2002b.
D. Madigan and A. E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:1335–1346, 1994.
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Proceedings of the 1999 Conference on Advances in Neural Information Processing Systems, Denver, CO, 1999. MIT Press.
T. P. Minka. Bayesian model averaging is not model combination. Technical report, MIT Media Lab, 2002.
A. Moore and W. K. Wong. Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning. In T. Fawcett and N. Mishra, editors, Proceedings of the 20th International Conference on Machine Learning, pages 552–559. AAAI Press, 2003.
R. E. Neapolitan. Learning Bayesian Networks. Prentice Hall, Upper Saddle River, New Jersey, 1st edition, 2003.
M. J. Pazzani. Searching for dependencies in Bayesian classifiers. In D. Fisher and H. J. Lenz, editors, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, pages 239–248, Fort Lauderdale, Florida, 1995. Springer-Verlag.
M. J. Pazzani. Constructive induction of Cartesian product attributes. In H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Norwell, MA, 1998.
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, California, 1988.
A. E. Raftery, D. Madigan, and J. A. Hoeting. Model selection and accounting for model uncertainty in linear regression models. Journal of the American Statistical Association, 92:179–191, 1997.
K. M. Ting, Z. Zheng, and G. I. Webb. Learning lazy rules to improve the performance of classifiers. In Proceedings of the Nineteenth SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence, pages 122–131, Cambridge, UK, 1999. Springer-Verlag.
I. Tsamardinos and C. Aliferis. Towards principled feature selection: Relevancy, filters and wrappers. In C. M. Bishop and B. J. Frey, editors, Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA, 2003.
I. Tsamardinos, L. Brown, and C. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
S. Visweswaran and G. F. Cooper. Instance-specific Bayesian model averaging for classification. In Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems, Vancouver, Canada, 2004.
S. Visweswaran and G. F. Cooper. Counting Markov blanket structures. Technical Report DBMI-09-12, University of Pittsburgh, 2009.
S. Visweswaran, D. C. Angus, M. Hsieh, L. Weissfeld, D. Yealy, and G. F. Cooper. Learning patient-specific predictive models from clinical data. Journal of Biomedical Informatics, 43(5):669–685, 2010.
L. Wasserman. Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1):92–107, 2000.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2nd edition, 2005.
K. Y. Yeung, R. E. Bumgarner, and A. E. Raftery. Bayesian model averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.
J. P. Zhang, Y. S. Yim, and J. M. Yang. Intelligent selection of instances for prediction functions in lazy learning algorithms. Artificial Intelligence Review, 11(1-5):175–191, 1997.
Z. J. Zheng and G. I. Webb. Lazy learning of Bayesian rules. Machine Learning, 41(1):53–84, 2000.