jmlr jmlr2010 jmlr2010-63 jmlr2010-63-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shyam Visweswaran, Gregory F. Cooper
Abstract: This paper introduces a Bayesian algorithm for constructing predictive models from data that are optimized to predict a target variable well for a particular instance. This algorithm learns Markov blanket models, carries out Bayesian model averaging over a set of models to predict the target variable of the instance at hand, and employs an instance-specific heuristic to locate a set of suitable models to average over. We call this method the instance-specific Markov blanket (ISMB) algorithm. The ISMB algorithm was evaluated on 21 UCI data sets using five different performance measures, and its performance was compared to that of several commonly used predictive algorithms, including naive Bayes, C4.5 decision tree, logistic regression, neural networks, k-Nearest Neighbor, Lazy Bayesian Rules, and AdaBoost. Averaged over all the data sets, the ISMB algorithm performed better than every comparison algorithm on every performance measure. Keywords: instance-specific, Bayesian network, Markov blanket, Bayesian model averaging
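The abstract describes prediction by Bayesian model averaging over a set of Markov blanket models. The Python sketch below illustrates only the generic model-averaging step, in which each model's class distribution for the instance is weighted by its normalized posterior probability; it is not the authors' ISMB implementation, and the model objects, the predict_proba callable, and the log-posterior weights are illustrative assumptions. The instance-specific heuristic search that selects which Markov blanket models to average over is omitted.

    # Minimal sketch of Bayesian model averaging for one instance (illustrative
    # only; not the authors' ISMB code). Each candidate model supplies a class
    # distribution P(Z | x, M); the predictions are combined with weights
    # proportional to the model posteriors P(M | D).
    from typing import Callable, Sequence
    import numpy as np

    def bma_predict(models: Sequence[object],
                    log_posteriors: Sequence[float],
                    predict_proba: Callable[[object, np.ndarray], np.ndarray],
                    x: np.ndarray) -> np.ndarray:
        """Return the model-averaged class distribution for instance x."""
        log_post = np.asarray(log_posteriors, dtype=float)
        weights = np.exp(log_post - log_post.max())   # exponentiate stably
        weights /= weights.sum()                      # normalize to P(M | D)
        per_model = np.stack([predict_proba(m, x) for m in models])
        return weights @ per_model                    # sum_M P(Z|x,M) * P(M|D)

    # Toy usage: two hypothetical models that return fixed class distributions.
    if __name__ == "__main__":
        tables = {"m1": np.array([0.9, 0.1]), "m2": np.array([0.4, 0.6])}
        print(bma_predict(["m1", "m2"], [-1.0, -2.0],
                          lambda m, x: tables[m], x=np.zeros(3)))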
References:
D. W. Aha. Feature weighting for lazy learning algorithms. In H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective, pages 13–32. Kluwer Academic Publishers, Norwell, MA, 1998.
C. F. Aliferis, I. Tsamardinos, and A. Statnikov. HITON: A novel Markov blanket algorithm for optimal variable selection. In Proceedings of the American Medical Informatics Association (AMIA) Annual Symposium, pages 21–25, 2003.
C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local causal and Markov blanket induction for causal discovery and feature selection for classification, part I: Algorithms and empirical evaluation. Journal of Machine Learning Research, 11(Jan):171–234, 2010a.
C. F. Aliferis, A. Statnikov, I. Tsamardinos, S. Mani, and X. D. Koutsoukos. Local causal and Markov blanket induction for causal discovery and feature selection for classification, part II: Analysis and extensions. Journal of Machine Learning Research, 11(Jan):235–284, 2010b.
C. G. Atkeson, A. W. Moore, and S. Schaal. Locally weighted learning. Artificial Intelligence Review, 11(1-5):11–73, 1997.
R. Caruana and A. Niculescu-Mizil. Data mining in metric space: An empirical analysis of supervised learning performance criteria. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 69–78, Seattle, WA, 2004. ACM Press.
J. Cerquides and R. López de Mántaras. Robust Bayesian linear classifier ensembles. In Machine Learning: ECML 2005, volume 3720 of Lecture Notes in Computer Science, pages 72–83. Springer, Berlin/Heidelberg, 2005.
G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.
T. Cover and P. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21–27, 1967.
T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley-Interscience, 2nd edition, 2006.
B. Dasarathy. Nearest Neighbor (NN) Norms: NN Pattern Classification Techniques. IEEE Computer Society Press, Los Alamitos, California, 1991.
D. Dash and G. F. Cooper. Exact model averaging with naive Bayesian classifiers. In C. Sammut and A. Hoffmann, editors, Proceedings of the Nineteenth International Conference on Machine Learning, pages 91–98, Sydney, Australia, 2002. Morgan Kaufmann.
D. Dash and G. F. Cooper. Model averaging for prediction with discrete Bayesian networks. Journal of Machine Learning Research, 5(Sep):1177–1203, 2004.
U. M. Fayyad and K. B. Irani. Multi-interval discretization of continuous-valued attributes for classification. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 1022–1027, Chambéry, France, 1993. Morgan Kaufmann.
A. Frank and A. Asuncion. UCI machine learning repository, 2010. URL http://archive.ics.uci.edu/ml.
J. H. Friedman, R. Kohavi, and Y. Yun. Lazy decision trees. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, pages 717–724, Portland, Oregon, 1996. AAAI Press.
N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Machine Learning, 29(2-3):131–163, 1997.
N. Friedman, I. Nachman, and D. Pe'er. Learning Bayesian network structure from massive datasets: The 'sparse candidate' algorithm. In K. B. Laskey and H. Prade, editors, Proceedings of the Fifteenth Annual Conference on Uncertainty in Artificial Intelligence, pages 206–215, Stockholm, Sweden, 1999. Morgan Kaufmann.
S. Fu and M. Desmarais. Tradeoff analysis of different Markov blanket local learning approaches. In PAKDD '08: Proceedings of the 12th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, pages 562–571, Berlin, Heidelberg, 2008. Springer-Verlag.
C. Gottrup, K. Thomsen, P. Locht, O. Wu, A. G. Sorensen, W. J. Koroshetz, and L. Ostergaard. Applying instance-based techniques to prediction of final outcome in acute stroke. Artificial Intelligence in Medicine, 33(3):223–236, 2005.
D. J. Hand and R. J. Till. A simple generalisation of the area under the ROC curve for multiple class classification problems. Machine Learning, 45(2):171–186, 2001.
D. Heckerman. A tutorial on learning with Bayesian networks. In M. Jordan, editor, Learning in Graphical Models. MIT Press, Cambridge, MA, 1999.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.
J. A. Hoeting, D. Madigan, A. E. Raftery, and C. T. Volinsky. Bayesian model averaging: A tutorial. Statistical Science, 14(4):382–401, 1999.
K. B. Hwang and B. T. Zhang. Bayesian model averaging of Bayesian network classifiers over multiple node-orders: Application to sparse datasets. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 35(6):1302–1310, 2005.
R. Kohavi. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In E. Simoudis, J. Han, and U. Fayyad, editors, Proceedings of the Second International Conference on Knowledge Discovery and Data Mining, pages 202–207, Portland, Oregon, 1996. AAAI Press.
D. Koller and M. Sahami. Toward optimal feature selection. In Proceedings of the Thirteenth International Conference on Machine Learning, pages 284–292, 1996.
M. G. Madden. A new Bayesian network structure for classification tasks. In AICS '02: Proceedings of the 13th Irish International Conference on Artificial Intelligence and Cognitive Science, pages 203–208, London, UK, 2002a. Springer-Verlag.
M. G. Madden. Evaluation of the performance of the Markov blanket Bayesian classifier algorithm. CoRR, cs.LG/0211003, 2002b.
D. Madigan and A. E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89:1335–1346, 1994.
D. Margaritis and S. Thrun. Bayesian network induction via local neighborhoods. In S. A. Solla, T. K. Leen, and K.-R. Müller, editors, Proceedings of the 1999 Conference on Advances in Neural Information Processing Systems, Denver, CO, 1999. MIT Press.
T. P. Minka. Bayesian model averaging is not model combination. Technical report, MIT Media Lab, 2002.
A. Moore and W. K. Wong. Optimal reinsertion: A new search operator for accelerated and more accurate Bayesian network structure learning. In T. Fawcett and N. Mishra, editors, Proceedings of the 20th International Conference on Machine Learning, pages 552–559. AAAI Press, 2003.
R. E. Neapolitan. Learning Bayesian Networks. Prentice Hall, Upper Saddle River, New Jersey, 1st edition, 2003.
M. J. Pazzani. Searching for dependencies in Bayesian classifiers. In D. Fisher and H. J. Lenz, editors, Proceedings of the Fifth International Workshop on Artificial Intelligence and Statistics, pages 239–248, Fort Lauderdale, Florida, 1995. Springer-Verlag.
M. J. Pazzani. Constructive induction of Cartesian product attributes. In H. Liu and H. Motoda, editors, Feature Extraction, Construction and Selection: A Data Mining Perspective. Kluwer Academic Publishers, Norwell, MA, 1998.
J. Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, San Mateo, California, 1988.
A. E. Raftery, D. Madigan, and J. A. Hoeting. Model selection and accounting for model uncertainty in linear regression models. Journal of the American Statistical Association, 92:179–191, 1997.
K. M. Ting, Z. Zheng, and G. I. Webb. Learning lazy rules to improve the performance of classifiers. In Proceedings of the Nineteenth SGES International Conference on Knowledge Based Systems and Applied Artificial Intelligence, pages 122–131, Cambridge, UK, 1999. Springer-Verlag.
I. Tsamardinos and C. Aliferis. Towards principled feature selection: Relevancy, filters and wrappers. In C. M. Bishop and B. J. Frey, editors, Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, USA, 2003.
I. Tsamardinos, L. Brown, and C. Aliferis. The max-min hill-climbing Bayesian network structure learning algorithm. Machine Learning, 65(1):31–78, 2006.
S. Visweswaran and G. F. Cooper. Instance-specific Bayesian model averaging for classification. In Proceedings of the Eighteenth Annual Conference on Neural Information Processing Systems, Vancouver, Canada, 2004.
S. Visweswaran and G. F. Cooper. Counting Markov blanket structures. Technical Report DBMI-09-12, University of Pittsburgh, 2009.
S. Visweswaran, D. C. Angus, M. Hsieh, L. Weissfeld, D. Yealy, and G. F. Cooper. Learning patient-specific predictive models from clinical data. Journal of Biomedical Informatics, 43(5):669–685, 2010.
L. Wasserman. Bayesian model selection and model averaging. Journal of Mathematical Psychology, 44(1):92–107, 2000.
I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann, 2nd edition, 2005.
K. Y. Yeung, R. E. Bumgarner, and A. E. Raftery. Bayesian model averaging: Development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics, 21(10):2394–2402, 2005.
J. P. Zhang, Y. S. Yim, and J. M. Yang. Intelligent selection of instances for prediction functions in lazy learning algorithms. Artificial Intelligence Review, 11(1-5):175–191, 1997.
Z. J. Zheng and G. I. Webb. Lazy learning of Bayesian rules. Machine Learning, 41(1):53–84, 2000.