Source: pdf
Author: Luke K. McDowell, Kalyan Moy Gupta, David W. Aha
Abstract: Many collective classification (CC) algorithms have been shown to increase accuracy when instances are interrelated. However, CC algorithms must be carefully applied because their use of estimated labels can in some cases decrease accuracy. In this article, we show that managing this label uncertainty through cautious algorithmic behavior is essential to achieving maximal, robust performance. First, we describe cautious inference and explain how four well-known families of CC algorithms can be parameterized to use varying degrees of such caution. Second, we introduce cautious learning and show how it can be used to improve the performance of almost any CC algorithm, with or without cautious inference. We then evaluate cautious inference and learning for the four collective inference families, with three local classifiers and a range of both synthetic and real-world data. We find that cautious learning and cautious inference typically outperform less cautious approaches. In addition, we identify the data characteristics that predict more substantial performance differences. Our results reveal that the degree of caution used usually has a larger impact on performance than the choice of the underlying inference algorithm. Together, these results identify the most appropriate CC algorithms to use for particular task characteristics and explain multiple conflicting findings from prior CC research.
Keywords: collective inference, statistical relational learning, approximate probabilistic inference, networked data, cautious inference
References (Cautious Collective Classification; McDowell, Gupta, and Aha):
Regina Barzilay and Mirella Lapata. Collective content selection for concept-to-text generation. In Proceedings of the Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing (HLT/EMNLP), pages 331–338, 2005.
Julian Besag. On the statistical analysis of dirty pictures. Journal of the Royal Statistical Society, 48(3):259–302, 1986.
Julian Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, 36(2):192–236, 1974.
Mustafa Bilgic and Lise Getoor. Effective label acquisition for collective classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 43–51, 2008.
Béla Bollobás, Christian Borgs, Jennifer Chayes, and Oliver Riordan. Directed scale-free graphs. In Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 132–139, 2003.
Soumen Chakrabarti, Byron Dom, and Piotr Indyk. Enhanced hypertext categorization using hyperlinks. In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 307–318, 1998.
Mark Craven, Dan DiPasquo, Dayne Freitag, Andrew K. McCallum, Tom M. Mitchell, Kamal Nigam, and Seán Slattery. Learning to extract symbolic knowledge from the World Wide Web. In Proceedings of the 15th Conference of the American Association for Artificial Intelligence (AAAI), pages 509–516, 1998.
Stephen Della Pietra, Vincent Della Pietra, and John Lafferty. Inducing features of random fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(4):380–393, 1997.
Roland L. Dobrushin. The description of a random field by means of conditional probabilities and conditions of its regularity. Theory of Probability and its Applications, 13(2):197–224, 1968.
Andrew Fast and David Jensen. Why stacked models perform effective collective classification. In Proceedings of the IEEE International Conference on Data Mining (ICDM), 2008.
Karen Yuen Fung and Barbara A. Wrobel. The treatment of missing values in logistic regression. Biometrical Journal, 31(1):35–47, 1989.
Brian Gallagher and Tina Eliassi-Rad. Leveraging label-independent features for classification in sparsely labeled networks: An empirical study. In Proceedings of the 2nd Workshop on Social Network Mining and Analysis at the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2008.
Brian Gallagher, Hanghang Tong, Tina Eliassi-Rad, and Christos Faloutsos. Using ghost edges for classification in sparsely labeled networks. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 256–264, 2008.
Aram Galstyan and Paul R. Cohen. Empirical comparison of "hard" and "soft" label propagation for relational classification. In Proceedings of the 17th International Conference on Inductive Logic Programming (ILP), pages 98–111, 2007.
Tayfun Gurel and Kristian Kersting. On the trade-off between iterative classification and collective classification: first experimental results. In Working Notes of the 3rd International ECML/PKDD Workshop on Mining Graphs, Trees, and Sequences, 2005.
David Heckerman. A tutorial on learning with Bayesian networks. In M. Jordan, editor, Learning in Graphical Models. MIT Press, 1999.
Andreas Heß and Nicholas Kushmerick. Iterative ensemble classification for relational data: A case study of semantic web services. In Proceedings of the 15th European Conference on Machine Learning (ECML), pages 156–167, 2004.
Cecil Huang and Adnan Darwiche. Inference in belief networks: A procedural guide. International Journal of Approximate Reasoning, 15(3):225–263, 1996.
David Jensen and Jennifer Neville. Linkage and autocorrelation cause feature selection bias in relational learning. In Proceedings of the 19th International Conference on Machine Learning (ICML), pages 259–266, 2002.
David Jensen, Jennifer Neville, and Michael Hay. Avoiding bias when aggregating relational data with degree disparity. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 274–281, 2003.
David Jensen, Jennifer Neville, and Brian Gallagher. Why collective inference improves relational classification. In Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 593–598, 2004.
Ron Kohavi and George H. John. Wrappers for feature subset selection. Artificial Intelligence, 97(1-2):273–324, 1997.
Daphne Koller, Nir Friedman, Lise Getoor, and Benjamin Taskar. Graphical models in a nutshell. In L. Getoor and B. Taskar, editors, An Introduction to Statistical Relational Learning. MIT Press, 2007.
Zhenzhen Kou and William W. Cohen. Stacked graphical models for efficient inference in Markov Random Fields. In Proceedings of the 7th SIAM International Conference on Data Mining (SDM), pages 533–538, 2007.
Qing Lu and Lise Getoor. Link-based classification. In Proceedings of the 20th International Conference on Machine Learning (ICML), pages 496–503, 2003a.
Qing Lu and Lise Getoor. Link-based classification using labeled and unlabeled data. In Proceedings of the Workshop on the Continuum from Labeled to Unlabeled data at the 20th International Conference on Machine Learning (ICML), 2003b.
Sofus A. Macskassy. Improving learning in networked data by combining explicit and mined links. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pages 590–595, 2007.
Sofus A. Macskassy and Foster Provost. Suspicion scoring based on guilt-by-association, collective inference, and focused data access. In Proceedings of the International Conference on Intelligence Analysis, 2005.
Sofus A. Macskassy and Foster Provost. Classification in networked data: A toolkit and a univariate case study. Journal of Machine Learning Research, 8:935–983, 2007.
Sofus A. Macskassy and Foster Provost. A brief survey of machine learning methods for classification in networked data and an application to suspicion scoring. In Proceedings of the Workshop on Statistical Network Analysis at the 23rd International Conference on Machine Learning (ICML), 2006.
Andrew McCallum, Dayne Freitag, and Fernando C. N. Pereira. Maximum entropy Markov models for information extraction and segmentation. In Proceedings of the 17th International Conference on Machine Learning, pages 591–598, 2000a.
Andrew McCallum, Kamal Nigam, Jason Rennie, and Kristie Seymore. Automating the construction of internet portals with machine learning. Information Retrieval, 3:127–163, 2000b.
Luke K. McDowell, Kalyan Moy Gupta, and David W. Aha. Cautious inference in collective classification. In Proceedings of the 22nd AAAI Conference on Artificial Intelligence, pages 596–601, 2007a.
Luke K. McDowell, Kalyan Moy Gupta, and David W. Aha. Case-based collective classification. In Proceedings of the 20th International Florida Artificial Intelligence Research Society Conference (FLAIRS), pages 399–404, 2007b.
Robert J. McEliece, David J. C. MacKay, and Jung-Fu Cheng. Turbo decoding as an instance of Pearl's "belief propagation" algorithm. IEEE Journal on Selected Areas in Communications, 16(2):140–152, 1998.
Miller McPherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415–444, 2001.
Kevin P. Murphy, Yair Weiss, and Michael I. Jordan. Loopy belief propagation for approximate inference: An empirical study. In Proceedings of the 15th Conference on Uncertainty in Artificial Intelligence, pages 467–475, 1999.
Jennifer Neville and David Jensen. A bias/variance decomposition for models using collective inference. Machine Learning Journal, 73(1):87–106, 2008.
Jennifer Neville and David Jensen. Iterative classification in relational data. In Proceedings of the Workshop on Learning Statistical Models from Relational Data at the 17th National Conference on Artificial Intelligence (AAAI), pages 13–20, 2000.
Jennifer Neville and David Jensen. Leveraging relational autocorrelation with latent group models. In Proceedings of the 5th IEEE International Conference on Data Mining (ICDM), pages 170–177, 2005.
Jennifer Neville and David Jensen. Relational dependency networks. Journal of Machine Learning Research, 8:653–692, 2007.
Jennifer Neville, David Jensen, Lisa Friedland, and Michael Hay. Learning relational probability trees. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 625–630, 2003a.
Jennifer Neville, David Jensen, and Brian Gallagher. Simple estimators for relational Bayesian classifiers. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM), pages 609–612, 2003b.
Jennifer Neville, Özgür Simsek, David Jensen, John Komoroske, Kelly Palmer, and Henry G. Goldberg. Using relational knowledge discovery to prevent securities fraud. In Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 449–458, 2005.
Mark E. Newman. Mixing patterns in networks. Physical Review E, 67(2):026126, 2003.
Judea Pearl. Probabilistic Reasoning in Intelligent Systems. Morgan Kaufmann, 1988.
Matthew J. Rattigan, Marc Maier, David Jensen, Bin Wu, Xin Pei, JianBin Tan, and Yi Wang. Exploiting network structure for active inference in collective classification. In Proceedings of the Workshop on Mining Graphs and Complex Structures at the 7th IEEE International Conference on Data Mining (ICDM), pages 429–434, 2007.
Matthew Richardson and Pedro Domingos. Markov logic networks. Machine Learning, 62(1-2):107–136, 2006.
Maytal Saar-Tsechansky and Foster Provost. Handling missing values when applying classification models. Journal of Machine Learning Research, 8(Jul):1623–1657, 2007.
Prithviraj Sen. Personal communication, 2008.
Prithviraj Sen and Lise Getoor. Empirical comparison of approximate inference algorithms for networked data. In Proceedings of the Workshop on Open Problems in Statistical Relational Learning at the 23rd International Conference on Machine Learning (ICML), 2006.
Prithviraj Sen and Lise Getoor. Link-based classification. Technical Report CS-TR-4858, University of Maryland, College Park, MD, February 2007.
Prithviraj Sen, Galileo Namata, Mustafa Bilgic, Lise Getoor, Brian Gallagher, and Tina Eliassi-Rad. Collective classification in network data. AI Magazine, Special Issue on AI and Networks, 29(3):93–106, 2008.
Ben Taskar, Pieter Abbeel, and Daphne Koller. Discriminative probabilistic models for relational data. In Proceedings of the 18th Conference on Uncertainty in Artificial Intelligence (UAI), pages 485–492, 2002.
YongHong Tian, Tiejun Huang, and Wen Gao. Latent linkage semantic kernels for collective classification of link data. Journal of Intelligent Information Systems, 26(3):269–301, 2006.
Rudolph Triebel, Richard Schmidt, Oscar Martinez Mozos, and Wolfram Burgard. Instance-based AMN classification for improved object recognition in 2D and 3D laser range data. In Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), pages 2225–2230, 2007.
Jonathan S. Yedidia, William T. Freeman, and Yair Weiss. Generalized belief propagation. Advances in Neural Information Processing Systems (NIPS), 13:689–695, 2000.
Nevin Lianwen Zhang and David Poole. Exploiting causal independence in Bayesian network inference. Journal of Artificial Intelligence Research, 5:301–328, 1996.
Bin Zhao, Prithviraj Sen, and Lise Getoor. Event classification and relationship labeling in affiliation networks. In Proceedings of the Workshop on Statistical Network Analysis (SNA) at the 23rd International Conference on Machine Learning (ICML), 2006.