
44 emnlp-2013-Centering Similarity Measures to Reduce Hubs

Source: pdf

Author: Ikumi Suzuki ; Kazuo Hara ; Masashi Shimbo ; Marco Saerens ; Kenji Fukumizu

Abstract: The performance of nearest neighbor methods is degraded by the presence of hubs, i.e., objects in the dataset that are similar to many other objects. In this paper, we show that the classical method of centering, the transformation that shifts the origin of the space to the data centroid, provides an effective way to reduce hubs. We show analytically why hubs emerge and why they are suppressed by centering, under a simple probabilistic model of data. To further reduce hubs, we also move the origin more aggressively towards hubs, through weighted centering. Our experimental results show that (weighted) centering is effective for natural language data; it improves the performance of the k-nearest neighbor classifiers considerably in word sense disambiguation and document classification tasks.
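The centering idea in the abstract can be illustrated with a minimal sketch. It assumes inner-product similarity and a tiny toy dataset, neither of which is taken from the paper, and the function names (inner_products, center_similarity, k_occurrence) are hypothetical. Subtracting row and column means from the similarity matrix is equivalent to shifting the origin to the data centroid before taking inner products. Counting how often each object appears in the top-k lists of the other objects gives a simple way to spot hubs.

```python
def inner_products(X):
    """Similarity matrix of raw inner products (an assumed similarity measure)."""
    return [[sum(a * b for a, b in zip(x, y)) for y in X] for x in X]


def center_similarity(K):
    """Double-center K: equivalent to moving the origin to the data centroid."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def k_occurrence(K, k):
    """Count how often each object appears in the top-k lists of the others."""
    n = len(K)
    counts = [0] * n
    for i in range(n):
        # Rank every other object by its similarity to object i, highest first.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: K[i][j], reverse=True)
        for j in others[:k]:
            counts[j] += 1
    return counts


# Toy data (illustrative only): the last vector has a large norm, so it
# dominates raw inner-product similarity.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [3.0, 3.0]]
K = inner_products(X)
K_centered = center_similarity(K)

print("top-1 occurrence counts, raw:     ", k_occurrence(K, 1))
print("top-1 occurrence counts, centered:", k_occurrence(K_centered, 1))
```

In this toy case the large-norm vector is the nearest neighbor of every other object under raw inner products, and it stops being anyone's nearest neighbor after centering. This sketch does not cover the weighted centering mentioned in the abstract, whose weighting scheme is not given in this excerpt.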

reference text

Arindam Banerjee, Inderjit S. Dhillon, Joydeep Ghosh, and Suvrit Sra. 2005. Clustering on the unit hypersphere using von Mises-Fisher distributions. Journal of Machine Learning Research, 6:1345–1382.
P. Yu. Chebotarev and E. V. Shamis. 1997. The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control, 58(9):1505–1514.
Yihua Chen, Eric K. Garcia, Maya R. Gupta, Ali Rahimi, and Luca Cazzanti. 2009. Similarity-based classification: Concepts and algorithms. Journal of Machine Learning Research, 10:747–776.
Bart Decadt, Véronique Hoste, Walter Daelemans, and Antal Van den Bosch. 2004. GAMBL, genetic algorithm optimization of memory-based WSD. In Rada Mihalcea and Phil Edmonds, editors, Proceedings of the 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), pages 108–112.
L. Eriksson, E. Johansson, N. Kettaneh-Wold, J. Trygg, C. Wikström, and S. Wold. 2006. Multi- and Megavariate Data Analysis, Part 1, Basic Principles and Applications. Umetrics, Inc.
Katrin Erk and Sebastian Padó. 2008. A structured vector space model for word meaning in context. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP ’08), pages 897–906, Honolulu, Hawaii, USA.
Douglas H. Fisher and Hans-Joachim Lenz, editors. 1996. Learning from Data: Artificial Intelligence and Statistics V: Workshop on Artificial Intelligence and Statistics. Lecture Notes in Statistics 112. Springer.
Ruixin Guo and Sounak Chakraborty. 2010. Bayesian adaptive nearest neighbor. Statistical Analysis and Data Mining, 3(2):92–105.
Daniel Jurafsky and James H. Martin. 2008. Speech and Language Processing. Prentice Hall, 2nd edition.
David D. Lewis, Yiming Yang, Tony G. Rose, and Fan Li. 2004. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397.
Colin Mallows. 1991. Another comment on O’Cinneide. The American Statistician, 45(3):257.
K. V. Mardia and P. Jupp. 2000. Directional Statistics. John Wiley and Sons, 2nd edition.
K. V. Mardia, J. T. Kent, and J. M. Bibby. 1979. Multivariate Analysis. Academic Press.
Brij M. Masand, Gordon Linoff, and David L. Waltz. 1992. Classifying news stories using memory based reasoning. In Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’92), pages 59–65.
Rada Mihalcea, Timothy Chklovski, and Adam Kilgarriff. 2004. The Senseval-3 English lexical sample task. In Rada Mihalcea and Phil Edmonds, editors, Proceedings of the 3rd International Workshop on the Evaluation of Systems for the Semantic Analysis of Text (Senseval-3), pages 25–28, Barcelona, Spain.
Rada Mihalcea. 2004. Co-training and self-training for word sense disambiguation. In Hwee Tou Ng and Ellen Riloff, editors, Proceedings of the 8th Conference on Computational Natural Language Learning (CoNLL ’04), pages 33–40, Boston, Massachusetts, USA.
Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL ’08), pages 236–244, Columbus, Ohio, USA.
Roberto Navigli. 2009. Word sense disambiguation: A survey. ACM Computing Surveys, 41:10:1–10:69.
Ali Mustafa Qamar, Éric Gaussier, Jean-Pierre Chevallet, and Joo-Hwee Lim. 2008. Similarity learning for nearest neighbor classification. In Proceedings of the 8th International Conference on Data Mining (ICDM ’08), pages 983–988, Pisa, Italy.
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010a. Hubs in space: Popular nearest neighbors in high-dimensional data. Journal of Machine Learning Research, 11:2487–2531.
Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010b. On the existence of obstinate results in vector space models. In Proceedings of the 33rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’10), pages 186–193, Geneva, Switzerland.
Marco Saerens, François Fouss, Luh Yen, and Pierre Dupont. 2004. The principal components analysis of a graph, and its relationships to spectral clustering. In Proceedings of the 15th European Conference on Machine Learning (ECML ’04), Lecture Notes in Artificial Intelligence 3201, pages 371–383, Pisa, Italy. Springer.
Dominik Schnitzer, Arthur Flexer, Markus Schedl, and Gerhard Widmer. 2012. Local and global scaling reduce hubs in space. Journal of Machine Learning Research, 13:2871–2902.
Hinrich Schütze. 1998. Automatic word sense discrimination. Computational Linguistics, 24:97–123.
Alexander J. Smola and Risi Kondor. 2003. Kernels and regularization on graphs. In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, Proceedings, Lecture Notes in Artificial Intelligence 2777, pages 144–158. Springer.
Anders Søgaard. 2011. Semisupervised condensed nearest neighbor for part-of-speech tagging. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL ’11), pages 48–52, Portland, Oregon, USA.
Ikumi Suzuki, Kazuo Hara, Masashi Shimbo, Yuji Matsumoto, and Marco Saerens. 2012. Investigating the effectiveness of Laplacian-based kernels in hub reduction. In Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI-12), pages 1112–1118, Toronto, Ontario, Canada.
Stefan Thater, Hagen Fürstenau, and Manfred Pinkal. 2010. Contextualizing semantic representations using syntactically enriched vector models. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL ’10), pages 948–957, Uppsala, Sweden.
Jigang Wang, Predrag Neskovic, and Leon N. Cooper. 2006. Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence. Pattern Recognition, 39(3):417–423.
Kilian Q. Weinberger and Lawrence K. Saul. 2009. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244.
Yiming Yang and Xin Liu. 1999. A re-examination of text categorization methods. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’99), pages 42–49, Berkeley, California, USA.