
281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures



Authors: Jose G. Moreno; Gael Dias; Guillaume Cleuziou

Abstract: Post-retrieval clustering is the task of clustering Web search results. Within this context, we propose a new methodology that adapts the classical K-means algorithm to a third-order similarity measure initially developed for NLP tasks. Results obtained with a new stopping criterion on the ODP-239 and MORESQUE gold standard datasets show that our proposal outperforms all reported text-based approaches.


reference text

A.C. Aitken. 1926. On Bernoulli’s numerical solution of algebraic equations. Proceedings of the Royal Society of Edinburgh, 46:289–305.
E. Amigó, J. Gonzalo, J. Artiles, and F. Verdejo. 2009. A comparison of extrinsic clustering evaluation metrics based on formal constraints. Information Retrieval, 12(4):461–486.
C. Carpineto and G. Romano. 2010. Optimal meta search results clustering. In 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 170–177.
C. Carpineto, S. Osinski, G. Romano, and D. Weiss. 2009. A survey of web clustering engines. ACM Computing Surveys, 41(3):1–38.
K. Church and P. Hanks. 1990. Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1):23–29.
A. Di Marco and R. Navigli. 2013. Clustering and diversifying web search results with graph-based word sense induction. Computational Linguistics, 39(4):1–43.
G. Dias, E. Alves, and J.G.P. Lopes. 2007. Topic segmentation algorithms for text summarization and passage retrieval: An exhaustive evaluation. In Proceedings of 22nd Conference on Artificial Intelligence (AAAI), pages 1334–1339.
P. Ferragina and A. Gulli. 2008. A personalized search engine based on web-snippet hierarchical clustering. Software: Practice and Experience, 38(2):189–225.
P. Ferragina and U. Scaiella. 2010. Tagme: On-the-fly annotation of short text fragments (by Wikipedia entities). In Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM), pages 1625–1628.
M. Kuroda, M. Sakakihara, and Z. Geng. 2008. Acceleration of the EM and ECM algorithms using the Aitken δ² method for log-linear models with partially classified data. Statistics & Probability Letters, 78(15):2332–2338.
A. Likas, N. Vlassis, and J. Verbeek. 2003. The global k-means clustering algorithm. Pattern Recognition, 36:451–461.
S.P. Lloyd. 1982. Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2):129–137.
D. Machado, T. Barbosa, S. Pais, B. Martins, and G. Dias. 2009. Universal mobile information retrieval. In Proceedings of the 5th International Conference on Universal Access in Human-Computer Interaction (HCI), pages 345–354.
R. Mihalcea, C. Corley, and C. Strapparava. 2006. Corpus-based and knowledge-based measures of text semantic similarity. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI), pages 775–780.
G.W. Milligan and M.C. Cooper. 1985. An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2):159–179.
R. Navigli and G. Crisafulli. 2010. Inducing word senses to improve web search result clustering. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 116–126.
S. Osinski and D. Weiss. 2005. A concept-driven algorithm for clustering search results. IEEE Intelligent Systems, 20(3):48–54.
P. Pecina and P. Schlesinger. 2006. Combining association measures for collocation extraction. In Proceedings of the Joint Conference of the International Committee on Computational Linguistics and the Association for Computational Linguistics (COLING/ACL), pages 651–658.
U. Scaiella, P. Ferragina, A. Marino, and M. Ciaramita. 2012. Topical clustering of search results. In Proceedings of the 5th ACM International Conference on Web Search and Data Mining (WSDM), pages 223–232.
J. Silva, G. Dias, S. Guilloré, and J.G.P. Lopes. 1999. Using LocalMaxs algorithm for the extraction of contiguous and non-contiguous multiword lexical units. In Proceedings of 9th Portuguese Conference in Artificial Intelligence (EPIA), pages 113–132.
M. Timonen. 2013. Term Weighting in Short Documents for Document Categorization, Keyword Extraction and Query Expansion. Ph.D. thesis, University of Helsinki, Finland.
O. Zamir and O. Etzioni. 1998. Web document clustering: A feasibility demonstration. In 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 46–54.