137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Source: pdf

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence, methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore, search queries lack the explicit syntax often used to determine intent in question answering. In this paper, we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

reference text

R. Baeza-Yates and A. Tiberi. 2007. Extracting semantic relations from query logs. In Proceedings of the 13th ACM Conference on Knowledge Discovery and Data Mining (KDD-07), pages 76–85. San Jose, California.

D. Beeferman and A. Berger. 2000. Agglomerative clustering of a search engine query log. In Proceedings of the 6th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD-00), pages 407–416.

S. Bergsma and Q. Wang. 2007. Learning noun phrase query segmentation. In Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing (EMNLP-07), pages 819–826. Prague, Czech Republic.

J. Dean and S. Ghemawat. 2004. MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI-04), pages 137–150. San Francisco, California.

J. Finkel, C. Manning, and A. Ng. 2006. Solving the problem of cascading errors: Approximate Bayesian inference for linguistic annotation pipelines. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing (EMNLP-06), pages 618–626. Sydney, Australia.

M. Hearst. 1992. Automatic acquisition of hyponyms from large text corpora. In Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), pages 539–545. Nantes, France.

M. Johnson. 2010. PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 1148–1157. Uppsala, Sweden.

M. Johnson, T. Griffiths, and S. Goldwater. 2007a. Adaptor grammars: a framework for specifying compositional nonparametric Bayesian models. In Advances in Neural Information Processing Systems 19, pages 641–648. Vancouver, Canada.

M. Johnson, T. Griffiths, and S. Goldwater. 2007b. Bayesian inference for PCFGs via Markov Chain Monte Carlo. In Proceedings of the 2007 Conference of the North American Association for Computational Linguistics (NAACL-HLT-07), pages 139–146. Rochester, New York.

R. Jones, B. Rey, O. Madani, and W. Greiner. 2006. Generating query substitutions. In Proceedings of the 15th World Wide Web Conference (WWW-06), pages 387–396. Edinburgh, Scotland.

X. Li. 2010. Understanding the semantic structure of noun phrase queries. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 1337–1345. Uppsala, Sweden.

P. Liang, S. Petrov, M. Jordan, and D. Klein. 2007. The infinite PCFG using hierarchical Dirichlet processes. In Proceedings of the 2007 Conference on Empirical Methods in Natural Language Processing (EMNLP-07), pages 688–697. Prague, Czech Republic.

M. Paşca. 2010. The role of queries in ranking labeled instances extracted from text. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING-10), pages 955–962. Beijing, China.

A. Popescu, P. Pantel, and G. Mishne. 2010. Semantic lexicon adaptation for use in query interpretation. In Proceedings of the 19th World Wide Web Conference (WWW-10), pages 1167–1168. Raleigh, North Carolina.

A. Ritter, Mausam, and O. Etzioni. 2010. A latent Dirichlet allocation method for selectional preferences. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 424–434. Uppsala, Sweden.

A. Smola and S. Narayanamurthy. 2010. An architecture for parallel topic models. In Proceedings of the 36th Conference on Very Large Data Bases (VLDB-10), pages 703–710. Singapore.

R. Snow, D. Jurafsky, and A. Ng. 2006. Semantic taxonomy induction from heterogenous evidence. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING-ACL-06), pages 801–808. Sydney, Australia.

I. Szpektor, I. Dagan, R. Bar-Haim, and J. Goldberger. 2008. Contextual preferences. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics (ACL-08), pages 683–691. Columbus, Ohio.

P. Talukdar and F. Pereira. 2010. Experiments in graph-based semi-supervised learning methods for class-instance acquisition. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 1473–1481. Uppsala, Sweden.

B. Tan and F. Peng. 2008. Unsupervised query segmentation using generative language models and Wikipedia. In Proceedings of the 17th World Wide Web Conference (WWW-08), pages 347–356. Beijing, China.

Y. Teh, M. Jordan, M. Beal, and D. Blei. 2006. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581.

S. Tratz and E. Hovy. 2010. A taxonomy, dataset, and classifier for automatic noun compound interpretation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL-10), pages 678–687. Uppsala, Sweden.

B. Van Durme and M. Paşca. 2008. Finding cars, goddesses and enzymes: Parametrizable acquisition of labeled instances for open-domain information extraction. In Proceedings of the 23rd National Conference on Artificial Intelligence (AAAI-08), pages 1243–1248. Chicago, Illinois.

T. Wang, R. Hoffmann, X. Li, and J. Szymanski. 2009. Semi-supervised learning of semantic classes for query understanding: from the Web and for the Web. In Proceedings of the 18th International Conference on Information and Knowledge Management (CIKM-09), pages 37–46. Hong Kong, China.