acl acl2011 acl2011-182 knowledge-graph by maker-knowledge-mining

182 acl-2011-Joint Annotation of Search Queries


Source: pdf

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 As previous research shows, these differences severely limit the applicability of standard NLP techniques. Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. [sent-13, score-1.176]

2 Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. [sent-14, score-0.411]

3 To address this challenge, we propose a probabilistic approach for performing joint query annotation. [sent-15, score-0.672]

4 We evaluate our method using a range of queries extracted from a web search log. [sent-18, score-0.402]

5 Accordingly, in this paper, we focus on annotating search queries submitted by the users to a search engine. [sent-22, score-0.445]

6 standard NLP techniques for annotating queries and require development of novel annotation approaches for query corpora (Bergsma and Wang, 2007; Barr et al. [sent-25, score-1.12]

7 Most search queries are very short, and even longer queries are usually shorter than the average written sentence. [sent-30, score-0.671]

8 Due to their brevity, queries often cannot be divided into sub-parts, and do not provide enough context for accurate annotations to be made using the standard NLP tools such as taggers, parsers or chunkers, which are trained on more syntactically coherent textual units. [sent-31, score-0.515]

9 A recent analysis of web query logs by Bendersky and Croft (2009) shows, however, that despite their brevity, queries are grammatically diverse. [sent-32, score-0.883]

10 Some queries are keyword concatenations, some are semicomplete verbal phrases and some are wh-questions. [sent-33, score-0.573]

11 It is essential for the search engine to correctly annotate the query structure, and the quality of these query annotations has been shown to be a crucial first step towards the development of reliable and robust query processing, representation and understanding algorithms (Barr et al. [sent-34, score-1.852]

12 However, in current query annotation systems, even sentence-like queries are often hard to parse and annotate, as they are prone to contain misspellings and idiosyncratic grammatical structures. [sent-38, score-1.137]

13 In this paper, we propose a novel joint query annotation method to improve the effectiveness of existing query annotations, especially for longer, more complex search queries. [sent-43, score-1.498]

14 To this end, we propose a probabilistic method for performing a joint query annotation. [sent-48, score-0.672]

15 For instance, our method can leverage the information about estimated part-of-speech tags and capitalization of query terms to improve the accuracy of query segmentation. [sent-50, score-1.285]

16 We empirically evaluate the joint query annotation method on a range of query types. [sent-51, score-1.418]

17 , 2008), we also explore the performance of our annotations with more complex natural language search queries such as verbal phrases and whquestions, which often pose a challenge for IR applications (Bendersky et al. [sent-54, score-0.696]

18 We show that even with a very limited amount of training data, our joint annotation method significantly outperforms annotations that were done independently for these queries. [sent-56, score-0.559]

19 Then, in Section 3, we introduce our joint query annotation method. [sent-59, score-0.886]

20 In Section 4 we describe two types of independent query annotations that are used as input for the joint query annotation. [sent-60, score-1.449]

21 In this scheme, each query is marked up using three annotations: capitalization, POS tags, and segmentation indicators. [sent-64, score-0.711]

22 Note that all the query terms are non-capitalized, and no punctuation is provided by the user, which complicates the query annotation process. [sent-65, score-1.309]

23 3 Joint Query Annotation Given a search query Q, which consists of a sequence of terms (q1, . [sent-72, score-0.583]

24 In other words, each symbol ζi ∈ zQ annotates a single query term. [sent-81, score-0.532]

25 Many query annotations that are useful for IR can be represented using this simple form, including capitalization, POS tagging, phrase chunking, named entity recognition, and stopword indicators, to name just a few. [sent-82, score-0.773]
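The per-term symbol representation described above can be sketched in a few lines (a toy illustration; the symbol inventories 'C'/'L' for capitalization and 'B'/'I' for segmentation are illustrative conventions, not necessarily the paper's):

```python
# Represent a query annotation z_Q as one symbol per query term q_i.
# Each annotation type (capitalization, POS, segmentation) is just a
# sequence of per-term symbols aligned with the query terms.
query = ["hawaiian", "falls", "water", "park"]

annotations = {
    # 'C' = capitalize, 'L' = leave lowercase
    "capitalization": ["C", "C", "L", "L"],
    # coarse POS symbols, one per term
    "pos": ["NNP", "NNP", "NN", "NN"],
    # 'B' = begins a segment, 'I' = continues the previous segment
    "segmentation": ["B", "I", "B", "I"],
}

# Every annotation must align one-to-one with the query terms.
assert all(len(z) == len(query) for z in annotations.values())

# Reading the segments back from the 'B'/'I' indicators shows how a
# single aligned symbol sequence encodes query segmentation.
segments, current = [], []
for term, tag in zip(query, annotations["segmentation"]):
    if tag == "B" and current:
        segments.append(current)
        current = []
    current.append(term)
segments.append(current)
print(segments)  # [['hawaiian', 'falls'], ['water', 'park']]
```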

26 Most previous work on query annotation makes the independence assumption that every annotation zQ ∈ ZQ is done separately from the others. [sent-85, score-1.022]

27 That is, it is assumed that the optimal linguistic annotation is the annotation that has the highest probability given the query Q, regardless of the other annotations in the set ZQ. [sent-86, score-1.253]

28 Knowing that a query term is capitalized, we are more likely to decide that it is a proper noun. [sent-90, score-0.532]

29 To address the problem of joint query annotation, we first assume that we have an initial set of annotations which were performed for query Q independently of one another (we will show an example of how to derive such a set in Section 4). [sent-93, score-1.408]

30 Figure 2 outlines the algorithm for performing the joint query annotation. [sent-103, score-0.672]

31 It then produces a set of independent annotation estimates, which are jointly used, together with the ground truth annotations, to learn a CRF model for each annotation type. [sent-105, score-0.561]

32 Note that this formulation of joint query annotation can be viewed as a stacked classification, in which a second, more effective, classifier is trained using the labels inferred by the first classifier as features. [sent-107, score-0.931]
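The stacked-classification idea can be sketched in plain Python; a toy lookup model stands in for the CRF used in the paper, and all names below are illustrative, not from the paper:

```python
# Stacked classification sketch: a first stage produces independent
# per-term annotation estimates; a second stage uses ALL of those
# estimates as features when re-predicting each annotation type.

def first_stage(term):
    """Independent estimates, one per annotation type (toy rules)."""
    return {
        "cap": "C" if term in {"hawaiian", "york"} else "L",
        "pos": "NNP" if term in {"hawaiian", "york"} else "NN",
    }

def stack_features(query):
    """Per-term feature dicts combining all first-stage estimates."""
    return [dict(first_stage(t), term=t) for t in query]

# Second stage: memorize (feature-tuple -> label) pairs from training
# data, exploiting dependencies between annotations (e.g. a term the
# capitalizer marked 'C' is more likely a proper noun).
def train_second_stage(featss, labels):
    table = {}
    for feats, label in zip(featss, labels):
        table[tuple(sorted(feats.items()))] = label
    return table

def predict(table, feats, default="NN"):
    return table.get(tuple(sorted(feats.items())), default)

train_q = ["hawaiian", "falls"]
train_pos = ["NNP", "NNP"]          # ground truth: 'falls' is NNP here
model = train_second_stage(stack_features(train_q), train_pos)

preds = [predict(model, f) for f in stack_features(["hawaiian", "falls"])]
print(preds)  # ['NNP', 'NNP'] -- 'falls' re-labeled using stacked context
```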

33 The main benefits of these two annotation methods are that they can be easily implemented using standard software tools, do not require any labeled data, and provide reasonable annotation accuracy. [sent-112, score-0.49]

34 (2010) take a bag-of-words approach, and assume independence between both the query terms and the corresponding annotation symbols. [sent-118, score-0.777]

35 (2010) we use a large n-gram corpus (Brants and Franz, 2006) to estimate p(ζi|qi) for annotating the query with capitalization and segmentation mark-up, and a standard POS tagger for part-of-speech tagging of the query. [sent-126, score-1.046]
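A minimal sketch of such a query-only, n-gram-count-based capitalization estimate (the counts below are invented for illustration; the paper uses the web-scale corpus of Brants and Franz, 2006):

```python
# Unsupervised capitalization estimate from an n-gram corpus:
# capitalize a query term if its capitalized surface form is more
# frequent than its lowercase form. Toy counts stand in for the
# web-scale n-gram corpus.
NGRAM_COUNTS = {
    "hawaiian": 1200, "Hawaiian": 9800,
    "falls": 8700, "Falls": 2100,
    "park": 9100, "Park": 4000,
}

def capitalize_estimate(term):
    lower = NGRAM_COUNTS.get(term.lower(), 0)
    upper = NGRAM_COUNTS.get(term.lower().capitalize(), 0)
    return "C" if upper > lower else "L"

query = ["hawaiian", "falls", "park"]
print([capitalize_estimate(t) for t in query])  # ['C', 'L', 'L']
```

Note how this independent, context-free estimate leaves "falls" lowercase: exactly the kind of error that motivates bootstrapping from retrieved sentences below.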

36 For instance, a keyword query hawaiian falls, which refers to a location, is inaccurately interpreted by a standard POS tagger as a noun-verb pair. [sent-130, score-0.718]

37 On the other hand, given a sentence from a corpus that is relevant to the query such as “Hawaiian Falls is a family-friendly waterpark”, the word “falls” is correctly identified by a standard POS tagger as a proper noun. [sent-131, score-0.532]

38 Accordingly, the document corpus can be bootstrapped in order to better estimate the query annotation. [sent-132, score-0.556]

39 (2010) employ pseudo-relevance feedback (PRF), a method that has a long record of success in IR for tasks such as query expansion (Buckley, 1995; Lavrenko and Croft, 2001). [sent-134, score-0.532]

40 p(ζi|qi) = Σ_{r ∈ C} p(ζi|qi, r) p(r|Q). Since for most sentences the conditional probability of relevance to the query p(r|Q) is vanishingly small, the above can be closely approximated [sent-136, score-0.558]

41 by considering only a set of sentences R, retrieved at top-k positions in response to the query Q. [sent-138, score-0.532]

42 p(ζi|qi) ≈ Σ_{r ∈ R} p(ζi|qi, r) p(r|Q). Intuitively, the equation above models the query as a mixture of top-k retrieved sentences, where each sentence is weighted by its relevance to the query. [sent-140, score-0.558]
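The mixture can be sketched as follows, with invented per-sentence estimates and relevance weights standing in for the retrieval scores:

```python
# Pseudo-relevance feedback mixture: estimate p(symbol | term) as a
# relevance-weighted average of per-sentence estimates over the top-k
# retrieved sentences R. All numbers here are toy values.
def prf_estimate(per_sentence, relevance):
    """per_sentence: list of {symbol: p(symbol | term, sentence)};
    relevance: relevance weights p(r|Q) over the top-k set."""
    total = sum(relevance)
    mixed = {}
    for probs, w in zip(per_sentence, relevance):
        for symbol, p in probs.items():
            mixed[symbol] = mixed.get(symbol, 0.0) + p * (w / total)
    return mixed

# Estimates for the term "falls" from three retrieved sentences:
per_sentence = [
    {"NNP": 0.9, "VBZ": 0.1},   # "Hawaiian Falls is a ... waterpark"
    {"NNP": 0.8, "VBZ": 0.2},
    {"NNP": 0.2, "VBZ": 0.8},   # a sentence where "falls" is a verb
]
relevance = [0.5, 0.3, 0.2]

mixed = prf_estimate(per_sentence, relevance)
best = max(mixed, key=mixed.get)
print(best)  # 'NNP'
```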

43 3, since here the annotation symbols are not independent given the query Q. [sent-143, score-0.848]

44 5 Related Work In recent years, linguistic annotation of search queries has been receiving increasing attention as an important step toward better query processing and understanding. [sent-154, score-1.164]

45 The literature on query annotation includes query segmentation (Bergsma and Wang, 2007; Jones et al. [sent-155, score-1.488]

46 Most of the previous work on query annotation focuses on performing a particular annotation task (e. [sent-166, score-1.053]

47 However, these annotations are often related, and thus we take a joint annotation approach, which combines several independent annotations to improve the overall annotation accuracy. [sent-169, score-1.08]

48 (2008) focus on query refinement (spelling corrections, word splitting, etc. [sent-174, score-0.564]

49 Instead, we are interested in annotation of queries of different types, including verbose natural language queries. [sent-176, score-0.624]

50 While there is an overlap between query refinement and annotation, the focus of the latter is on providing linguistic information about existing queries (after initial refinement has been performed). [sent-177, score-0.962]

51 Similarly to this work in NLP, we demonstrate that a joint approach for modeling the linguistic query structure can also be beneficial for IR applications. [sent-186, score-0.667]

52 1 Experimental Setup For evaluating the performance of our query annotation methods, we use a random sample of 250 queries from a search log. [sent-188, score-0.851]

53 ing a verb, and 61 short keyword queries (Figure 1 contains a single example of each of these types). [sent-198, score-0.466]

54 In order to test the effectiveness of the joint query annotation, we compare four methods. [sent-199, score-0.67]

55 j-QRY uses only the annotations performed by i-QRY (3 initial independent annotation estimates), while j-PRF combines the annotations performed by i-QRY with the annotations performed by i-PRF (6 initial annotation estimates). [sent-207, score-1.236]

56 The first measure is classification-oriented, treating the annotation decision for each query term as a classification. [sent-216, score-0.777]

57 In the case of capitalization and segmentation annotations, these decisions are binary; we compute the precision and recall metrics, and report F1, their harmonic mean. [sent-217, score-0.605]
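For these binary decisions, the precision/recall/F1 computation reduces to a few lines (a standard computation, not code from the paper):

```python
# Precision / recall / F1 for per-term binary annotation decisions
# (e.g. capitalize vs. not), computed against gold labels.
def prf1(gold, pred, positive="C"):
    tp = sum(1 for g, p in zip(gold, pred) if g == p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

gold = ["C", "C", "L", "L", "C"]
pred = ["C", "L", "L", "C", "C"]
print(prf1(gold, pred))  # each value is 2/3 here
```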

58 Accordingly, we report the mean of classification accuracies per query (MQA). [sent-221, score-0.532]

59 Formally, MQA is computed as (Σ_{i=1}^{N} accQi) / N, where accQi is the classification accuracy for query Qi, and N is the number of queries. [sent-222, score-0.532]
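The MQA computation is a direct transcription of this formula (the example labels are invented):

```python
# Mean per-query classification accuracy (MQA): average over queries
# of the fraction of terms whose annotation symbol is predicted
# correctly.
def mqa(gold_per_query, pred_per_query):
    accs = []
    for gold, pred in zip(gold_per_query, pred_per_query):
        correct = sum(g == p for g, p in zip(gold, pred))
        accs.append(correct / len(gold))
    return sum(accs) / len(accs)

gold = [["NNP", "NNP"], ["NN", "VB", "NN"]]
pred = [["NNP", "NN"],  ["NN", "VB", "NN"]]
print(mqa(gold, pred))  # (0.5 + 1.0) / 2 = 0.75
```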

60 2, we discuss the general performance of the four annotation techniques, and compare the effectiveness of independent and joint annotations. [sent-225, score-0.477]

61 3, we analyze the performance of the independent and joint annotation methods by query type. [sent-227, score-0.98]

62 4, we compare the difficulty of performing query annotations for different query types. [sent-229, score-1.3]

63 5, we compare the effectiveness of the proposed joint annotation for query segmentation with the existing query segmentation methods. [sent-231, score-1.805]

64 2 General Evaluation Table 1 shows the summary of the performance of the two independent and two joint annotation methods for the entire set of 250 queries. [sent-233, score-0.448]

65 These results attest to both the importance of doing a joint optimization over the entire set of annotations and to the robustness of the initial annotations done by the i-PRF method. [sent-281, score-0.549]

66 In all but one case, the j-PRF method, which uses these annotations as features, outperforms the j-QRY method that only uses the annotation done by i-QRY. [sent-282, score-0.45]

67 The most significant improvements as a result of joint annotation are observed for the segmentation task. [sent-283, score-0.533]

68 We also note that, in case of segmentation, the differences in performance between the two joint annotation methods, j-QRY and j-PRF, are not significant, indicating that the context of additional annotations in j-QRY makes up for the lack of more robust pseudo-relevance feedback based features. [sent-286, score-0.609]

69 3 Evaluation by Query Type Table 2 presents a detailed analysis of the performance of the best independent (i-PRF) and joint (jPRF) annotation methods by the three query types used for evaluation: verbal phrases, questions and keyword queries. [sent-291, score-1.238]

70 From the analysis in Table 2, we note that the contribution of joint annotation varies significantly across query types. [sent-292, score-0.886]

71 Table 2 also demonstrates that joint annotation has a different impact on various annotations for the same query type. [sent-295, score-1.121]

72 For instance, j-PRF has a significant positive effect on capitalization and segmentation for keyword queries, but only marginally improves the POS tagging. [sent-296, score-0.556]

73 While dependence between the annotations plays an important role for question and keyword queries, which often share a common grammatical structure, this dependence is less useful for verbal phrases, which have a more diverse linguistic structure. [sent-300, score-0.489]

74 Accordingly, a more in-depth investigation of the linguistic structure of the verbal phrase queries is an interesting direction for future work. [sent-301, score-0.412]

75 Figure 3 shows a plot that contrasts the relative performance for these three query types of our best-performing joint annotation method, j-PRF, on capitalization, POS tagging and segmentation annotation tasks. [sent-304, score-1.39]

76 The performance for keyword queries is much higher, with an improvement of over 20% compared to either of the other two types. [sent-307, score-0.489]

77 We attribute this increase to both a larger number of positive examples in the short keyword queries (a higher percentage of terms in keyword queries is capitalized) and their simpler syntactic structure. [sent-308, score-0.932]

78 For the segmentation task, the performance is at its best for the question and keyword queries, and at its worst (with a drop of 11%) for the verbal phrases. [sent-319, score-0.434]

79 We hypothesize that this is due to the fact that question queries and keyword queries tend to have repetitive structures, while the grammatical structure for verbose queries is much more diverse. [sent-320, score-1.181]

80 For question queries the performance is the best (6% increase over the keyword queries), since they resemble sentences encountered in traditional corpora. [sent-322, score-0.489]

81 It is important to note that the results reported in Figure 3 are based on training the joint annotation model on all available queries with 10-fold crossvalidation. [sent-323, score-0.664]

82 We might get different profiles if a separate annotation model was trained for each query type. [sent-324, score-0.801]

83 We leave the investigation of separate training of joint annotation models by query type to future work. [sent-326, score-0.886]

84 5 Additional Comparisons In order to further evaluate the proposed joint annotation method, j-PRF, in this section we compare its performance to other query annotation methods previously reported in the literature. [sent-328, score-1.154]

85 Unfortunately, there is not much published work on query capitalization and query POS tagging that goes beyond the simple query-based methods described in Section 4. [sent-329, score-1.342]

86 The published work on the more advanced methods usually requires access to large amounts of proprietary user data such as query logs and clicks (Barr et al. [sent-331, score-0.532]

87 Therefore, in this section we focus on recent work on query segmentation (Bergsma and Wang, 2007; Hagen et al. [sent-335, score-0.711]

88 We compare the segmentation effectiveness of our best performing method, j-PRF, to that of these query segmentation methods. [sent-337, score-0.95]

89 It is currently the most effective publicly disclosed unsupervised query segmentation method. [sent-340, score-0.711]

90 The optimal segmentation for query Q, SQ∗, is then obtained using SQ∗ = argmax_{S ∈ SQ} Σ_{s ∈ S, |s| > 1} |s|^{|s|} count(s), where SQ is the set of all possible query segmentations, S is a possible segmentation, s is a segment in S, and count(s) is the frequency of s in the web n-gram corpus. [sent-342, score-1.284]
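A sketch of this objective: enumerate every segmentation of the query and score each multi-term segment by |s|^|s| · count(s) (toy n-gram counts stand in for the web n-gram corpus; the query is an illustrative example):

```python
from itertools import combinations

# SEG-1 style scoring: enumerate all segmentations and pick the one
# maximizing  sum over multi-term segments s of |s|^|s| * count(s).
COUNTS = {"new york": 5000, "york times": 800, "new york times": 300}

def segmentations(terms):
    """Yield all ways to split terms into contiguous segments."""
    n = len(terms)
    for k in range(n):                      # number of internal breaks
        for breaks in combinations(range(1, n), k):
            bounds = [0, *breaks, n]
            yield [terms[a:b] for a, b in zip(bounds, bounds[1:])]

def score(segmentation):
    return sum(len(s) ** len(s) * COUNTS.get(" ".join(s), 0)
               for s in segmentation if len(s) > 1)

query = ["new", "york", "times"]
best = max(segmentations(query), key=score)
print(best)  # [['new', 'york'], ['times']]
```

With these toy counts, the two-term segment "new york" (scored 2^2 · 5000 = 20000) beats keeping all three terms together (3^3 · 300 = 8100), so the argmax splits off "times".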

91 SEG-2 employs a large set of features, and is pre-trained on the query collection described by Bergsma and Wang (2007). [sent-344, score-0.532]

92 (2009), and include, among others, n-gram frequencies in a sample of a query log, web corpus and Wikipedia titles. [sent-346, score-0.573]

93 This result demonstrates that the segmentation produced by the j-PRF method is as effective as the segmentation produced by the current supervised state-of-the-art segmentation methods, which employ external data sources and high-order n-grams. [sent-353, score-0.567]

94 The benefit of the j-PRF method compared to the SEG-2 method is that, simultaneously with the segmentation, it produces several additional query annotations (in this case, capitalization and POS tagging), eliminating the need to construct separate sequence classifiers for each annotation. [sent-354, score-0.958]

95 7 Conclusions In this paper, we have investigated a joint approach for annotating search queries with linguistic structures, including capitalization, POS tags and segmentation. [sent-355, score-0.529]

96 To this end, we proposed a probabilistic approach for performing joint query annotation that takes into account the dependencies that exist between the different annotation types. [sent-356, score-1.162]

97 Our experimental findings over a range of queries from a web search log unequivocally point to the superiority of the joint annotation methods over both query-based and pseudo-relevance feedback based independent annotation methods. [sent-357, score-1.072]

98 We are encouraged by the success of our joint query annotation technique, and intend to pursue the investigation of its utility for IR applications. [sent-359, score-0.886]

99 In the future, we intend to research the use of joint query annotations for additional IR tasks, e. [sent-360, score-0.846]

100 Unsupervised query segmentation using generative language models and Wikipedia. [sent-522, score-0.711]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('query', 0.532), ('zq', 0.348), ('queries', 0.31), ('annotation', 0.245), ('capitalization', 0.221), ('annotations', 0.205), ('bendersky', 0.197), ('segmentation', 0.179), ('keyword', 0.156), ('guo', 0.145), ('joint', 0.109), ('barr', 0.092), ('hagen', 0.084), ('bergsma', 0.077), ('verbal', 0.076), ('pos', 0.076), ('independent', 0.071), ('mqa', 0.069), ('verbose', 0.069), ('jones', 0.058), ('tagging', 0.057), ('accordingly', 0.055), ('sigir', 0.052), ('ir', 0.052), ('argzmqaxp', 0.051), ('derby', 0.051), ('kentucky', 0.051), ('prf', 0.051), ('termcaptagseg', 0.051), ('tzu', 0.051), ('search', 0.051), ('kumaran', 0.05), ('crf', 0.047), ('manshadi', 0.045), ('stacked', 0.045), ('bruce', 0.043), ('fuchun', 0.042), ('web', 0.041), ('croft', 0.04), ('sq', 0.037), ('peng', 0.037), ('amherst', 0.036), ('rosie', 0.036), ('stopword', 0.036), ('balasubramanian', 0.034), ('bemike', 0.034), ('jprf', 0.034), ('kindred', 0.034), ('lavrenko', 0.034), ('namedentity', 0.034), ('qry', 0.034), ('statistically', 0.034), ('parentheses', 0.033), ('brants', 0.033), ('annotating', 0.033), ('capitalized', 0.032), ('refinement', 0.032), ('performing', 0.031), ('phrases', 0.031), ('estimator', 0.03), ('giridhar', 0.03), ('ibe', 0.03), ('anx', 0.03), ('hawaiian', 0.03), ('jiafeng', 0.03), ('potthast', 0.03), ('xueqi', 0.03), ('initial', 0.03), ('demonstrates', 0.03), ('effectiveness', 0.029), ('attained', 0.028), ('allan', 0.028), ('qi', 0.028), ('martins', 0.027), ('health', 0.027), ('differences', 0.027), ('wang', 0.027), ('relevance', 0.026), ('falls', 0.026), ('seg', 0.026), ('cap', 0.026), ('matthias', 0.026), ('fain', 0.026), ('linguistic', 0.026), ('brevity', 0.026), ('questions', 0.026), ('grammatical', 0.026), ('li', 0.026), ('tan', 0.026), ('retrieval', 0.025), ('shih', 0.025), ('benno', 0.025), ('stein', 0.025), ('finkel', 0.025), ('idiosyncratic', 0.024), ('benoit', 0.024), ('profiles', 0.024), ('estimate', 0.024), ('performance', 0.023), ('toutanova', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000011 182 acl-2011-Joint Annotation of Search Queries

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.

2 0.36780092 256 acl-2011-Query Weighting for Ranking Model Adaptation

Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong

Abstract: We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it’s more reasonable to conduct importance weighting at query level than document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulted from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.

3 0.34948391 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

4 0.33768126 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

Author: Bo Pang ; Ravi Kumar

Abstract: Web search is an information-seeking activity. Often times, this amounts to a user seeking answers to a question. However, queries, which encode user’s information need, are typically not expressed as full-length natural language sentences in particular, as questions. Rather, they consist of one or more text fragments. As humans become more searchengine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent. —

5 0.31710052 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, in examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using documentbased counts.

6 0.29038769 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

7 0.17413144 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

8 0.15019222 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

9 0.1381889 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

10 0.13739912 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

11 0.13165516 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

12 0.12680632 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

13 0.12310551 333 acl-2011-Web-Scale Features for Full-Scale Parsing

14 0.10946803 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

15 0.10685823 135 acl-2011-Faster and Smaller N-Gram Language Models

16 0.1061725 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

17 0.10417107 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

18 0.10208789 11 acl-2011-A Fast and Accurate Method for Approximate String Search

19 0.086491123 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

20 0.086066984 238 acl-2011-P11-2093 k2opt.pdf


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.21), (1, 0.08), (2, -0.137), (3, 0.071), (4, -0.204), (5, -0.331), (6, -0.073), (7, -0.34), (8, 0.227), (9, 0.001), (10, 0.177), (11, 0.034), (12, -0.088), (13, -0.002), (14, -0.015), (15, -0.031), (16, -0.008), (17, 0.016), (18, -0.009), (19, 0.095), (20, -0.052), (21, 0.036), (22, -0.074), (23, 0.006), (24, 0.04), (25, 0.124), (26, 0.007), (27, -0.003), (28, -0.018), (29, 0.002), (30, 0.009), (31, -0.021), (32, 0.002), (33, -0.03), (34, -0.06), (35, 0.029), (36, 0.045), (37, 0.061), (38, -0.057), (39, -0.047), (40, 0.045), (41, 0.045), (42, -0.05), (43, -0.063), (44, -0.085), (45, 0.016), (46, -0.03), (47, -0.046), (48, -0.036), (49, -0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98513108 182 acl-2011-Joint Annotation of Search Queries

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: W. Bruce Croft Dept. of Computer Science University of Massachusetts Amherst, MA cro ft @ c s .uma s s .edu David A. Smith Dept. of Computer Science University of Massachusetts Amherst, MA dasmith@ c s .umas s .edu articles or web pages). As previous research shows, these differences severely limit the applicability of Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an impor- tant part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.

2 0.89762217 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, in examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using documentbased counts.

3 0.86732644 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

4 0.86158186 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

Author: Bo Pang ; Ravi Kumar

Abstract: Web search is an information-seeking activity. Often times, this amounts to a user seeking answers to a question. However, queries, which encode the user’s information need, are typically not expressed as full-length natural language sentences, and in particular not as questions. Rather, they consist of one or more text fragments. As humans become more search-engine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find, to our surprise, that as time goes by, web users are more likely to use questions to express their search intent.

5 0.82185811 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

Author: Patrick Pantel ; Ariel Fuxman

Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
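Blending a general click log with a vertical one can be illustrated with the standard linear-interpolation smoothing form. The function below is a hedged sketch of that form with invented toy counts; the paper's exact estimators and synonymy-based interpolation are richer than this.

```python
def smoothed_click_prob(query, entity, general_clicks, vertical_clicks, lam=0.5):
    """Interpolate two sparse click-count estimates of P(entity | query).

    general_clicks / vertical_clicks: {query: {entity: count}}.
    lam weights the general-log estimate against the vertical-log one.
    """
    def mle(clicks):
        row = clicks.get(query, {})
        total = sum(row.values())
        return row.get(entity, 0) / total if total else 0.0

    return lam * mle(general_clicks) + (1.0 - lam) * mle(vertical_clicks)

# Hypothetical counts: the general log is noisier, the vertical log sparser.
general = {"ipod nano": {"ipod-nano-16gb": 2, "ipod-shuffle": 2}}
vertical = {"ipod nano": {"ipod-nano-16gb": 1}}
p = smoothed_click_prob("ipod nano", "ipod-nano-16gb", general, vertical)
```

With lam=0.5 the two maximum-likelihood estimates (0.5 and 1.0) average to 0.75, so the sparse but precise vertical signal pulls the estimate up.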

6 0.80184114 256 acl-2011-Query Weighting for Ranking Model Adaptation

7 0.73883271 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

8 0.58904564 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

9 0.54954106 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

10 0.54144412 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

11 0.50034666 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

12 0.47223243 135 acl-2011-Faster and Smaller N-Gram Language Models

13 0.46009105 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

14 0.40458056 11 acl-2011-A Fast and Accurate Method for Approximate String Search

15 0.38742676 333 acl-2011-Web-Scale Features for Full-Scale Parsing

16 0.38133249 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA

17 0.33060148 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices

18 0.3274653 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

19 0.32598278 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

20 0.32542625 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.025), (17, 0.037), (18, 0.202), (26, 0.08), (37, 0.093), (39, 0.119), (41, 0.054), (55, 0.028), (59, 0.035), (72, 0.048), (91, 0.034), (96, 0.136), (97, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.8336128 182 acl-2011-Joint Annotation of Search Queries

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries and verbose natural language queries.

2 0.82493544 121 acl-2011-Event Discovery in Social Media Feeds

Author: Edward Benson ; Aria Haghighi ; Regina Barzilay

Abstract: We present a novel method for record extraction from social streams such as Twitter. Unlike typical extraction setups, these environments are characterized by short, one-sentence messages with heavily colloquial speech. To further complicate matters, individual messages may not express the full relation to be uncovered, as is often assumed in extraction tasks. We develop a graphical model that addresses these problems by learning a latent set of records and a record-message alignment simultaneously; the output of our model is a set of canonical records, the values of which are consistent with aligned messages. We demonstrate that our approach is able to accurately induce event records from Twitter messages, evaluated against events from a local city guide. Our method achieves significant error reduction over baseline methods.

3 0.75442821 162 acl-2011-Identifying the Semantic Orientation of Foreign Words

Author: Ahmed Hassan ; Amjad AbuJbara ; Rahul Jha ; Dragomir Radev

Abstract: We present a method for identifying the positive or negative semantic orientation of foreign words. Identifying the semantic orientation of words has numerous applications in the areas of text classification, analysis of product review, analysis of responses to surveys, and mining online discussions. Identifying the semantic orientation of English words has been extensively studied in literature. Most of this work assumes the existence of resources (e.g. Wordnet, seeds, etc) that do not exist in foreign languages. In this work, we describe a method based on constructing a multilingual network connecting English and foreign words. We use this network to identify the semantic orientation of foreign words based on connection between words in the same language as well as multilingual connections. The method is experimentally tested using a manually labeled set of positive and negative words and has shown very promising results.
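The multilingual-network idea can be sketched as score propagation over a word graph: orientation flows from labeled English seeds to connected foreign words. The iterative update below is an illustrative stand-in for the random-walk formulation the abstract describes, and the tiny English-Spanish graph is invented for the example.

```python
def propagate_orientation(graph, seeds, iters=20, damping=0.85):
    """Spread positive/negative orientation scores from seed words.

    graph: {word: [neighbor words]}, mixing English and foreign words
    via same-language and cross-language links.
    seeds: {word: +1 or -1} for words with known orientation.
    """
    scores = {w: float(seeds.get(w, 0.0)) for w in graph}
    for _ in range(iters):
        updated = {}
        for word, nbrs in graph.items():
            if word in seeds:
                updated[word] = float(seeds[word])  # seeds stay clamped
            elif nbrs:
                updated[word] = damping * sum(
                    scores.get(n, 0.0) for n in nbrs) / len(nbrs)
            else:
                updated[word] = 0.0
        scores = updated
    return scores

graph = {
    "good": ["bueno"], "bueno": ["good", "excelente"],
    "excelente": ["bueno"],
    "bad": ["malo"], "malo": ["bad"],
}
scores = propagate_orientation(graph, {"good": 1, "bad": -1})
```

The sign of a foreign word's converged score gives its predicted orientation: "bueno" and "excelente" end up positive, "malo" negative.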

4 0.74059135 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.
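The phrasal matching method can be sketched as n-gram coverage: a hypothesis phrase counts as matched if it, or one of its phrase-table equivalents, occurs in the text. The scoring function and the toy English-Spanish phrase table below are illustrative assumptions, not the paper's actual matcher.

```python
def ngrams(tokens, n):
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def phrasal_match(text, hypothesis, phrase_table, max_n=2):
    """Fraction of hypothesis n-grams covered by the text.

    phrase_table: {hypothesis phrase: set of text-language equivalents},
    e.g. extracted from bilingual parallel data.
    """
    t_tokens, h_tokens = text.split(), hypothesis.split()
    t_phrases = set()
    for n in range(1, max_n + 1):
        t_phrases |= ngrams(t_tokens, n)
    matched = total = 0
    for n in range(1, max_n + 1):
        for p in ngrams(h_tokens, n):
            total += 1
            # A phrase matches verbatim or through a phrase-table equivalent.
            if ({p} | phrase_table.get(p, set())) & t_phrases:
                matched += 1
    return matched / total if total else 0.0

table = {
    "the": {"el"}, "cat": {"gato"}, "sleeps": {"duerme"},
    "the cat": {"el gato"},
}
score = phrasal_match("el gato duerme", "the cat sleeps", table)
```

Here four of the five hypothesis n-grams are covered via the table ("cat sleeps" has no entry), giving a score of 0.8.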

5 0.71325123 97 acl-2011-Discovering Sociolinguistic Associations with Structured Sparsity

Author: Jacob Eisenstein ; Noah A. Smith ; Eric P. Xing

Abstract: We present a method to discover robust and interpretable sociolinguistic associations from raw geotagged text data. Using aggregate demographic statistics about the authors’ geographic communities, we solve a multi-output regression problem between demographics and lexical frequencies. By imposing a composite ℓ1,∞ regularizer, we obtain structured sparsity, driving entire rows of coefficients to zero. We perform two regression studies. First, we use term frequencies to predict demographic attributes; our method identifies a compact set of words that are strongly associated with author demographics. Next, we conjoin demographic attributes into features, which we use to predict term frequencies. The composite regularizer identifies a small number of features, which correspond to communities of authors united by shared demographic and linguistic properties.
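The ℓ1,∞ penalty itself is easy to state: sum, over rows of the coefficient matrix, of each row's maximum absolute entry. Because the whole row pays the cost of its largest coefficient, the optimizer prefers to zero entire rows, which is the structured sparsity the abstract refers to. The function and toy matrix below only illustrate computing the penalty, not the full regression.

```python
def l1_inf_penalty(W):
    """Composite l1,infinity norm of a coefficient matrix.

    W: list of rows (lists of floats), e.g. one row per vocabulary word
    and one column per demographic output.
    """
    return sum(max(abs(x) for x in row) for row in W if row)

W = [
    [0.9, -1.2, 0.3],   # word kept: contributes its largest |coefficient|
    [0.0, 0.0, 0.0],    # word dropped: a zeroed row adds nothing
    [2.0, 0.5, -0.1],
]
penalty = l1_inf_penalty(W)
```

For this matrix the penalty is 1.2 + 0.0 + 2.0 = 3.2; shrinking any single entry of a row below its row maximum changes nothing, so sparsity emerges row-wise rather than entry-wise.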

6 0.70823503 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

7 0.70630908 27 acl-2011-A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging

8 0.70577288 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

9 0.70318699 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation

10 0.69793785 192 acl-2011-Language-Independent Parsing with Empty Elements

11 0.6969496 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

12 0.69617391 300 acl-2011-The Surprising Variance in Shortest-Derivation Parsing

13 0.69177264 242 acl-2011-Part-of-Speech Tagging for Twitter: Annotation, Features, and Experiments

14 0.68995833 258 acl-2011-Ranking Class Labels Using Query Sessions

15 0.68951809 333 acl-2011-Web-Scale Features for Full-Scale Parsing

16 0.68885809 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

17 0.68823969 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

18 0.68544644 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding

19 0.68500113 178 acl-2011-Interactive Topic Modeling

20 0.68461931 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition