acl acl2011 acl2011-181 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Patrick Pantel ; Ariel Fuxman
Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. [sent-2, score-0.454]
2 Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. [sent-3, score-2.481]
3 Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. [sent-4, score-0.448]
4 A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. [sent-5, score-0.805]
5 The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. [sent-6, score-0.993]
6 Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision. [sent-7, score-0.468]
7 1 Introduction Commercial search engines use query associations in a variety of ways, including the recommendation of related queries in Bing, ‘something different’ in Google, and ‘also try’ and related concepts in Yahoo. [sent-8, score-1.017]
8 Mining techniques to extract such query associations generally fall into four categories: (a) clustering queries by their co-clicked url patterns (Wen et al. [sent-9, score-0.693]
9 , 2004); (b) leveraging co-occurrences of sequential queries in web search [Author affiliation: Ariel Fuxman, Microsoft Research, Mountain View, CA, USA, arielf@microsoft.com] [sent-11, score-0.413]
10 query sessions (Zhang and Nasraoui, 2006; Boldi et al. [sent-12, score-0.356]
11 , 2009); (c) pattern-based extraction over lexico-syntactic structures of individual queries (Paşca and Durme, 2008; Jain and Pantel, 2009); and (d) distributional similarity techniques over news or web corpora (Agirre et al. [sent-13, score-0.323]
12 Or for the query string “ice fishing”, we aim to recommend products in a commercial catalog, such as jigs or lures. [sent-21, score-0.529]
13 By knowing the entity identifiers associated with a query (instead of strings), one can greatly improve both the presentation of search results as well as the click-through experience. [sent-23, score-0.537]
14 Not only can we present the product name to the web user, but we can also display the image, price, and reviews associated with the entity identifier. [sent-25, score-0.298]
15 Once the entity is clicked, instead of issuing a simple web search query, we can now directly show a product page for the exact product; or we can even perform actions directly on the entity, such as buying the entity on Amazon. [sent-26, score-0.517]
16 In this paper, we define the association between a query string q and an entity id e as the probability that e is relevant given the query q, P(e|q). [sent-31, score-0.801]
17 relevance as the likelihood that a user would click on e given q, events which can be observed in large query-click graphs. [sent-34, score-0.486]
18 Due to the extreme sparsity of query click graphs (Baeza-Yates, 2004), we propose several smoothing models that extend the click graph with query synonyms and then use the synonym click probabilities as a background model. [sent-35, score-2.198]
19 Queries in session logs are annotated using our association probabilities and recommendations are obtained by modeling session-level query-product co-occurrences in the annotated sessions. [sent-38, score-0.269]
20 Finally, we demonstrate that our models affect 9% of general web queries with 94% recommendation precision. [sent-39, score-0.507]
21 2 Related Work We introduce a novel application of significant commercial value: entity recommendations for general Web queries. [sent-40, score-0.281]
22 This is different from the vast body of work on query suggestions (Baeza-Yates et al. [sent-41, score-0.318]
23 , 2011), because our suggestions are actual entities (as opposed to queries or documents). [sent-45, score-0.353]
24 , 2001), including successful commercial systems such as the Amazon product recommendation system (Linden et al. [sent-47, score-0.359]
25 , actual movie rentals or product purchases), whereas our recommendation system leverages a different resource, namely query sessions. [sent-56, score-0.603]
26 , 2007) as a mechanism for associating queries to entities. [sent-58, score-0.31]
27 For example, if we type the query “canon eos digital camera” on a commerce search engine such as Bing Shopping or Google Products, we get a listing of digital camera entities that satisfy our query. [sent-59, score-0.62]
28 However, vertical search engines are essentially rankers that, given a query, return a sorted list of (pointers to) entities that are related to the query. [sent-60, score-0.348]
29 Initial proposals were based on learning global smoothing models, where the smoothing of a word would be independent of the document that the word belongs to (Zhai and Lafferty, 2001). [sent-66, score-0.29]
30 Intuitively, the smoothing of a word in a document is influenced by the smoothing of the word in similar documents. [sent-71, score-0.29]
31 Given a search query q, our task is to compute P(e|q), the probability that an entity e is relevant to q, for all e ∈ E. [sent-77, score-0.573]
32 (2004), we model relevance as the click probability of an entity given a query, which we can observe from click logs of vertical search engines, i. [sent-81, score-1.221]
33 , domain-specific search engines such as the product search engine at Amazon, the local search engine at Yelp, or the travel search engine at Bing Travel. [sent-83, score-0.73]
34 Clicked results in a vertical search engine are edges between queries and entities e in the vertical’s knowledge base. [sent-84, score-0.721]
35 General search query click logs, which capture direct user intent signals, have shown significant improvements when used for web search ranking (Agichtein et al. [sent-85, score-1.002]
36 Unlike general search engines, vertical search engines typically have much less traffic, resulting in extremely sparse click logs. [sent-87, score-0.736]
37 In this section, we define a graph structure for recording click information and we propose several models for estimating P(e|q) using the graph. [sent-88, score-0.534]
38 Each edge in Cu and Ce represents the number of clicks observed between query-url pairs and query-entity pairs, respectively. [sent-91, score-0.264]
39 Let wu(q, u) be the click weight of the edges in Cu, and we(q, e) be the click weight of the edges in Ce. [sent-92, score-0.982]
40 If Ce is very large, then we can model the association probability, P(e|q), as the maximum likelihood estimation (MLE) of observing clicks on e given the query q: $\hat{P}_{mle}(e|q) = \frac{w_e(q,e)}{\sum_{e' \in E} w_e(q,e')}$ (3.1) [sent-93, score-0.318]
41 Figure 1 illustrates an example query entity graph linking general web queries to entities in a large commercial product catalog. [sent-94, score-1.143]
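The maximum likelihood estimate described above can be illustrated with a minimal sketch. The function name, the dictionary layout, and the toy counts below are assumptions made for illustration; they are not taken from the paper's data or code.

```python
from collections import defaultdict


def mle_click_probabilities(click_counts):
    """Estimate P_mle(e|q) from click counts.

    click_counts maps (query, entity) -> observed click count w_e(q, e).
    Returns a dict mapping (query, entity) -> w_e(q, e) / sum over e' of w_e(q, e').
    """
    totals = defaultdict(float)
    for (query, _entity), count in click_counts.items():
        totals[query] += count
    return {
        (query, entity): count / totals[query]
        for (query, entity), count in click_counts.items()
        if totals[query] > 0
    }


# Hypothetical toy counts, chosen only to make the arithmetic easy to follow.
toy_clicks = {
    ("ice fishing", "jig_A"): 3,
    ("ice fishing", "lure_B"): 1,
    ("panfish jig", "jig_A"): 2,
}
p_mle = mle_click_probabilities(toy_clicks)
```

Each query's probabilities sum to one, and a query with a single clicked entity receives probability 1.0 for it, which is exactly the unreliable low-count behaviour that motivates smoothing.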
42 Figure 1a illustrates eight queries in Q with their observed clicks (solid lines) with products in E. [sent-95, score-0.546]
43 The sparsity of the graph comes in two forms: a) there are many queries for which an entity is relevant that will never be seen in the click logs (e. [sent-101, score-1.053]
44 , “panfish jig” in Figure 1a); and b) the query-click distribution is Zipfian and most observed edges will have very low click counts yielding unreliable statistics. [sent-103, score-0.541]
45 In the following subsections, we present a method to expand QEC with unseen queries that are associated with entities in E. [sent-104, score-0.353]
46 Then we propose smoothing methods for leveraging a background model over the expanded click graph. [sent-105, score-0.589]
47 (2009), we address the sparsity of edges in Ce by inferring new edges through traversing the query-url click subgraph, UC(Q ∪ U, Cu), which contains many more edges than Ce. [sent-109, score-0.718]
48 If two queries qi and qj are synonyms or near synonyms (footnote 2), then we expect their click patterns to be similar. [sent-110, score-0.808]
49 Footnote 2: A query qi is a near synonym of a query qj if most relevant results of qi are also relevant to qj. [sent-113, score-1.022]
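The intuition that near synonyms have similar click patterns can be sketched with a simple similarity over url click vectors. Cosine similarity over raw click counts is an assumption here, swapped in for illustration; the excerpt does not state which similarity function or click weighting the authors use.

```python
import math


def click_cosine(clicks_a, clicks_b):
    """Cosine similarity between two url click vectors.

    Each argument maps url -> click count for one query. This is an
    illustrative stand-in, not the similarity function from the paper.
    """
    shared = set(clicks_a) & set(clicks_b)
    dot = sum(clicks_a[u] * clicks_b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in clicks_a.values()))
    norm_b = math.sqrt(sum(v * v for v in clicks_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical url click vectors for three queries.
same = click_cosine({"u1": 2, "u2": 1}, {"u1": 4, "u2": 2})
disjoint = click_cosine({"u1": 2}, {"u3": 5})
```

Queries that click on the same urls in the same proportions score near 1, while queries with no co-clicked urls score 0.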
50 2); (c) Zoom on edges in Ce where solid lines indicate observed clicks with weight we(q, e) and dotted lines indicate inferred clicks with smoothed weight wˆe(q, e) (Section 3. [sent-119, score-0.477]
51 3); and (d) A temporal sequence of queries in a search session illustrating entity associations propagating from the QEC graph to the queries in the session (Section 4). [sent-120, score-1.131]
52 Figure 1b illustrates similarity edges created between query “ice and both “power and “d rock”. [sent-129, score-0.413]
Table (model labels): UNIF: Pˆunif(e|q); MLE: Pˆmle(e|q); HYBR: Pˆhybr(e|q); INTU: Pˆintu(e|q); INTP: Pˆintp(e|q).
53 wˆe(q, e) represents the weight of our background model, which can be viewed as smoothed click counts, and is obtained [sent-146, score-0.444]
54 by propagating clicks to unseen edges using the synonymy model as follows: $\hat{w}_e(q,e) = \sum_{q' \in Q} \frac{s(q,q')}{N_{sq}} \times \hat{P}_{mle}(e|q')$ (3. [sent-157, score-0.312]
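A minimal sketch of propagating clicks from synonymous queries to unseen edges follows. It assumes the normalizer is the sum of the similarity scores over the neighbouring queries; that choice, the function name, and the toy values are assumptions made for illustration.

```python
def smoothed_weights(similarities, p_mle_by_query):
    """Compute background weights for one target query.

    similarities maps a neighbour query q' -> s(q, q').
    p_mle_by_query maps q' -> {entity: P_mle(e|q')}.
    The normalizer is assumed to be the sum of all s(q, q').
    """
    norm = sum(similarities.values())
    weights = {}
    if norm == 0:
        return weights
    for neighbour, sim in similarities.items():
        for entity, prob in p_mle_by_query.get(neighbour, {}).items():
            weights[entity] = weights.get(entity, 0.0) + (sim / norm) * prob
    return weights


# Hypothetical similarities and foreground estimates.
toy_similarities = {"q1": 0.6, "q2": 0.2}
toy_p_mle = {"q1": {"a": 1.0}, "q2": {"a": 0.5, "b": 0.5}}
w_hat = smoothed_weights(toy_similarities, toy_p_mle)
```

With this normalizer the propagated weights form a distribution over entities, so an unseen query inherits a weighted average of its neighbours' click distributions.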
55 Basic Interpolation: This smoothing model, $\hat{P}_{intu}(e|q)$, linearly combines our foreground and background models using a model parameter α: $\hat{P}_{intu}(e|q) = \alpha \hat{P}_{mle}(e|q) + (1-\alpha) \hat{P}_{bsim}(e|q)$ (3. [sent-163, score-0.262]
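The linear interpolation of a foreground and a background distribution can be sketched as below. The function name and the toy distributions are hypothetical; only the weighted-sum form comes from the text.

```python
def interpolate(p_foreground, p_background, alpha):
    """Return alpha * foreground + (1 - alpha) * background per entity.

    Both inputs map entity -> probability for a single query.
    Entities missing from one distribution are treated as probability 0.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    entities = set(p_foreground) | set(p_background)
    return {
        e: alpha * p_foreground.get(e, 0.0) + (1.0 - alpha) * p_background.get(e, 0.0)
        for e in entities
    }


# Hypothetical foreground (MLE) and background (synonym-based) estimates.
p_intu = interpolate({"a": 1.0}, {"a": 0.5, "b": 0.5}, alpha=0.8)
```

Entity "b" has no observed clicks for the query, yet it receives non-zero probability from the background model, which is the purpose of the smoothing.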
56 7) In practice, we bucket the observed click parameter, we(q, e), into eleven buckets: {1-click, 2-clicks, . [sent-167, score-0.497]
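The bucketing of click counts can be sketched as follows. The bucket boundaries (1 through 10 clicks, plus one bucket for more than 10) are inferred from the eleven buckets mentioned here and the α(10), α(> 10) parameters mentioned later; the function names and the α values are hypothetical.

```python
def click_bucket(click_count):
    """Map an observed click count to one of eleven bucket labels."""
    if click_count < 1:
        raise ValueError("click_count must be at least 1")
    return str(click_count) if click_count <= 10 else ">10"


def bucketed_alpha(click_count, alpha_by_bucket):
    """Look up the interpolation weight for the bucket of this click count."""
    return alpha_by_bucket[click_bucket(click_count)]


# Hypothetical weights: trust the foreground model more as clicks grow.
toy_alphas = {str(i): i / 11 for i in range(1, 11)}
toy_alphas[">10"] = 1.0
all_buckets = {click_bucket(c) for c in range(1, 51)}
```

Learning one weight per bucket lets sparsely observed edges lean on the background model while frequently clicked edges rely mostly on their own counts.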
57 8) In the following section, we apply these models to the task of extracting product recommendations for general web search queries. [sent-175, score-0.337]
58 Many systems extract recommendations by mining temporal query chains from search sessions and clickthrough patterns (Zhang and Nasraoui, 2006). [sent-178, score-0.524]
59 Specifically, our universe of entities, E, consists of the entities in a large commercial product catalog, for which we observe query-click-product clicks, Ce, from the vertical search logs. [sent-183, score-0.473]
60 Our QEC graph is completed by extracting query-click-urls from a search engine’s general search logs, Cu. [sent-184, score-0.28]
61 2 Recommendation Algorithm We hypothesize that if an entity is relevant to a query, then it is relevant to all other queries cooccurring in the same session. [sent-188, score-0.456]
62 Step 1 Query Annotation: For each query q in a session s, we annotate it with a set Eq, consisting of every pair {e, Pˆ(e|q)}, where e ∈ E such that there exists an edge {q, e} ∈ Ce with probability Pˆ(e|q). Note that Eq will be empty for many queries. [sent-190, score-0.457]
63 Step 2 Session Analysis: We build a query-entity frequency co-occurrence matrix, A, consisting of n|Q| rows and n|E| columns, where each row corresponds to a query and each column to an entity. [sent-192, score-0.318]
64 The final recommendations for a query q are obtained by returning the top-k entities e according to Step 3. [sent-196, score-0.494]
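The annotation and session-analysis steps can be sketched as below. Ranking by raw co-occurrence counts is an assumption made for illustration, since the scoring step is not shown in this excerpt; the function names and toy sessions are also hypothetical.

```python
from collections import Counter, defaultdict


def build_cooccurrence(sessions, annotations):
    """Count how often each query shares a session with each annotated entity.

    sessions is a list of query lists; annotations maps query -> {entity: P(e|q)}.
    Every entity annotated on any query of a session is credited to all
    queries in that session.
    """
    matrix = defaultdict(Counter)
    for session in sessions:
        session_entities = set()
        for query in session:
            session_entities.update(annotations.get(query, {}))
        for query in set(session):
            for entity in session_entities:
                matrix[query][entity] += 1
    return matrix


def top_k(matrix, query, k):
    """Return the k entities with the highest co-occurrence count for a query."""
    return [entity for entity, _count in matrix.get(query, Counter()).most_common(k)]


toy_sessions = [
    ["ice fishing", "panfish jig"],
    ["ice fishing", "ice fishing lures"],
    ["weather"],
]
toy_annotations = {
    "panfish jig": {"jig_A": 0.9},
    "ice fishing lures": {"lure_B": 0.7, "jig_A": 0.3},
}
cooc = build_cooccurrence(toy_sessions, toy_annotations)
best = top_k(cooc, "ice fishing", 1)
```

In this toy data "ice fishing" is never annotated directly, but it inherits entities from the annotated queries that co-occur with it in sessions.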
65 1 Datasets We instantiate our models from Sections 3 and 4 using search query logs and a large catalog of products from a commercial search engine. [sent-199, score-0.797]
66 We form our QEC graphs by first collecting in Ce aggregate query-click-entity counts observed over two years in a commerce vertical search engine. [sent-200, score-0.335]
67 Similarly, Cu is formed by collecting aggregate query-click-url counts observed over six months in a web search engine, where each query must have frequency at least 10. [sent-201, score-0.526]
68 , α(10), α(> 10) for Pˆintp, where α(x) is the observed click bucket as described in Section 3. [sent-210, score-0.497]
69 MSE measures against each edge type, which makes it sensitive to the long tail of the click graph. [sent-236, score-0.48]
70 Conversely, MSEW measures against each edge instance, which makes it a good measure against the head of the click graph. [sent-237, score-0.444]
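The contrast between the two error measures can be sketched as follows. The exact formulas are assumptions inferred from the description: MSE averages squared error over distinct edges, while MSEW weights each edge by its click count. Names and toy values are hypothetical.

```python
def mse(predicted, observed):
    """Mean squared error over distinct edges (edge types).

    observed maps edge -> (test probability, test click count);
    predicted maps edge -> model probability (missing edges count as 0).
    """
    errors = [(predicted.get(edge, 0.0) - p) ** 2 for edge, (p, _w) in observed.items()]
    return sum(errors) / len(errors)


def msew(predicted, observed):
    """Click-weighted mean squared error over edge instances."""
    total_clicks = sum(w for _p, w in observed.values())
    weighted = sum(
        w * (predicted.get(edge, 0.0) - p) ** 2 for edge, (p, w) in observed.items()
    )
    return weighted / total_clicks


# One head edge predicted perfectly, one tail edge predicted badly.
toy_observed = {("q1", "a"): (0.8, 90), ("q2", "b"): (0.5, 10)}
toy_predicted = {("q1", "a"): 0.8, ("q2", "b"): 0.1}
unweighted = mse(toy_predicted, toy_observed)
weighted = msew(toy_predicted, toy_observed)
```

The tail error dominates the unweighted measure but is diluted in the weighted one, which matches the statement that MSE is sensitive to the long tail and MSEW to the head.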
71 , without any graph expansion nor any smoothing against a background [sent-244, score-0.33]
72 Buckets scaled by query instance coverage of all queries with 10 or fewer clicks. [sent-247, score-0.573]
73 As expected, for larger observed click counts (>4), all models perform roughly the same, indicating that smoothing is not necessary. [sent-258, score-0.591]
74 However, for low click counts, which in our dataset account for over 20% of the overall click instances, we see a large reduction in MSE with Pˆintp outperforming Pˆintu, which in turn outperforms Pˆhybr. [sent-259, score-0.792]
75 The reason it does worse as the observed click count rises is that head queries tend to result in more distinct urls with high-variance clicks, which in turn makes a uniform model susceptible to more error. [sent-261, score-0.777]
76 Figure 3 illustrates that the benefit of the smoothing models is in the tail of the click graph, which supports the larger error reductions seen in MSE in Table 2. [sent-262, score-0.577]
77 We created two samples from the TEST dataset: one randomly sampled by taking click weights into account, and the other sampled uniformly at random. [sent-265, score-0.46]
78 Bold queries resulted from the expansion algorithm in Section 3. [sent-271, score-0.292]
79 We created a Mechanical Turk HIT where we show the Mechanical Turk workers the query and the actual Web page in a Product search engine. [sent-276, score-0.441]
80 from a one-month snapshot of user query sessions at a Web search engine, where session boundaries occur when 60 seconds elapse in between user queries. [sent-317, score-0.617]
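Splitting a user's query stream into sessions at a time gap can be sketched as below. The 60-second threshold comes from the sentence above; the event format, function name, and toy timestamps are assumptions.

```python
def split_sessions(events, gap_seconds=60):
    """Split one user's time-ordered (timestamp_seconds, query) events into sessions.

    A new session starts whenever at least gap_seconds elapse between
    consecutive queries.
    """
    sessions = []
    current = []
    last_timestamp = None
    for timestamp, query in events:
        if last_timestamp is not None and timestamp - last_timestamp >= gap_seconds:
            sessions.append(current)
            current = []
        current.append(query)
        last_timestamp = timestamp
    if current:
        sessions.append(current)
    return sessions


# Hypothetical events for a single user.
toy_events = [(0, "speedo"), (5, "speedometer"), (200, "ice fishing")]
toy_user_sessions = split_sessions(toy_events)
```

The first two queries fall within the gap and share a session, which is how an accidental submission can end up associated with the intended query.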
81 , the queries for which there is some recommendation) divided by the total number of queries in the log. [sent-322, score-0.51]
82 For our performance metrics, we sampled two sets of queries: (a) Query Set Sample: uniform random sample of 100 queries from the unique queries in the one-month log; and (b) Query Bag Sample: weighted random sample, by query frequency, of 100 queries from the query instances in the one-month log. [sent-323, score-1.512]
83 04% of all query instances posed by the millions of users of a major search engine, with a precision of 94%. [sent-346, score-0.408]
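The two sampling schemes can be sketched as follows. Drawing the bag sample with replacement is an assumption made to keep the sketch short; the function names and toy log are hypothetical.

```python
import random


def query_set_sample(query_log, n, rng):
    """Uniform sample of n distinct queries from the unique queries in the log."""
    return rng.sample(sorted(set(query_log)), n)


def query_bag_sample(query_log, n, rng):
    """Sample n query instances from the log, so frequent queries are more likely.

    Drawing uniformly over instances is equivalent to weighting unique
    queries by their frequency. Sampling is with replacement here.
    """
    return rng.choices(query_log, k=n)


# Hypothetical log with one head query and three tail queries.
toy_log = ["walmart"] * 7 + ["shark attack videos", "ice fishing", "weather"]
rng = random.Random(0)
set_sample = query_set_sample(toy_log, 2, rng)
bag_sample = query_bag_sample(toy_log, 2, rng)
```

The set sample treats every distinct query equally and so reflects the tail, whereas the bag sample reflects what users actually issue and so is dominated by head queries.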
84 8% of the unique queries, the fact that it covers many head queries such as walmart and iphone accounts for the large query instance coverage. [sent-348, score-0.573]
85 Also since there may be many general web queries for which there is no appropriate product in the database, a coverage of 100% is not attainable (nor desirable); in fact the upper bound for the coverage is likely to be much lower. [sent-349, score-0.424]
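The difference between coverage over unique queries and coverage over query instances can be sketched as below. The function name and the toy log are hypothetical and are not the paper's figures.

```python
def coverage(query_log, recommendable):
    """Return (unique-query coverage, query-instance coverage).

    query_log is a list of issued queries (with repeats); recommendable is
    the set of queries for which at least one recommendation exists.
    """
    unique_queries = set(query_log)
    unique_cov = sum(1 for q in unique_queries if q in recommendable) / len(unique_queries)
    instance_cov = sum(1 for q in query_log if q in recommendable) / len(query_log)
    return unique_cov, instance_cov


# One head query covered, three tail queries not covered.
toy_query_log = ["walmart"] * 7 + ["shark attack videos", "ice fishing", "weather"]
unique_cov, instance_cov = coverage(toy_query_log, {"walmart"})
```

Covering a single head query yields low coverage of unique queries but high coverage of instances, the same effect described above for head queries such as walmart and iphone.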
86 First, ambiguity is introduced both by the click model and the graph expansion algorithm of Section 3. [sent-355, score-0.533]
87 In many cases, the ambiguity is resolved by user click patterns (i. [sent-357, score-0.436]
88 , users disambiguate queries through their browsing behavior), but one such error was seen for the query “shark attack videos” where several Shark-branded vacuum cleaners are recommended. [sent-359, score-0.573]
89 This is because of the ambiguous query “shark” that is found in the [sent-360, score-0.318]
90 click logs and in query sessions co-occurring with the query “shark attack videos”. [sent-361, score-1.17]
91 The second source of errors is caused by systematic user errors commonly found in session logs such as a user accidentally submitting a query while typing. [sent-362, score-0.589]
92 An example session is: {“speedo”, “speedometer”} where the intended session was just the second query and the unintended first query is associated with products such as Speedo swimsuits. [sent-363, score-0.802]
93 This ultimately causes our system to recommend various swimsuits for the query “speedometer”. [sent-364, score-0.318]
94 6 Conclusion Learning associations between web queries and entities has many possible applications, including query-entity recommendation, personalization by associating entity vectors to users, and direct advertising. [sent-365, score-0.725]
95 Although many techniques have been developed for associating queries to queries or queries to documents, to the best of our knowledge this is the first that aims to associate queries to entities by leveraging click graphs from both general search logs and vertical search logs. [sent-366, score-2.003]
96 The sparsity of query entity graphs is addressed by first expanding the graph with query synonyms, and then smoothing query-entity click counts over these unseen queries. [sent-368, score-1.487]
97 Our best performing model, which interpolates between a foreground click model and a smoothed background model, significantly reduces testing error when compared against a strong baseline, by 18%. [sent-369, score-0.513]
98 We applied our best performing model to the task of query-entity recommendation, by analyzing session co-occurrences between queries and annotated entities. [sent-371, score-0.346]
99 Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision. [sent-372, score-0.468]
100 Weakly-supervised acquisition of open-domain classes and class attributes from web documents and query logs. [sent-525, score-0.386]
wordName wordTfidf (topN-words)
[('click', 0.396), ('query', 0.318), ('queries', 0.255), ('intp', 0.234), ('qec', 0.234), ('recommendation', 0.184), ('mle', 0.17), ('clicks', 0.166), ('smoothing', 0.145), ('mse', 0.143), ('intu', 0.14), ('entity', 0.129), ('associations', 0.12), ('vertical', 0.11), ('msew', 0.109), ('product', 0.101), ('graph', 0.1), ('logs', 0.1), ('entities', 0.098), ('edges', 0.095), ('unif', 0.094), ('ce', 0.091), ('session', 0.091), ('qj', 0.09), ('search', 0.09), ('cu', 0.079), ('recommendations', 0.078), ('auger', 0.078), ('pantel', 0.076), ('products', 0.075), ('commercial', 0.074), ('engine', 0.073), ('foreground', 0.069), ('web', 0.068), ('qi', 0.067), ('bsim', 0.062), ('fuxman', 0.062), ('hybr', 0.062), ('jigs', 0.062), ('mei', 0.057), ('associating', 0.055), ('craswell', 0.055), ('synonymy', 0.051), ('bucket', 0.051), ('engines', 0.05), ('observed', 0.05), ('catalog', 0.05), ('edge', 0.048), ('background', 0.048), ('ice', 0.048), ('sample', 0.047), ('sigir', 0.047), ('aqe', 0.047), ('jagabathula', 0.047), ('kurland', 0.047), ('nasraoui', 0.047), ('queryproduct', 0.047), ('sarwar', 0.047), ('shark', 0.047), ('jelinek', 0.046), ('graphs', 0.044), ('urls', 0.044), ('interpolation', 0.042), ('commerce', 0.041), ('boldi', 0.041), ('linden', 0.041), ('uq', 0.041), ('bell', 0.041), ('user', 0.04), ('witten', 0.039), ('sessions', 0.038), ('estimating', 0.038), ('kneser', 0.038), ('wen', 0.038), ('sparsity', 0.037), ('expansion', 0.037), ('tail', 0.036), ('nie', 0.036), ('ponte', 0.036), ('relevant', 0.036), ('clicked', 0.034), ('workers', 0.033), ('movies', 0.032), ('uniform', 0.032), ('patrick', 0.032), ('sampled', 0.032), ('borderline', 0.031), ('honeypots', 0.031), ('milan', 0.031), ('mlep', 0.031), ('netflix', 0.031), ('panfish', 0.031), ('ronaldinho', 0.031), ('speedo', 0.031), ('speedometer', 0.031), ('strawman', 0.031), ('szummer', 0.031), ('tao', 0.031), ('agichtein', 0.031), ('katz', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
Author: Patrick Pantel ; Ariel Fuxman
Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
2 0.33885074 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
Author: Bo Pang ; Ravi Kumar
Abstract: Web search is an information-seeking activity. Oftentimes, this amounts to a user seeking answers to a question. However, queries, which encode the user’s information need, are typically not expressed as full-length natural language sentences, in particular as questions. Rather, they consist of one or more text fragments. As humans become more search-engine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent.
3 0.30000111 248 acl-2011-Predicting Clicks in a Vocabulary Learning System
Author: Aaron Michelony
Abstract: We consider the problem of predicting which words a student will click in a vocabulary learning system. Often a language learner will find value in the ability to look up the meaning of an unknown word while reading an electronic document by clicking the word. Highlighting words likely to be unknown to a reader is attractive because it draws his or her attention to them and indicates that information is available. However, this highlighting is usually done manually in vocabulary systems and online encyclopedias such as Wikipedia. Furthermore, it is never on a per-user basis. This paper presents an automated way of highlighting words likely to be unknown to the specific user. We present related work in search engine ranking, a description of the study used to collect click data, the experiment we performed using the random forest machine learning algorithm and finish with a discussion of future work.
4 0.29038769 182 acl-2011-Joint Annotation of Search Queries
Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith
Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.
5 0.25932163 258 acl-2011-Ranking Class Labels Using Query Sessions
Author: Marius Pasca
Abstract: The role of search queries, as available within query sessions or in isolation from one another, is examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using document-based counts.
6 0.24860001 256 acl-2011-Query Weighting for Ranking Model Adaptation
7 0.23683076 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
8 0.14052349 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
9 0.13993464 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
10 0.13852355 169 acl-2011-Improving Question Recommendation by Exploiting Information Need
11 0.12523954 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search
12 0.11994845 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
13 0.11830762 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
14 0.10730682 12 acl-2011-A Generative Entity-Mention Model for Linking Entities with Knowledge Base
15 0.1064963 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
16 0.10119464 333 acl-2011-Web-Scale Features for Full-Scale Parsing
17 0.098234303 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
18 0.089511171 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
19 0.085218728 117 acl-2011-Entity Set Expansion using Topic information
20 0.084616475 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity
topicId topicWeight
[(0, 0.191), (1, 0.096), (2, -0.148), (3, 0.114), (4, -0.133), (5, -0.295), (6, -0.118), (7, -0.313), (8, 0.117), (9, -0.044), (10, 0.166), (11, -0.006), (12, -0.056), (13, -0.014), (14, 0.009), (15, -0.018), (16, 0.032), (17, 0.063), (18, -0.075), (19, -0.01), (20, 0.029), (21, 0.021), (22, 0.022), (23, 0.018), (24, -0.006), (25, -0.007), (26, -0.033), (27, 0.033), (28, -0.075), (29, 0.048), (30, -0.016), (31, -0.006), (32, -0.086), (33, 0.008), (34, 0.06), (35, -0.039), (36, 0.009), (37, 0.073), (38, -0.038), (39, -0.042), (40, -0.039), (41, -0.035), (42, 0.056), (43, -0.031), (44, 0.019), (45, -0.004), (46, -0.035), (47, 0.067), (48, 0.061), (49, 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.97678119 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
Author: Patrick Pantel ; Ariel Fuxman
Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
2 0.90927607 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
Author: Bo Pang ; Ravi Kumar
Abstract: Web search is an information-seeking activity. Oftentimes, this amounts to a user seeking answers to a question. However, queries, which encode the user’s information need, are typically not expressed as full-length natural language sentences, in particular as questions. Rather, they consist of one or more text fragments. As humans become more search-engine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent.
3 0.82872796 258 acl-2011-Ranking Class Labels Using Query Sessions
Author: Marius Pasca
Abstract: The role of search queries, as available within query sessions or in isolation from one another, is examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using document-based counts.
4 0.78857529 182 acl-2011-Joint Annotation of Search Queries
Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith
Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.
5 0.78322023 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
6 0.74554902 256 acl-2011-Query Weighting for Ranking Model Adaptation
7 0.74137712 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search
8 0.65954602 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
9 0.6427688 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
10 0.60842711 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity
11 0.56831121 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
12 0.51565415 248 acl-2011-Predicting Clicks in a Vocabulary Learning System
13 0.50757289 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search
14 0.47939733 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content
15 0.46621707 135 acl-2011-Faster and Smaller N-Gram Language Models
16 0.45455685 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering
17 0.42246327 11 acl-2011-A Fast and Accurate Method for Approximate String Search
18 0.42170128 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
19 0.37191758 128 acl-2011-Exploring Entity Relations for Named Entity Disambiguation
20 0.34639072 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining
topicId topicWeight
[(5, 0.026), (17, 0.056), (26, 0.078), (37, 0.074), (39, 0.048), (41, 0.043), (55, 0.037), (59, 0.03), (61, 0.016), (70, 0.253), (72, 0.035), (91, 0.04), (96, 0.155), (97, 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.78435439 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
Author: Patrick Pantel ; Ariel Fuxman
Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.
2 0.69621259 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
Author: Els Lefever ; Veronique Hoste ; Martine De Cock
Abstract: This paper describes a set of exploratory experiments for a multilingual classification-based approach to Word Sense Disambiguation. Instead of using a predefined monolingual sense-inventory such as WordNet, we use a language-independent framework where the word senses are derived automatically from word alignments on a parallel corpus. We built five classifiers with English as an input language and translations in the five supported languages (viz. French, Dutch, Italian, Spanish and German) as classification output. The feature vectors incorporate both the more traditional local context features, as well as binary bag-of-words features that are extracted from the aligned translations. Our results show that the ParaSense multilingual WSD system shows very competitive results compared to the best systems that were evaluated on the SemEval-2010 Cross-Lingual Word Sense Disambiguation task for all five target languages.
3 0.64264333 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Author: Alexander M. Rush ; Michael Collins
Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
4 0.63419896 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
Author: Joseph Reisinger ; Marius Pasca
Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.
5 0.63269013 28 acl-2011-A Statistical Tree Annotator and Its Applications
Author: Xiaoqiang Luo ; Bing Zhao
Abstract: In many natural language applications, there is a need to enrich syntactical parse trees. We present a statistical tree annotator augmenting nodes with additional information. The annotator is generic and can be applied to a variety of applications. We report 3 such applications in this paper: predicting function tags; predicting null elements; and predicting whether a tree constituent is projectable in machine translation. Our function tag prediction system significantly outperforms published results.
6 0.63198477 258 acl-2011-Ranking Class Labels Using Query Sessions
7 0.62778813 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents
8 0.62396574 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
9 0.62223756 193 acl-2011-Language-independent compound splitting with morphological operations
10 0.62142891 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
11 0.62066126 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models
12 0.61969531 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
13 0.61796159 133 acl-2011-Extracting Social Power Relationships from Natural Language
14 0.61760342 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
15 0.61685669 182 acl-2011-Joint Annotation of Search Queries
16 0.61648512 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
17 0.61640513 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
18 0.61541414 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction
19 0.61442697 178 acl-2011-Interactive Topic Modeling
20 0.61424506 253 acl-2011-PsychoSentiWordNet