emnlp emnlp2013 emnlp2013-97 knowledge-graph by maker-knowledge-mining

97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching


Source: pdf

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can significantly improve query reformulation performance using features of query concepts.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Web search users frequently modify their queries in hope of receiving better results. [sent-2, score-0.422]

2 Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. [sent-4, score-1.047]

3 Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. [sent-5, score-1.659]

4 In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. [sent-7, score-0.955]

5 We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can significantly improve query reformulation performance using features of query concepts. [sent-8, score-2.517]

6 Oftentimes, users modify their search queries in hope of getting better results. [sent-11, score-0.422]

7 Typical search users have low tolerance for viewing low-ranked search results, and they prefer to reformulate the query rather than wade through result listings (Jansen and Spink, 2006). [sent-12, score-0.843]

8 Previous studies have also shown that 37% of search queries are reformulations of previous queries (Jansen et al. [sent-13, score-0.828]

9 Understanding query reformulation behavior and being able to accurately identify reformulation queries have several benefits. [sent-16, score-1.88]

10 One of these benefits is learning from user behavior to better suggest automatic query refinements or query alterations. [sent-17, score-1.317]

11 Another benefit is using query reformulation prediction to identify boundaries between search tasks and hence segmenting user activities into topically coherent units. [sent-18, score-1.339]

12 Also, if we are able to accurately identify query reformulations, then we will be in a better position to evaluate the satisfaction of users with query results. [sent-19, score-1.278]

13 Identifying query reformulation can be very useful for finding cases where the users are not satisfied even after a click on a result that may have seemed relevant given its title and summary but then turned out to be not relevant to the user’s information need. [sent-21, score-1.201]

14 Previous work on query reformulation has either focused on automatic query refinement by the search system, e. [sent-22, score-1.767]

15 , 2008) or on defining taxonomies for query reformulation strategies, e. [sent-26, score-1.151]

16 Other work has proposed solutions for the query reformulation prediction problem or for the similar problem of task boundary identification (Radlinski and Joachims, 2005; Jones and Klinkner, 2008). [sent-29, score-1.111]

17 The two queries are very likely to have been issued by a user who is planning to travel to New York City. [sent-34, score-0.415]

18 Hence, most of the solutions proposed in previous work for this problem will incorrectly assume that the second query is a reformulation of the first due to the high word overlap ratio and the small edit distance. [sent-36, score-1.175]

19 Hence, despite being similar in terms of shared terms, the two queries have different intents and are not reformulations of one another. [sent-39, score-0.502]

20 To this end, we conducted a study where we collected thousands of consecutive queries and trained judges to label them as either reformulations or not. [sent-40, score-0.527]

21 We then built a classifier to identify query reformulation pairs and showed that the proposed classifier outperforms the state-of-the-art methods on identifying query reformulations. [sent-41, score-1.784]

22 2 Related Work There are three areas of work related to the research presented in this paper: (i) query reformulation taxonomies, (ii) automatic query refinement, and (iii) search tasks boundary identification. [sent-43, score-1.775]

23 1 Query Reformulation Taxonomies Existing research has studied how web search engines can propose reformulations, but has given less attention to how people perform query reformulations. [sent-46, score-0.72]

24 Most of the research on manual query reformulation has focused on building taxonomies of query reformulation. [sent-47, score-1.238]

25 These taxonomies are generally constructed by examining a small set of query logs. [sent-48, score-0.656]

26 (2007) identified 6 different kinds of reformulation states (New, Assistance, Content Change, Generalization, Reformulation, and Specialization) and provided heuristics for identifying them. [sent-51, score-0.543]

27 They also used them to predict when a user is most receptive to automatic query suggestions. [sent-52, score-0.693]

28 (2006) constructed a taxonomy of query re-finding by manually examining query logs, and implemented algorithms to identify repeat queries, equal click queries and overlapping click queries. [sent-62, score-1.556]

29 This line of work is relevant to our work because it studies query reformulation strategies. [sent-67, score-1.077]

30 Our work is different because we build a machine-learned predictive model to identify query reformulation while this line of work mainly focuses on defining taxonomies for reformulation strategies. [sent-68, score-1.673]

31 2 Automatic Query Refinement A closely related problem that has received most of the research attention in this area is the problem of automatically generating query refinements. [sent-70, score-0.582]

32 These refinements are typically offered as query suggestions to the users or used to alter the user query before submitting it to the search engine. [sent-71, score-1.483]

33 (2008) introduced the concept of the query-flow graph where every query is represented by a node and edges connect queries if it is likely for users to move from one query to another. [sent-73, score-1.566]

34 (2008) used random walks over a bipartite graph of queries and URLs to find query refinements. [sent-75, score-0.863]

35 Query logs were used to suggest query refinements in (Baeza-Yates et al. [sent-76, score-0.646]

36 Other research has adopted methods based on query expansion (Mitra et al. [sent-79, score-0.582]

37 This line of work is different from our work because it focuses on automatically generating query refinements while this work focuses on identifying cases of manual query reformulations. [sent-82, score-1.277]

38 3 Search Task Boundary Identification The problem of classifying the boundaries of the user search tasks within sessions in web search logs has been widely addressed before. [sent-84, score-0.437]

39 This problem is closely related to the problem of identifying query reformulation. [sent-85, score-0.63]

40 On the other hand, a query reformulation is intended to modify a previous query in hope of getting better results to satisfy the same information need. [sent-89, score-1.708]

41 From these definitions, it is clear how query reformulation and task boundary detection are two sides of the same problem. [sent-90, score-1.111]

42 A query-flow graph represents chains of related queries in query logs. [sent-93, score-0.863]

43 They use this model for finding logical session boundaries and query recommendation. [sent-94, score-0.695]

44 He demonstrated that the time interval, the search pattern and the position of a query in a user session are effective signals for identifying a shift to a new topic. [sent-96, score-0.775]

45 Our work is different because it goes beyond the bag of words approach and tries to assess query similarity based on the concepts represented in each query. [sent-108, score-0.779]

46 3 Problem Definition We start by defining some terms that will be used throughout the paper: Definition: Query Reformulation is the act of submitting a query Q2 to modify a previous search query Q1 in hope of retrieving better results to satisfy the same information need. [sent-110, score-1.271]

47 Our objective is to solve the following problem: Given a query Q1, and the following query Q2, predict whether Q2 is a reformulation of Q1. [sent-117, score-1.659]

48 4 Approach In this section, we propose methods for predicting whether the current query has been issued by the user to reformulate the previous query. [sent-118, score-0.754]

49 This becomes a frequent problem with queries when users do not observe the correct word boundaries (for example: “southjerseycraigslist” for “south jersey craigslist”) or when users are searching for a part of a URL (for example “quincycollege” for “quincy college”). [sent-125, score-0.472]

50 2 Queries to Concepts Lexical similarity between queries has been often used to identify related queries (Jansen et al. [sent-129, score-0.62]

51 Take the following query pair as an example: Q1: “weather in new york city” and Q2: “hotels in new york city”. [sent-135, score-0.746]

52 Hence, any lexical similarity feature would predict that the user submitted Q2 as a reformulation of Q1. [sent-137, score-0.66]
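
To make the false-positive concrete, the following is a minimal sketch of two surface-level signals named in the abstract (word overlap and word-level edit distance), applied to the Q1/Q2 pair above. The function names and the choice of Jaccard overlap are assumptions made for illustration; the paper's exact feature definitions are not given in this summary.

```python
def word_overlap(q1: str, q2: str) -> float:
    """Jaccard overlap between the word sets of two queries (assumed definition)."""
    w1, w2 = set(q1.lower().split()), set(q2.lower().split())
    if not w1 and not w2:
        return 0.0
    return len(w1 & w2) / len(w1 | w2)


def word_edit_distance(q1: str, q2: str) -> int:
    """Word-level Levenshtein distance with unit insert/delete/substitute costs."""
    a, b = q1.lower().split(), q2.lower().split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete a word from q1
                            curr[j - 1] + 1,      # insert a word from q2
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


q1 = "weather in new york city"
q2 = "hotels in new york city"
print(word_overlap(q1, q2))        # 4 shared words out of 6 distinct words
print(word_edit_distance(q1, q2))  # a single substitution separates the queries
```

Both signals say the queries are nearly identical, even though the intents (weather versus hotels) differ.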

53 What we would like to do is to have a query representation that recognizes the difference between Q1 and Q2. [sent-138, score-0.582]

54 If we look closely at the two queries, we will notice that in the first query, the user is looking for the “weather”, while in the second query the user is looking for “hotels”. [sent-139, score-0.804]

55 To build such a representation, we start by segmenting each query into phrases. [sent-141, score-0.582]

56 Query segmentation is the process of taking a user's search query and dividing the tokens into individual phrases or semantic units (Bergsma and Wang, 2007). [sent-142, score-0.796]

57 Many approaches to query segmentation have been presented in recent research. [sent-143, score-0.628]

58 On the other hand, many unsupervised methods for query segmentation have also been proposed (Hagen et al. [sent-146, score-0.628]

59 We opt for the unsupervised techniques to perform query segmentation. [sent-152, score-0.582]

60 A segmentation for a query is obtained by computing the pointwise mutual information score for each pair of consecutive words. [sent-154, score-0.606]
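
As a rough illustration of PMI-based segmentation, the sketch below scores each pair of consecutive words and starts a new segment when the score falls under a threshold. The counts, the threshold value, and the rule "break when PMI is below the threshold" are all assumptions made for this example; the paper also applies further constraints on where breaks may occur, which are not modeled here.

```python
import math

# Toy corpus statistics (hypothetical numbers, for illustration only).
TOTAL = 1000
UNIGRAM = {"new": 50, "york": 20, "city": 40, "pizza": 30, "delivery": 25}
BIGRAM = {("new", "york"): 18, ("york", "city"): 10,
          ("city", "pizza"): 1, ("pizza", "delivery"): 12}


def pmi(w1, w2, unigram, bigram, total):
    """PMI(w1, w2) = log(P(w1, w2) / (P(w1) * P(w2))); -inf for unseen pairs."""
    joint = bigram.get((w1, w2), 0)
    if joint == 0:
        return float("-inf")
    p_joint = joint / total
    return math.log(p_joint / ((unigram[w1] / total) * (unigram[w2] / total)))


def segment(query, unigram, bigram, total, threshold):
    """Split a query wherever the PMI of two adjacent words is below threshold."""
    words = query.lower().split()
    if not words:
        return []
    segments = [[words[0]]]
    for prev, word in zip(words, words[1:]):
        if pmi(prev, word, unigram, bigram, total) < threshold:
            segments.append([word])    # weak association: start a new segment
        else:
            segments[-1].append(word)  # strong association: extend the segment
    return [" ".join(s) for s in segments]


print(segment("new york city pizza delivery", UNIGRAM, BIGRAM, TOTAL, threshold=1.0))
```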

61 no break can be introduced between “hotels” and “in” or “in” and “new York” in the query “hotels in new york city”). [sent-165, score-0.66]

62 In addition to breaking the query into phrases, we were also interested in grouping multi-word keywords together (e. [sent-167, score-0.652]

63 The intuition behind that is that a query containing the keyword “new york” and another containing the keyword “new mexico” should not be rewarded for sharing the word “new”. [sent-171, score-0.77]
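
The effect of grouping multi-word keywords can be sketched as follows. The lexicon, the greedy longest-match strategy, and the function name `group_keywords` are hypothetical and chosen only for illustration; the summary does not state how the paper's keyword extractor works.

```python
def group_keywords(query, lexicon, max_len=3):
    """Greedily merge adjacent words into the longest keyword found in lexicon."""
    words = query.lower().split()
    units, i = [], 0
    while i < len(words):
        # Try the longest candidate first; a single word is always accepted.
        for n in range(min(max_len, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if n == 1 or candidate in lexicon:
                units.append(candidate)
                i += n
                break
    return units


LEXICON = {"new york", "new mexico"}  # hypothetical keyword list
k1 = group_keywords("flights to new york", LEXICON)
k2 = group_keywords("flights to new mexico", LEXICON)

shared_words = set("flights to new york".split()) & set("flights to new mexico".split())
shared_keywords = set(k1) & set(k2)
print(k1, k2)
print(len(shared_words), len(shared_keywords))
```

At the word level the two queries share "new"; at the keyword level they do not, because "new york" and "new mexico" are compared as whole units.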

64 For example the query “kodak easyshare recharger chord” consists of a single semantic unit (phrase) and two keywords “Kodak easyshare” and “recharger cord”. [sent-194, score-0.738]

65 This shows that the user had two different intents even though most of the words in the two queries are shared. [sent-199, score-0.429]

66 To capture concept similarity, we define four different ways of matching concepts, ranked from the most to the least strict: • Exact Match: The head and the attributes of the two concepts match exactly. [sent-202, score-0.447]

67 , sJ}, can be defined as: $P(S|Q) = \prod_{i=1}^{I} \sum_{j=1}^{J} P(s_i|q_j)\,P(q_j|Q)$ (2) where $P(q|Q)$ is the unigram probability of word q in query Q. [sent-229, score-0.582]
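
Equation (2) is a product, over the words of S, of a sum over the words of Q of a word-level translation probability weighted by the unigram probability of the query word. A minimal sketch follows, assuming the word translation table is given as a dictionary and that the inner sum runs over the distinct words of Q; the table values below are invented for illustration.

```python
from collections import Counter


def translation_prob(s_words, q_words, word_trans):
    """P(S|Q) = prod_i sum_j P(s_i|q_j) * P(q_j|Q).

    word_trans maps (s, q) -> P(s|q); missing pairs count as 0.
    P(q|Q) is the relative frequency of q in Q, and the inner sum
    runs over the distinct words of Q (an assumption of this sketch).
    """
    counts = Counter(q_words)
    n = len(q_words)
    prob = 1.0
    for s in s_words:
        prob *= sum(word_trans.get((s, q), 0.0) * (c / n)
                    for q, c in counts.items())
    return prob


# Hypothetical translation probabilities P(s|q).
TABLE = {("hotels", "hotel"): 0.5, ("hotels", "cheap"): 0.1}
p = translation_prob(["hotels"], ["cheap", "hotel"], TABLE)
print(p)  # 0.1 * 0.5 + 0.5 * 0.5
```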

68 We used the following features to predict the query reformulation type: • Length (num. [sent-257, score-1.077]

69 of concepts in Q1 but not in Q2 • 1 if Q1 contains all of Q2's concepts • 1 if Q2 contains all of Q1's concepts • all concept features above recomputed for keywords instead of concepts 5 Experiments and Results 5. [sent-269, score-0.751]
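
The set-style concept features listed above could be computed along these lines. This sketch represents each concept as a plain string and uses exact matching only; the feature names, the extra symmetric features, and the example concepts are assumptions, since the full feature list is truncated in this summary.

```python
def concept_features(c1: set, c2: set) -> dict:
    """Set-based features over the concepts of Q1 (c1) and Q2 (c2)."""
    return {
        "num_concepts_q1": len(c1),
        "num_concepts_q2": len(c2),
        "num_shared": len(c1 & c2),           # assumed feature, not in the summary
        "num_in_q1_not_q2": len(c1 - c2),
        "num_in_q2_not_q1": len(c2 - c1),     # assumed symmetric counterpart
        "q1_contains_all_q2": int(c2 <= c1),  # 1 if Q1 contains all of Q2's concepts
        "q2_contains_all_q1": int(c1 <= c2),  # 1 if Q2 contains all of Q1's concepts
    }


# Hypothetical concept sets for the running example.
feats = concept_features({"weather", "new york city"}, {"hotels", "new york city"})
print(feats)
```

The same function can be reused for the keyword-level variant by passing keyword sets instead of concept sets.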

70 1 Data Our data consists of query pairs randomly sampled from the queries submitted to a commercial search engine during a week in mid-2012. [sent-270, score-1.009]

71 Every record in our data consisted of a consecutive query pair (Qi,Qi+1) submitted to the search engine by the same user and in the same session (i. [sent-271, score-0.934]

72 Identical queries were excluded from the data because they are always labeled as reformulation and their label is very easy to predict. [sent-276, score-0.776]

73 All data in the session to which the sampled query pair belongs were recorded. [sent-278, score-0.653]

74 In addition to queries, the data contained a timestamp for each page view, all elements shown in response to that query (e. [sent-279, score-0.582]

75 Additionally, they were also shown queries and clicks before and after the query pair of interest. [sent-288, score-0.912]

76 They were asked to then use their assessment of the user’s objectives to determine whether Qi+1 is a reformulation of Qi. [sent-289, score-0.495]

77 Each query pair was labeled by three judges and the majority vote among judges was used. [sent-290, score-0.658]

78 Because the number of positive instances is much smaller than the number of negative instances, we used all positive instances and an equal number of randomly selected negative instances leaving us with approximately 6000 query pairs. [sent-291, score-0.582]
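
The labeling and balancing steps described above can be sketched as follows. The pair identifiers, the vote encoding (1 for reformulation, 0 otherwise), and the fixed random seed are assumptions made for this example.

```python
import random


def majority_label(votes):
    """True when more than half of the judges marked the pair as a reformulation."""
    return sum(votes) * 2 > len(votes)


def balance(labeled_pairs, seed=0):
    """Keep every positive pair plus an equal-sized random sample of negatives."""
    positives = [p for p in labeled_pairs if p[1]]
    negatives = [p for p in labeled_pairs if not p[1]]
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    k = min(len(positives), len(negatives))
    return positives + rng.sample(negatives, k)


# Hypothetical judgments: three votes per query pair.
judged = {"p1": [1, 1, 0], "p2": [0, 0, 0], "p3": [0, 1, 0],
          "p4": [1, 1, 1], "p5": [0, 0, 1], "p6": [0, 0, 0]}
labeled = [(pid, majority_label(v)) for pid, v in judged.items()]
balanced = balance(labeled)
print(balanced)
```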

79 [Figure 2: Distribution of Query Reformulation Types] query is intended to correct spelling mistakes), and Same Intent (second query is intended to express the same intent in a different way). [sent-294, score-1.383]

80 2 Predicting Query Reformulation In this section we describe the experiments we conducted to evaluate the reformulation prediction classifier. [sent-296, score-0.495]

81 We compare the performance of four different systems: • The first one, Heuristic, simply computes the similarity between two queries as the percentage of common words to the length of the longer query in terms of the number of words. [sent-298, score-0.894]

82 The second query is predicted to be a reformulation of the first if similarity ≥ τsim and the time difference ≤ τtime minutes. [sent-307, score-1.108]
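
The Heuristic baseline can be sketched as below: similarity is the number of common words divided by the length of the longer query in words, and a pair is predicted to be a reformulation when the similarity and time thresholds are both met. The threshold values passed in the example are placeholders, since this summary does not state the values of τsim and τtime; counting common words via set intersection is also an assumption.

```python
def heuristic_similarity(q1: str, q2: str) -> float:
    """Common words divided by the word length of the longer query."""
    w1, w2 = q1.lower().split(), q2.lower().split()
    longer = max(len(w1), len(w2))
    if longer == 0:
        return 0.0
    return len(set(w1) & set(w2)) / longer


def is_reformulation(q1, q2, minutes_apart, tau_sim, tau_time):
    """Predict a reformulation if similarity >= tau_sim and time gap <= tau_time."""
    return heuristic_similarity(q1, q2) >= tau_sim and minutes_apart <= tau_time


a = "weather in new york city"
b = "hotels in new york city"
print(heuristic_similarity(a, b))
# Placeholder thresholds: tau_sim=0.5, tau_time=5 minutes.
print(is_reformulation(a, b, minutes_apart=2.0, tau_sim=0.5, tau_time=5.0))
```

With these placeholder thresholds the baseline labels the weather/hotels pair as a reformulation, which is the kind of false positive the concept features are meant to remove.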

83 The concept features were able to achieve higher precision rates while not sacrificing recall because they were more effective in eliminating false reformulation cases. [sent-329, score-0.581]

84 The classifier also failed in cases where the keyword extractor and/or the POS tagger failed to correctly parse the queries (e. [sent-337, score-0.519]

85 In many such cases, the query was a sequence of words that was not well formed (e. [sent-342, score-0.582]

86 3 Predicting Reformulation Type We conducted another experiment to evaluate the performance of the reformulation type classifier. [sent-348, score-0.495]

87 We performed experiments using the data described earlier where judges were asked to select the type of reformulation for every reformulation query. [sent-349, score-1.028]

88 The figure shows that the most popular reformulation types are those where users move to a more specific intent or express the same intent in a different way. [sent-351, score-0.387]

89 Reformulations with spelling suggestions and query generalizations are less popular. [sent-352, score-0.631]

90 6 Conclusions Identifying query reformulations is an interesting and useful application in Information Retrieval. [sent-358, score-0.766]

91 Reformulation identification is useful for automatic query refinements, task boundary identification and satisfaction prediction. [sent-359, score-0.644]

92 Previous work on this problem has adopted a bag-of-words approach where lexical similarity and word overlap are the key features for identifying query reformulation. [sent-360, score-0.693]

93 We proposed a method for identifying concepts in search queries and using them to identify query reformulations. [sent-361, score-1.186]

94 The proposed method outperforms previous work because it can better represent the information intent underlying the query and hence can better assess query similarity. [sent-362, score-1.236]

95 We also showed that we can reliably predict the type of the reformulation with high accuracy. [sent-364, score-0.495]

96 Learning lexicon models from search logs for query expansion. [sent-426, score-0.728]

97 Exploring web scale language models for search query processing. [sent-460, score-0.72]

98 Patterns and transitions of query reformulation during web searching. [sent-485, score-1.133]

99 Beyond the session timeout: Automatic hierarchical segmentation of search topics in query logs. [sent-489, score-0.781]

100 Patterns of search: Analyzing and modeling web query refinement. [sent-497, score-0.638]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('query', 0.582), ('reformulation', 0.495), ('queries', 0.281), ('reformulations', 0.184), ('concepts', 0.166), ('user', 0.111), ('jansen', 0.101), ('klinkner', 0.099), ('jones', 0.098), ('keyword', 0.094), ('search', 0.082), ('proceeding', 0.075), ('taxonomies', 0.074), ('hotels', 0.073), ('intent', 0.072), ('session', 0.071), ('keywords', 0.07), ('mins', 0.068), ('edit', 0.066), ('logs', 0.064), ('concept', 0.062), ('users', 0.059), ('radlinski', 0.057), ('web', 0.056), ('head', 0.053), ('weather', 0.052), ('hagen', 0.049), ('boldi', 0.049), ('clicks', 0.049), ('timeout', 0.049), ('spelling', 0.049), ('intended', 0.049), ('failed', 0.048), ('identifying', 0.048), ('segmentation', 0.046), ('match', 0.045), ('bhama', 0.043), ('easyshare', 0.043), ('kodak', 0.043), ('lucchese', 0.043), ('nnx', 0.043), ('recharger', 0.043), ('tommy', 0.043), ('boundaries', 0.042), ('click', 0.042), ('refinements', 0.042), ('engine', 0.041), ('york', 0.039), ('break', 0.039), ('judges', 0.038), ('reformulate', 0.038), ('qj', 0.037), ('intents', 0.037), ('rosie', 0.037), ('xi', 0.036), ('acm', 0.036), ('boundary', 0.034), ('lau', 0.034), ('city', 0.034), ('noun', 0.032), ('overlap', 0.032), ('gao', 0.032), ('similarity', 0.031), ('searching', 0.031), ('qi', 0.03), ('sigir', 0.03), ('bergsma', 0.029), ('anick', 0.028), ('annudm', 0.028), ('cord', 0.028), ('perfume', 0.028), ('spink', 0.028), ('teevan', 0.028), ('tnheu', 0.028), ('satisfaction', 0.028), ('levenshtein', 0.028), ('specification', 0.028), ('boosted', 0.027), ('phrases', 0.027), ('identify', 0.027), ('cikm', 0.027), ('lemma', 0.027), ('refinement', 0.026), ('white', 0.025), ('potthast', 0.025), ('rug', 0.025), ('recomputed', 0.025), ('berger', 0.025), ('ryen', 0.025), ('submits', 0.025), ('submitting', 0.025), ('classifier', 0.025), ('consecutive', 0.024), ('false', 0.024), ('url', 0.023), ('seek', 0.023), ('www', 0.023), ('cases', 0.023), ('submitted', 0.023), ('issued', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can significantly improve query reformulation performance using features of query concepts.

2 0.31328896 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen

Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement.

3 0.16311027 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

Author: Artem Sokolov ; Laura Jehl ; Felix Hieber ; Stefan Riezler

Abstract: We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.

4 0.14341544 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

Author: Jerome White ; Douglas W. Oard ; Nitendra Rajput ; Marion Zalk

Abstract: Building search engines that can respond to spoken queries with spoken content requires that the system not just be able to find useful responses, but also that it know when it has heard enough about what the user wants to be able to do so. This paper describes a simulation study with queries spoken by non-native speakers that suggests that finding relevant content is often possible within a half minute, and that combining features based on automatically recognized words with features designed for automated prediction of query difficulty can serve as a useful basis for predicting when that useful content has been found.

5 0.12601726 95 emnlp-2013-Identifying Multiple Userids of the Same Author

Author: Tieyun Qian ; Bing Liu

Abstract: This paper studies the problem of identifying users who use multiple userids to post in social media. Since multiple userids may belong to the same author, it is hard to directly apply supervised learning to solve the problem. This paper proposes a new method, which still uses supervised learning but does not require training documents from the involved userids. Instead, it uses documents from other userids for classifier building. The classifier can be applied to documents of the involved userids. This is possible because we transform the document space to a similarity space and learning is performed in this new space. Our evaluation is done in the online review domain. The experimental results using a large number of userids and their reviews show that the proposed method is highly effective.

6 0.097441278 180 emnlp-2013-The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

7 0.091314055 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries

8 0.086009584 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

9 0.080591552 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures

10 0.080077924 24 emnlp-2013-Application of Localized Similarity for Web Documents

11 0.066906556 160 emnlp-2013-Relational Inference for Wikification

12 0.064498447 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

13 0.059048742 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

14 0.058385849 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

15 0.056898437 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

16 0.053449057 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

17 0.050953828 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge

18 0.050143976 34 emnlp-2013-Automatically Classifying Edit Categories in Wikipedia Revisions

19 0.049401335 85 emnlp-2013-Fast Joint Compression and Summarization via Graph Cuts

20 0.048207555 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.175), (1, 0.043), (2, -0.043), (3, 0.019), (4, -0.021), (5, 0.02), (6, 0.104), (7, 0.223), (8, 0.123), (9, -0.094), (10, -0.082), (11, 0.219), (12, -0.109), (13, -0.117), (14, 0.119), (15, -0.057), (16, -0.156), (17, -0.347), (18, 0.2), (19, 0.03), (20, 0.101), (21, -0.159), (22, -0.197), (23, 0.036), (24, -0.064), (25, 0.007), (26, -0.053), (27, -0.02), (28, 0.05), (29, 0.145), (30, -0.079), (31, 0.006), (32, 0.002), (33, 0.018), (34, 0.005), (35, -0.047), (36, 0.029), (37, 0.026), (38, 0.062), (39, -0.007), (40, -0.019), (41, 0.001), (42, 0.031), (43, 0.005), (44, 0.036), (45, -0.0), (46, -0.036), (47, -0.044), (48, -0.029), (49, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98483855 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can significantly improve query reformulation performance using features of query concepts.

2 0.92918479 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen

Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement.

3 0.71983945 173 emnlp-2013-Simulating Early-Termination Search for Verbose Spoken Queries

Author: Jerome White ; Douglas W. Oard ; Nitendra Rajput ; Marion Zalk

Abstract: Building search engines that can respond to spoken queries with spoken content requires that the system not just be able to find useful responses, but also that it know when it has heard enough about what the user wants to be able to do so. This paper describes a simulation study with queries spoken by non-native speakers that suggests that finding relevant content is often possible within a half minute, and that combining features based on automatically recognized words with features designed for automated prediction of query difficulty can serve as a useful basis for predicting when that useful content has been found.

4 0.64560431 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

Author: Artem Sokolov ; Laura Jehl ; Felix Hieber ; Stefan Riezler

Abstract: We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.

5 0.5703724 142 emnlp-2013-Open-Domain Fine-Grained Class Extraction from Web Search Queries

Author: Marius Pasca

Abstract: This paper introduces a method for extracting fine-grained class labels ( “countries with double taxation agreements with india ”) from Web search queries. The class labels are more numerous and more diverse than those produced by current extraction methods. Also extracted are representative sets of instances (singapore, united kingdom) for the class labels.

6 0.52746749 95 emnlp-2013-Identifying Multiple Userids of the Same Author

7 0.39590734 180 emnlp-2013-The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

8 0.36317259 131 emnlp-2013-Mining New Business Opportunities: Identifying Trend related Products by Leveraging Commercial Intents from Microblogs

9 0.33854645 20 emnlp-2013-An Efficient Language Model Using Double-Array Structures

10 0.32645908 24 emnlp-2013-Application of Localized Similarity for Web Documents

11 0.29135296 122 emnlp-2013-Learning to Freestyle: Hip Hop Challenge-Response Induction via Transduction Rule Segmentation

12 0.24880794 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

13 0.24600333 184 emnlp-2013-This Text Has the Scent of Starbucks: A Laplacian Structured Sparsity Model for Computational Branding Analytics

14 0.23729226 200 emnlp-2013-Well-Argued Recommendation: Adaptive Models Based on Words in Recommender Systems

15 0.2171375 160 emnlp-2013-Relational Inference for Wikification

16 0.21653609 141 emnlp-2013-Online Learning for Inexact Hypergraph Search

17 0.20727423 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery

18 0.20727284 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

19 0.20207617 198 emnlp-2013-Using Soft Constraints in Joint Inference for Clinical Concept Recognition

20 0.18930206 23 emnlp-2013-Animacy Detection with Voting Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.045), (18, 0.027), (22, 0.046), (29, 0.345), (30, 0.074), (45, 0.022), (50, 0.018), (51, 0.16), (66, 0.039), (71, 0.034), (75, 0.025), (77, 0.02), (90, 0.018), (96, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.78402603 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set

Author: Yufang Hou ; Katja Markert ; Michael Strube

Abstract: Recognizing bridging anaphora is difficult due to the wide variation within the phenomenon, the resulting lack of easily identifiable surface markers and their relative rarity. We develop linguistically motivated discourse structure, lexico-semantic and genericity detection features and integrate these into a cascaded minority preference algorithm that models bridging recognition as a subtask of learning fine-grained information status (IS). We substantially improve bridging recognition without impairing performance on other IS classes.

same-paper 2 0.76793778 97 emnlp-2013-Identifying Web Search Query Reformulation using Concept based Matching

Author: Ahmed Hassan

Abstract: Web search users frequently modify their queries in hope of receiving better results. This process is referred to as “Query Reformulation”. Previous research has mainly focused on proposing query reformulations in the form of suggested queries for users. Some research has studied the problem of predicting whether the current query is a reformulation of the previous query or not. However, this work has been limited to bag-of-words models where the main signals being used are word overlap, character level edit distance and word level edit distance. In this work, we show that relying solely on surface level text similarity results in many false positives where queries with different intents yet similar topics are mistakenly predicted as query reformulations. We propose a new representation for Web search queries based on identifying the concepts in queries and show that we can significantly improve query reformulation performance using features of query concepts.

3 0.76190901 180 emnlp-2013-The Answer is at your Fingertips: Improving Passage Retrieval for Web Question Answering with Search Behavior Data

Author: Mikhail Ageev ; Dmitry Lagun ; Eugene Agichtein

Abstract: Passage retrieval is a crucial first step of automatic Question Answering (QA). While existing passage retrieval algorithms are effective at selecting document passages most similar to the question, or those that contain the expected answer types, they do not take into account which parts of the document the searchers actually found useful. We propose, to the best of our knowledge, the first successful attempt to incorporate searcher examination data into passage retrieval for question answering. Specifically, we exploit detailed examination data, such as mouse cursor movements and scrolling, to infer the parts of the document the searcher found interesting, and then incorporate this signal into passage retrieval for QA. Our extensive experiments and analysis demonstrate that our method significantly improves passage retrieval, compared to using textual features alone. As an additional contribution, we make available to the research community the code and the search behavior data used in this study, with the hope of encouraging further research in this area.

4 0.68897688 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings

Author: Artem Sokolov ; Laura Jehl ; Felix Hieber ; Stefan Riezler

Abstract: We present an approach to learning bilingual n-gram correspondences from relevance rankings of English documents for Japanese queries. We show that directly optimizing cross-lingual rankings rivals and complements machine translation-based cross-language information retrieval (CLIR). We propose an efficient boosting algorithm that deals with very large cross-product spaces of word correspondences. We show in an experimental evaluation on patent prior art search that our approach, and in particular a consensus-based combination of boosting and translation-based approaches, yields substantial improvements in CLIR performance. Our training and test data are made publicly available.

5 0.52813405 105 emnlp-2013-Improving Web Search Ranking by Incorporating Structured Annotation of Queries

Author: Xiao Ding ; Zhicheng Dou ; Bing Qin ; Ting Liu ; Ji-rong Wen

Abstract: Web users are increasingly looking for structured data, such as lyrics, job, or recipes, using unstructured queries on the web. However, retrieving relevant results from such data is a challenging problem due to the unstructured language of the web queries. In this paper, we propose a method to improve web search ranking by detecting Structured Annotation of queries based on top search results. In a structured annotation, the original query is split into different units that are associated with semantic attributes in the corresponding domain. We evaluate our techniques using real world queries and achieve significant improvement.

6 0.49495238 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data

7 0.4926554 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks

8 0.49217123 7 emnlp-2013-A Hierarchical Entity-Based Approach to Structuralize User Generated Content in Social Media: A Case of Yahoo! Answers

9 0.49061817 95 emnlp-2013-Identifying Multiple Userids of the Same Author

10 0.48840305 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

11 0.48839152 143 emnlp-2013-Open Domain Targeted Sentiment

12 0.48699263 140 emnlp-2013-Of Words, Eyes and Brains: Correlating Image-Based Distributional Semantic Models with Neural Representations of Concepts

13 0.48652861 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

14 0.48645756 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

15 0.48425379 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

16 0.48389903 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

17 0.48350367 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

18 0.4830181 69 emnlp-2013-Efficient Collective Entity Linking with Stacking

19 0.4828749 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

20 0.48285949 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation