acl acl2011 acl2011-256 knowledge-graph by maker-knowledge-mining

256 acl-2011-Query Weighting for Ranking Model Adaptation


Source: pdf

Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong

Abstract: We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it is more reasonable to conduct importance weighting at the query level than at the document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulting from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on the LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. [sent-13, score-1.424]

2 Query weighting is a key step in ranking model adaptation. [sent-14, score-0.572]

3 As the learning object of ranking algorithms is divided by query instances, we argue that it is more reasonable to conduct importance weighting at the query level than at the document level. [sent-15, score-1.946]

4 The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. [sent-17, score-2.956]

5 This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulting from the compression. [sent-18, score-1.323]

6 The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. [sent-19, score-0.952]

7 Adaptation experiments on the LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods. [sent-21, score-1.619]

8 This motivated the popular domain adaptation solution based on instance weighting, which assigns larger weights to those transferable instances so that the model trained on the source domain can adapt more effectively to the target domain (Jiang and Zhai, 2007). [sent-37, score-0.912]

9 Existing instance weighting schemes mainly focus on the adaptation problem for classification (Zadrozny, 2004; Huang et al. [sent-38, score-0.661]

10 Although the instance weighting scheme may be applied to documents for ranking model adaptation, the difference between classification and learning to rank should be highlighted and taken into careful consideration. [sent-41, score-0.772]

11 , 2007) to take the whole query as a learning object. [sent-45, score-0.565]

12 To avoid losing this information, query weighting takes the query as a whole and directly measures its importance. [sent-49, score-1.55]

13 Inspired by the principle of the listwise approach, we hypothesize that importance weighting for ranking model adaptation can be done better at the query level than at the document level. [sent-51, score-1.448]

14 Figure 1 demonstrates the difference between instance weighting and query weighting, where queries qs1 and qs2 are in the source domain, qt1 and qt2 are in the target domain, and each query has three retrieved documents. [sent-52, score-2.102]

15 It is worth noting that the information about which document instances belong to the same query is lost. [sent-54, score-0.771]

16 To avoid this information loss, the query weighting scheme shown in Figure 1(b) directly measures the importance weight at the query level. [sent-55, score-1.707]

17 Instance weighting makes the importance estimation of document instances inaccurate when documents of the same source query are similar to the documents from different target queries. [sent-56, score-1.614]

18 No matter what weighting schemes are used, it makes sense to assign high weights to source queries qs1 and qs2 because they are similar to target queries qt1 and qt2, respectively. [sent-58, score-1.001]

19 However, it is not quite similar to either qt1 or qt2 at the query level, meaning that the ranking knowledge from qs3 is different from that of qt1 and qt2 and thus less useful for transfer to the target domain. [sent-60, score-0.858]

20 Unfortunately, the three source queries qs1, qs2 and qs3 would be weighted equally by the document instance weighting scheme. [sent-61, score-0.88]

21 The reason is that all of their documents are similar to the two document instances in the target domain, despite the fact that the documents of qs3 correspond to their counterparts from different target queries. [sent-62, score-0.647]
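
A toy sketch may make this failure mode concrete. Everything below (the vectors, the exp(-distance) similarity, the aggregation by means) is invented for illustration and is not the paper's actual weighting scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = np.array([1.0, 0.0]), np.array([0.0, 1.0])  # two "topic" directions

def docs(center, n=3):
    """A query's document matrix: n feature vectors near a topic center."""
    return center + 0.05 * rng.normal(size=(n, 2))

qt1, qt2 = docs(A), docs(B)                    # target queries
qs1, qs2 = docs(A), docs(B)                    # source queries matching qt1/qt2
qs3 = np.vstack([docs(A, n=2), docs(B, n=1)])  # documents split across topics
targets = np.vstack([qt1, qt2])

def doc_level_weight(q):
    # Instance-weighting view: each document only needs SOME similar target doc.
    dists = np.linalg.norm(q[:, None] - targets[None, :], axis=2)
    return np.exp(-dists.min(axis=1)).mean()

def query_level_weight(q):
    # Query-weighting view: the query as a whole must match ONE target query.
    return max(np.exp(-np.linalg.norm(q.mean(0) - t.mean(0))) for t in (qt1, qt2))

for name, q in [("qs1", qs1), ("qs2", qs2), ("qs3", qs3)]:
    print(name, round(doc_level_weight(q), 3), round(query_level_weight(q), 3))
# The document-level weights come out roughly equal for all three source
# queries, while the query-level weight of qs3 is visibly lower.
```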

22 Therefore, we should consider the source query as a whole and directly measure the query importance. [sent-63, score-1.263]

23 However, it’s not trivial to directly estimate a query’s weight because a query is essentially provided as a matrix where each row represents a vector of document features. [sent-64, score-0.819]

24 2 Instance Weighting Scheme Review. The basic idea of instance weighting is to put larger weights on source instances that are more similar to the target domain. [sent-66, score-0.828]

25 In this work, we also focus on weighting source queries only using unlabeled target queries. [sent-73, score-0.77]

26 With the domain separator, the probability that a source instance is classified into the target domain can be used as its importance weight. [sent-77, score-0.642]

27 Other instance weighting methods were proposed for the sample selection bias or covariate shift in the more general setting of classifier learning (Shimodaira, 2000; Sugiyama et al. [sent-78, score-0.534]

28 To use instance weighting in pairwise ranking algorithms, the weights of document instances should be transformed into those of document pairs (Gao et al., 2010). [sent-89, score-1.093]

29 To consider the query factor, the query weight was further estimated as the average value of the weights over all the pairs, i.e. [sent-92, score-1.234]

30 $w_q = \frac{1}{M}\sum_{i,j} w_{ij}$, where M is the number of pairs in query q. [sent-94, score-0.592]

31 Additionally, to take advantage of both query and document information, a probabilistic weighting for ⟨xi, xj⟩ was modeled by $w_q \cdot w_{ij}$. [sent-95, score-1.149]

32 Through this transformation, instance weighting schemes for classification can be applied to ranking model adaptation. [sent-96, score-0.684]
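
A minimal sketch of this transformation, assuming the per-pair weight is the product of the two document weights (the summary does not spell out the pair combination rule, so that product is an assumption):

```python
import numpy as np

def pair_weights(doc_w, pairs):
    """Turn per-document weights into per-pair weights w_ij (assumed w_i * w_j)."""
    return np.array([doc_w[i] * doc_w[j] for i, j in pairs])

def query_weight(pair_w):
    """w_q = (1/M) * sum_ij w_ij, with M the number of pairs in the query."""
    return pair_w.mean()

doc_w = np.array([0.9, 0.5, 0.7])      # importance weights of three documents
pairs = [(0, 1), (0, 2), (1, 2)]       # meaningful pairs from the rank labels
w_ij = pair_weights(doc_w, pairs)
w_q = query_weight(w_ij)
prob_w = w_q * w_ij                    # probabilistic weighting for <x_i, x_j>
```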

33 3 Query Weighting. In this section, we extend instance weighting to directly estimate query importance for more effective ranking model adaptation. [sent-97, score-1.354]

34 We present two query weighting methods from different perspectives. [sent-98, score-0.985]

35 Note that although our methods are based on the domain separator scheme, other instance weighting schemes such as KLIEP (Sugiyama et al. [sent-99, score-0.777]

36 3.1 Query Weighting by Document Feature Aggregation. Our first query weighting method is inspired by recent work on local learning for ranking (Geng et al. [sent-102, score-1.137]

37 The query can be compressed into a query feature vector, where each feature value is obtained by aggregating the corresponding feature over all documents in the query. [sent-105, score-1.254]

38 Based on the aggregation within each query, we can use a domain separator to directly weight the source queries with the set of queries from both domains. [sent-107, score-0.683]

39 P(qsi ∈ Dt), which can be used as the importance of qsi relative to the target domain. [sent-110, score-0.499]

40 From steps 1 to 9, D′s and D′t are constructed using query feature vectors from the source and target domains. [sent-111, score-0.808]

41 The distance of the query feature vector qsi from Hst is transformed into the probability P(qsi ∈ Dt) using a sigmoid function (Platt and Platt, 1999). [sent-113, score-0.922]
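
A minimal sketch of this scheme (query-aggr) under two assumptions: mean-variance aggregation for the query feature vector, and logistic regression standing in for the separating hyperplane Hst plus the Platt-style sigmoid:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def aggregate_query(doc_matrix):
    """Compress an (n_docs, n_features) query into one feature vector."""
    return np.concatenate([doc_matrix.mean(axis=0), doc_matrix.var(axis=0)])

def query_aggr_weights(source_queries, target_queries):
    """Weight each source query by the probability that its aggregated
    vector lies on the target-domain side of the domain separator."""
    Xs = np.stack([aggregate_query(q) for q in source_queries])  # D's
    Xt = np.stack([aggregate_query(q) for q in target_queries])  # D't
    X = np.vstack([Xs, Xt])
    y = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])    # 1 = target
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return clf.predict_proba(Xs)[:, 1]  # P(q_si in D_t) per source query
```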

42 3.2 Query Weighting by Comparing Queries across Domains. Although the query feature vector in Algorithm 1 can approximate a query by aggregating its documents’ features, it potentially fails to capture important feature information due to the averaging effect during aggregation. [sent-115, score-1.234]

43 For example, the merit of features in some influential documents may be canceled out in the mean-variance calculation, resulting in many distorted feature values in the query feature vector that hurt the accuracy of the query classification hyperplane. [sent-116, score-1.315]

44 This urges us to propose another query weighting method from a different perspective: query similarity. [sent-117, score-1.55]

45 Intuitively, the importance of a source query to the target domain is determined by its overall similarity to every target query. [sent-118, score-1.145]

46 Based on this intuition, we leverage the domain separator to measure the similarity between a source query and each one of the target queries, where an individual domain separator is created for each pair of queries. [sent-119, score-1.328]

47 We estimate the weight of a source query using Algorithm 2. [sent-120, score-0.752]

48 Note that we assume document instances in the same query are conditionally independent and all queries are independent of each other. [sent-121, score-0.913]

49 In step 3, D′qsi is constructed from all the document instances {x⃗k} in query qsi with the domain label ys. [sent-122, score-1.059]

50 For each target query qtj, we use the classification hyperplane Hij to estimate P(x⃗k ∈ D′qtj), [sent-123, score-0.933]

51 i.e., the probability that each document x⃗k of qsi is classified into the document set of qtj (step 8). [sent-125, score-0.73]

52 Finally, the probability of qsi belonging to the target domain, P(qsi ∈ Dt), is calculated in step 11. [sent-127, score-0.517]
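
A minimal sketch of Algorithm 2 (query-comp) for a single source query. The per-pair separator and per-document probabilities follow steps 3-8 above; how the probabilities are combined in step 11 is not fully specified in this summary, so the averaging below is an assumption:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def query_comp_weight(source_docs, target_queries):
    """Estimate P(q_si in D_t) for one source query (an n_docs x n_features
    matrix) against a list of target query matrices."""
    per_target = []
    for tq in target_queries:
        X = np.vstack([source_docs, tq])                   # D'_qsi with D'_qtj
        y = np.concatenate([np.zeros(len(source_docs)), np.ones(len(tq))])
        clf = LogisticRegression(max_iter=1000).fit(X, y)  # hyperplane H_ij
        p_k = clf.predict_proba(source_docs)[:, 1]         # step 8
        per_target.append(p_k.mean())
    return float(np.mean(per_target))                      # step 11 (assumed)
```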

53 4 Ranking Model Adaptation via Query Weighting. To adapt the source ranking model to the target domain, we need to incorporate query weights into existing ranking algorithms. [sent-129, score-1.131]

54 Note that query weights can be integrated with either pairwise or listwise algorithms. [sent-130, score-0.733]

55 For pairwise algorithms, a straightforward way is to assign the query weight to all the document pairs associated with this query. [sent-131, score-0.799]

56 However, document instance weighting cannot be appropriately utilized in the listwise approach. [sent-132, score-0.701]

57 In order to compare query weighting with document instance weighting, we need to apply them fairly within the same ranking approach. [sent-133, score-1.199]

58 Therefore, we choose the pairwise approach to incorporate query weighting. [sent-134, score-0.612]

59 Let us assume there are m queries in the source domain data set, and for each query qi there are ℓ(qi) meaningful document pairs that can be constructed based on the ground-truth rank labels. [sent-137, score-1.066]
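
For concreteness, here is a sketch of how those ℓ(qi) pairs can be built from the ground-truth labels; this is the standard pairwise recipe, not necessarily the paper's exact construction:

```python
import numpy as np
from itertools import combinations

def meaningful_pairs(doc_features, labels):
    """For each document pair with different relevance labels, return the
    feature difference x1 - x2 and a preference sign z in {+1, -1}."""
    diffs, zs = [], []
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            continue                      # ties carry no preference
        diffs.append(doc_features[i] - doc_features[j])
        zs.append(1.0 if labels[i] > labels[j] else -1.0)
    return np.array(diffs), np.array(zs)
```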

60 Equation 1 can be turned into the following form: $\min_{\vec{w}}\; \lambda\|\vec{w}\|^2 + \sum_{i=1}^{m}\sum_{j=1}^{\ell(q_i)}\big(1 - z_{ij}\cdot f(\vec{w},\, \vec{x}_j^{q_i(1)} - \vec{x}_j^{q_i(2)})\big)_+$ (2). Let IW(qi) represent the importance weight of source query qi. [sent-146, score-0.826]

61 Equation 2 is extended to integrate the query weight into the loss function in a straightforward way: $\min_{\vec{w}}\; \lambda\|\vec{w}\|^2 + \sum_{i=1}^{m} IW(q_i)\sum_{j=1}^{\ell(q_i)}\big(1 - z_{ij}\cdot f(\vec{w},\, \vec{x}_j^{q_i(1)} - \vec{x}_j^{q_i(2)})\big)_+$, where IW(·) is the importance weight of the source query defined above. [sent-147, score-0.708]
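
A subgradient-descent sketch of this query-weighted hinge loss. The optimizer and hyperparameters are illustrative; the paper's actual RSVM solver is not specified in this summary:

```python
import numpy as np

def weighted_rsvm(pairs_by_query, iw, n_features, lam=0.01, lr=0.1, epochs=50):
    """pairs_by_query: per query, a tuple (diffs, z) with diffs of shape
    (n_pairs, n_features) holding x^{qi(1)} - x^{qi(2)} and z in {+1, -1}.
    iw: one importance weight IW(q_i) per query."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        grad = 2.0 * lam * w                  # gradient of the L2 regularizer
        for (diffs, z), w_q in zip(pairs_by_query, iw):
            margins = z * diffs.dot(w)
            active = margins < 1.0            # pairs inside the hinge
            grad -= w_q * (z[active, None] * diffs[active]).sum(axis=0)
        w -= lr * grad
    return w
```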

62 5 Evaluation. We evaluated the two proposed query weighting methods on the TREC-2003 and TREC-2004 web track datasets, which were released through LETOR3.0. [sent-149, score-0.985]

63 Originally, different query tasks were defined on different parts of data in the collection, which can be considered as different domains for us. [sent-152, score-0.594]

64 Our goal is to demonstrate that query weighting can be more effective than the state-of-the-art document instance weighting. [sent-154, score-1.199]

65 5.1 Datasets and Setup. Three query tasks were defined in the TREC-2003 and TREC-2004 web tracks: home page finding (HP), named page finding (NP) and topic distillation (TD) (Voorhees, 2003; Voorhees, 2004). [sent-156, score-0.565]

66 In this dataset, each document instance is represented by 64 features, including low-level features such as term frequency, inverse document frequency and document length, and high-level features such as BM25, language-modeling, PageRank and HITS. [sent-157, score-0.488]

67 The baseline ranking model is an RSVM directly trained on the source domain without using any weighting method, denoted as no-weight. [sent-159, score-0.839]

68 We implemented two weighting measures based on the domain separator and Kullback-Leibler divergence, referred to as DS and KL, respectively. [sent-160, score-0.665]

69 In the DS measure, three document instance weighting methods based on the probability principle (Gao et al., 2010) were implemented: doc-pair, doc-avg and doc-comb. [sent-161, score-0.634]

70 Our proposed query weighting methods are denoted by query-aggr and query-comp, corresponding to document feature aggregation within a query and query comparison across domains, respectively. [sent-164, score-2.396]

71 All the ranking models above were trained only on source domain training data; the labeled data of the target domain was used only for testing. [sent-165, score-0.61]

72 A t-test on MAP indicates that the improvement of query-aggr over no-weight is statistically significant on two adaptation tasks, while the improvement of document instance weighting over no-weight is statistically significant on only one task. [sent-178, score-0.763]

73 This demonstrates the effectiveness of query weighting compared to document instance weighting. (A results table comparing the weighting methods on the HP03 to TD03, HP04 to TD04, NP03 to TD03 and NP04 to TD04 tasks is omitted here.) [sent-180, score-0.565]

74 †, ‡, ♯ and boldface indicate statistical significance at the 95% confidence level. [sent-182, score-0.694]

75 By contrast, more accurate query weights can be achieved by the more fine-grained similarity measure between the source query and all target queries in Algorithm 2. [sent-185, score-1.599]

76 Specifically, after document feature aggregation, the number of query feature vectors in all adaptation tasks is no more than 150 in source and target domains. [sent-189, score-1.109]

77 As each query contains 1000 documents, they seem to provide query-comp with enough samples to achieve a reasonable estimation of the density functions in both domains. [sent-191, score-0.599]

78 5.2 Adaptation from TD to HP/NP. To further validate the effectiveness of query weighting, we also conducted adaptation from TD to HP and from TD to NP. [sent-194, score-0.694]

79 We can see that the document instance weighting schemes, including doc-pair, doc-avg and doc-comb, cannot outperform no-weight based on the MAP measure. [sent-196, score-0.669]

80 The reason is that each query in TD has 1000 retrieved documents, of which 10-15 are relevant, whereas each query in HP or NP has only 1-2 relevant documents. [sent-197, score-1.238]

81 Since the query weighting methods directly estimate query importance instead of document instance importance, both query-aggr and query-comp can avoid this kind of negative influence, which is inevitable in the three document instance weighting methods. [sent-200, score-2.535]

82 Query weighting assigns each source query a weight value. [sent-204, score-1.139]

83 Note that it is not meaningful to directly compare absolute weight values between query-aggr and query-comp, because source query weights from distinct weighting methods have different ranges and scales. [sent-205, score-1.193]

84 However, it is feasible to compare weights produced by the same weighting method. [sent-206, score-0.474]

85 Intuitively, if the ranking model learned from a source query works well in the target domain, that query should receive a high weight. [sent-207, score-0.925]

86 Suppose ranking models fqs1 and fqs2 are learned from queries qs1 and qs2 respectively; if fqs1 performs better than fqs2, then the source query weight of qs1 should be higher than that of qs2. [sent-211, score-0.861]

87 For further analysis, we compare the weight values between each pair of source queries, for which we trained an RSVM on each source query and evaluated the learned model on test data from the target domain. [sent-212, score-1.492]

88 For comparison, we also ranked these queries according to randomly generated query weights, denoted as query-rand, in addition to query-aggr and query-comp. [sent-217, score-0.775]

89 Kendall’s $\tau = \frac{P - Q}{P + Q}$ is used to measure the correlation (Kendall, 1970), where P is the number of concordant query pairs and Q is the number of discordant pairs. [sent-218, score-0.594]
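
A direct implementation of this statistic from the counts of concordant and discordant pairs (ties are ignored here for simplicity):

```python
from itertools import combinations

def kendall_tau(weights, performance):
    """tau = (P - Q) / (P + Q) between query weights and the per-query
    performance of the models trained on those queries."""
    P = Q = 0
    for i, j in combinations(range(len(weights)), 2):
        s = (weights[i] - weights[j]) * (performance[i] - performance[j])
        if s > 0:
            P += 1   # concordant pair
        elif s < 0:
            Q += 1   # discordant pair
    return (P - Q) / (P + Q) if (P + Q) else 0.0
```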

90 5.4 Efficiency. When there are large-scale data in the source and target domains, how to efficiently weight a source query is another interesting problem. [sent-227, score-0.954]

91 Without loss of generality, we report the weighting time of doc-pair, query-aggr and query-comp for the adaptation from TD to HP using the DS measure. [sent-228, score-0.575]

92 As shown in Table 6, query-aggr can weight queries efficiently using query feature vectors. [sent-230, score-1.242]

93 The reason is two-fold: first, the query document aggregation operation can be done very fast; second, there are 1000 documents in each query of TD or HP, which means the compression ratio is 1000:1. [sent-231, score-1.392]

94 And query-comp uses a divide-and-conquer method to measure the similarity of a source query to each target query, and then efficiently combines these similarity values. (Residue of a Kendall’s τ table for query-rand, query-aggr and query-comp on the HP03 to TD03, HP04 to TD04, NP03 to TD03 and NP04 to TD04 tasks is omitted here.) [sent-234, score-0.865]

95 , 2008b), the parameters of the ranking model trained on the source domain were adjusted with a small set of labeled data in the target domain. [sent-267, score-0.485]

96 (Gao et al., 2010) studied instance weighting based on the domain separator for learning to rank, using only training data from the source domain. [sent-276, score-0.915]

97 In this work, we propose to directly measure the query importance instead of document instance importance by considering information at both levels. [sent-277, score-1.022]

98 7 Conclusion. We introduced two simple yet effective query weighting methods for ranking model adaptation. [sent-278, score-1.137]

99 The first represents the set of document instances within the same query as a query feature vector, and then directly measures the source query’s importance to the target domain. [sent-279, score-2.28]

100 The second measures the similarity between a source query and each target query, and then combines the fine-grained similarity values to estimate its importance to the target domain. [sent-280, score-1.089]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('query', 0.565), ('weighting', 0.42), ('qsi', 0.288), ('ranking', 0.152), ('queries', 0.142), ('document', 0.137), ('qtj', 0.136), ('adaptation', 0.129), ('domain', 0.125), ('td', 0.122), ('separator', 0.12), ('importance', 0.107), ('target', 0.104), ('source', 0.104), ('geng', 0.091), ('rsvm', 0.091), ('dt', 0.082), ('sugiyama', 0.08), ('instance', 0.077), ('rweight', 0.076), ('hp', 0.073), ('gao', 0.073), ('ds', 0.072), ('aggregation', 0.071), ('rank', 0.069), ('instances', 0.069), ('listwise', 0.067), ('dq', 0.067), ('zij', 0.067), ('hyperplane', 0.063), ('hst', 0.061), ('jq', 0.061), ('documents', 0.054), ('weights', 0.054), ('iws', 0.053), ('jn', 0.053), ('chen', 0.052), ('weight', 0.05), ('zadrozny', 0.049), ('qi', 0.049), ('pairwise', 0.047), ('kendall', 0.046), ('herbrich', 0.045), ('hij', 0.045), ('qis', 0.045), ('qji', 0.045), ('qjt', 0.045), ('rmap', 0.045), ('kl', 0.045), ('xjq', 0.04), ('im', 0.04), ('voorhees', 0.039), ('qin', 0.038), ('denoted', 0.038), ('transfer', 0.037), ('covariate', 0.037), ('similarity', 0.036), ('ij', 0.036), ('yan', 0.036), ('schemes', 0.035), ('feature', 0.035), ('density', 0.034), ('vector', 0.034), ('cu', 0.033), ('estimate', 0.033), ('xk', 0.032), ('wi', 0.031), ('aoying', 0.03), ('avnedl', 0.03), ('depin', 0.03), ('dpoc', 0.03), ('kliep', 0.03), ('mnda', 0.03), ('modelweighting', 0.03), ('oseldtf', 0.03), ('queryaggr', 0.03), ('querycomp', 0.03), ('hang', 0.03), ('tao', 0.03), ('jiang', 0.03), ('blitzer', 0.029), ('domains', 0.029), ('platt', 0.029), ('burges', 0.029), ('iw', 0.029), ('rc', 0.029), ('measure', 0.029), ('cao', 0.028), ('zhai', 0.028), ('map', 0.028), ('distorted', 0.027), ('ial', 0.027), ('wq', 0.027), ('efficiently', 0.027), ('loss', 0.026), ('yt', 0.026), ('ys', 0.025), ('toy', 0.025), ('shimodaira', 0.025), ('gang', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 256 acl-2011-Query Weighting for Ranking Model Adaptation

Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong

Abstract: We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it is more reasonable to conduct importance weighting at the query level than at the document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulting from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on the LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.

2 0.36780092 182 acl-2011-Joint Annotation of Search Queries

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.

3 0.31463423 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

4 0.30573621 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, is examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using document-based counts.

5 0.24860001 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

Author: Patrick Pantel ; Ariel Fuxman

Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.

6 0.24110298 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

7 0.14766875 204 acl-2011-Learning Word Vectors for Sentiment Analysis

8 0.14392661 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

9 0.13615853 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

10 0.12790175 313 acl-2011-Two Easy Improvements to Lexical Weighting

11 0.12707053 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

12 0.12441836 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

13 0.11959875 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

14 0.11669672 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

15 0.11657692 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

16 0.11026888 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

17 0.10531395 109 acl-2011-Effective Measures of Domain Similarity for Parsing

18 0.10173581 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

19 0.10071705 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

20 0.098834127 98 acl-2011-Discovery of Topically Coherent Sentences for Extractive Summarization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.206), (1, 0.109), (2, -0.068), (3, 0.136), (4, -0.13), (5, -0.332), (6, -0.135), (7, -0.229), (8, 0.208), (9, 0.007), (10, 0.204), (11, -0.049), (12, -0.016), (13, -0.009), (14, 0.043), (15, -0.011), (16, -0.064), (17, 0.041), (18, -0.032), (19, -0.023), (20, -0.065), (21, -0.09), (22, 0.011), (23, 0.045), (24, -0.028), (25, 0.113), (26, -0.039), (27, -0.062), (28, 0.083), (29, -0.016), (30, -0.047), (31, -0.008), (32, -0.031), (33, -0.047), (34, -0.064), (35, 0.024), (36, -0.051), (37, 0.049), (38, 0.021), (39, -0.034), (40, 0.007), (41, -0.02), (42, -0.005), (43, -0.04), (44, -0.09), (45, 0.028), (46, 0.016), (47, 0.02), (48, -0.007), (49, -0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98286593 256 acl-2011-Query Weighting for Ranking Model Adaptation

Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong

Abstract: We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it’s more reasonable to conduct importance weighting at query level than document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulted from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.

2 0.86216915 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, is examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using document-based counts.

3 0.84068722 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

Author: Joseph Reisinger ; Marius Pasca

Abstract: We develop a novel approach to the semantic analysis of short text segments and demonstrate its utility on a large corpus of Web search queries. Extracting meaning from short text segments is difficult as there is little semantic redundancy between terms; hence methods based on shallow semantic analysis may fail to accurately estimate meaning. Furthermore search queries lack explicit syntax often used to determine intent in question answering. In this paper we propose a hybrid model of semantic analysis combining explicit class-label extraction with a latent class PCFG. This class-label correlation (CLC) model admits a robust parallel approximation, allowing it to scale to large amounts of query data. We demonstrate its performance in terms of (1) its predicted label accuracy on polysemous queries and (2) its ability to accurately chunk queries into base constituents.

4 0.83380234 182 acl-2011-Joint Annotation of Search Queries

Author: Michael Bendersky ; W. Bruce Croft ; David A. Smith

Abstract: Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation is an important part of query processing and understanding in information retrieval systems. Due to their brevity and idiosyncratic structure, search queries pose a challenge to existing NLP tools. To address this challenge, we propose a probabilistic approach for performing joint query annotation. First, we derive a robust set of unsupervised independent annotations, using queries and pseudo-relevance feedback. Then, we stack additional classifiers on the independent annotations, and exploit the dependencies between them to further improve the accuracy, even with a very limited amount of available training data. We evaluate our method using a range of queries extracted from a web search log. Experimental results verify the effectiveness of our approach for both short keyword queries, and verbose natural language queries.

5 0.79815602 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

Author: Patrick Pantel ; Ariel Fuxman

Abstract: We propose methods for estimating the probability that an entity from an entity database is associated with a web search query. Association is modeled using a query entity click graph, blending general query click logs with vertical query click logs. Smoothing techniques are proposed to address the inherent data sparsity in such graphs, including interpolation using a query synonymy model. A large-scale empirical analysis of the smoothing techniques, over a 2-year click graph collected from a commercial search engine, shows significant reductions in modeling error. The association models are then applied to the task of recommending products to web queries, by annotating queries with products from a large catalog and then mining query-product associations through web search session analysis. Experimental analysis shows that our smoothing techniques improve coverage while keeping precision stable, and overall, that our top-performing model affects 9% of general web queries with 94% precision.

6 0.7908113 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

7 0.68628979 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search

8 0.57161492 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

9 0.56470507 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

10 0.54952723 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

11 0.50312972 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

12 0.44888884 135 acl-2011-Faster and Smaller N-Gram Language Models

13 0.44692022 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

14 0.44464862 179 acl-2011-Is Machine Translation Ripe for Cross-Lingual Sentiment Classification?

15 0.41443026 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

16 0.40264452 11 acl-2011-A Fast and Accurate Method for Approximate String Search

17 0.36770755 109 acl-2011-Effective Measures of Domain Similarity for Parsing

18 0.35402483 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

19 0.35014936 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

20 0.33682498 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.012), (10, 0.23), (17, 0.043), (26, 0.079), (37, 0.178), (39, 0.043), (41, 0.062), (55, 0.037), (59, 0.026), (72, 0.044), (88, 0.018), (91, 0.032), (96, 0.115)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82094669 197 acl-2011-Latent Class Transliteration based on Source Language Origin

Author: Masato Hagiwara ; Satoshi Sekine

Abstract: Transliteration, a rich source of proper noun spelling variations, is usually recognized by phonetic- or spelling-based models. However, a single model cannot deal with different words from different language origins, e.g., “get” in “piaget” and “target.” Li et al. (2007) propose a method which explicitly models and classifies the source language origins and switches transliteration models accordingly. This model, however, requires an explicitly tagged training set with language origins. We propose a novel method which models language origins as latent classes. The parameters are learned from a set of transliterated word pairs via the EM algorithm. The experimental results of the transliteration task of Western names to Japanese show that the proposed model can achieve higher accuracy compared to the conventional models without latent classes.

same-paper 2 0.80923814 256 acl-2011-Query Weighting for Ranking Model Adaptation

Author: Peng Cai ; Wei Gao ; Aoying Zhou ; Kam-Fai Wong

Abstract: We propose to directly measure the importance of queries in the source domain to the target domain where no rank labels of documents are available, which is referred to as query weighting. Query weighting is a key step in ranking model adaptation. As the learning object of ranking algorithms is divided by query instances, we argue that it’s more reasonable to conduct importance weighting at query level than document level. We present two query weighting schemes. The first compresses the query into a query feature vector, which aggregates all document instances in the same query, and then conducts query weighting based on the query feature vector. This method can efficiently estimate query importance by compressing query data, but the potential risk is information loss resulted from the compression. The second measures the similarity between the source query and each target query, and then combines these fine-grained similarity values for its importance estimation. Adaptation experiments on LETOR3.0 data set demonstrate that query weighting significantly outperforms document instance weighting methods.

3 0.70942277 333 acl-2011-Web-Scale Features for Full-Scale Parsing

Author: Mohit Bansal ; Dan Klein

Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.

4 0.69292498 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

Author: Danushka Bollegala ; David Weir ; John Carroll

Abstract: We describe a sentiment classification method that is applicable when we do not have any labeled data for a target domain but have some labeled data for multiple other domains, designated as the source domains. We automat- ically create a sentiment sensitive thesaurus using both labeled and unlabeled data from multiple source domains to find the association between words that express similar sentiments in different domains. The created thesaurus is then used to expand feature vectors to train a binary classifier. Unlike previous cross-domain sentiment classification methods, our method can efficiently learn from multiple source domains. Our method significantly outperforms numerous baselines and returns results that are better than or comparable to previous cross-domain sentiment classification methods on a benchmark dataset containing Amazon user reviews for different types of products.

5 0.69266236 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

Author: Phil Blunsom ; Trevor Cohn

Abstract: In this work we address the problem of unsupervised part-of-speech induction by bringing together several strands of research into a single model. We develop a novel hidden Markov model incorporating sophisticated smoothing using a hierarchical Pitman-Yor processes prior, providing an elegant and principled means of incorporating lexical characteristics. Central to our approach is a new type-based sampling algorithm for hierarchical Pitman-Yor models in which we track fractional table counts. In an empirical evaluation we show that our model consistently out-performs the current state-of-the-art across 10 languages.

6 0.69242638 122 acl-2011-Event Extraction as Dependency Parsing

7 0.68875563 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

8 0.68817854 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

9 0.68812358 292 acl-2011-Target-dependent Twitter Sentiment Classification

10 0.68811393 334 acl-2011-Which Noun Phrases Denote Which Concepts?

11 0.68644571 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

12 0.68603843 183 acl-2011-Joint Bilingual Sentiment Classification with Unlabeled Parallel Corpora

13 0.68582094 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

14 0.68421167 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

15 0.68174088 204 acl-2011-Learning Word Vectors for Sentiment Analysis

16 0.68153781 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

17 0.67597151 85 acl-2011-Coreference Resolution with World Knowledge

18 0.67494291 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

19 0.67399156 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

20 0.66908574 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation