acl acl2011 acl2011-169 knowledge-graph by maker-knowledge-mining

169 acl-2011-Improving Question Recommendation by Exploiting Information Need

Source: pdf

Author: Shuguang Li ; Suresh Manandhar

Abstract: In this paper we address the problem of question recommendation from large archives of community question answering data by exploiting the users’ information needs. Our experimental results indicate that questions based on the same or similar information need can provide excellent question recommendation. We show that translation model can be effectively utilized to predict the information need given only the user’s query question. Experiments show that the proposed information need prediction approach can improve the performance of question recommendation.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we address the problem of question recommendation from large archives of community question answering data by exploiting the users’ information needs. [sent-4, score-1.381]

2 Our experimental results indicate that questions based on the same or similar information need can provide excellent question recommendation. [sent-5, score-0.902]

3 Experiments show that the proposed information need prediction approach can improve the performance of question recommendation. [sent-7, score-0.576]

4 1 Introduction There has recently been a rapid growth in the number of community question answering (CQA) services such as Yahoo! [sent-8, score-0.478]

5 Answers, Askville and WikiAnswer where people answer questions posted by other users. [sent-9, score-0.441]

6 These CQA services have built up very large archives of questions and their answers. [sent-10, score-0.533]

7 They provide a valuable resource for question answering research. [sent-11, score-0.422]

8 In the CQA archives, the title part is the user’s query question, and the user’s information need is usually expressed as natural language statements mixed with questions expressing their interests in the question body part. [sent-14, score-1.128]

9 ty answers from the archives to be retrieved, we need to search CQA archives of previous questions that are closely associated with answers. [sent-24, score-0.882]

10 If a question is found to be interesting to the user, then a previous answer can be provided with very little delay. [sent-25, score-0.399]

11 Question search and question recommendation are proposed to facilitate finding highly relevant or potentially interesting questions. [sent-26, score-0.696]

12 Given a user’s question as the query, question search tries to return the most semantically similar questions from the question archives. [sent-27, score-1.496]

13 As the complement of question search, we define question recommendation as recommending questions whose information need is the same or similar to the user’s original question. [sent-28, score-1.728]

14 For example, the question “What aspects of my computer do I need to upgrade . [sent-29, score-0.471]

15 ” are both good recommendation questions for the user in Table 1. [sent-47, score-0.795]

16 So the recommended questions are not necessarily identical or similar to the query question. [sent-48, score-0.751]

17 In this paper, we discuss methods for question recommendation based on using the similarity between information need in the archive. [sent-49, score-1.056]

18 We also propose two models to predict the information need based on the query question even if there’s no information need expressed in the body of the question. [sent-50, score-0.813]

19 We show that with the proposed models it is possible to recommend questions that have the same or similar information need. [sent-51, score-0.493]

20 (1997) combined a lexical metric and a simple semantic knowledge-based (WordNet) similarity method to retrieve semantically similar questions from frequently asked question (FAQ) data. [sent-62, score-1.027]

21 (2005a) retrieved semantically similar questions from Korean CQA data by calculating the similarity between their answers. [sent-64, score-0.743]

22 The assumption behind their research is that questions with very similar answers tend to be semantically similar. [sent-65, score-0.528]

23 (2005b) also discussed methods for grouping similar questions based on using the similarity between answers in the archive. [sent-67, score-0.704]

24 These grouped question pairs were further used as training data to estimate probabilities for a translation-based question retrieval model. [sent-68, score-0.77]

25 (2009) proposed a tree kernel framework to find similar questions in the CQA archive based on syntactic tree structures. [sent-70, score-0.431]

26 (2008) presented an incremental automatic question recommendation framework based on probabilistic latent semantic analysis. [sent-75, score-0.696]

27 Question recommendation in their work considered both the users’ interests and feedback. [sent-76, score-0.371]

28 (2008) made use of a tree-cut model to represent questions as graphs of topic terms. [sent-78, score-0.476]

29 The recommended questions can provide different aspects around the topic of the query question. [sent-80, score-0.83]

30 The above question search and recommendation research provide different ways to retrieve questions from large archives of question answering data. [sent-81, score-1.682]

31 However, none of them considers the similarity or diversity between questions by exploring their information needs. [sent-82, score-0.641]

32 3 Short Text Similarity Measures In question retrieval systems accurate similarity measures between documents are crucial. [sent-83, score-0.678]

33 However, the state-of-the-art techniques usually fail to achieve the desired results because questions and information need texts are short. [sent-86, score-0.585]

34 In order to measure the similarity between short texts, we make use of three kinds of text similarity measures: TFIDF based, Knowledge based and Latent Dirichlet Allocation (LDA) based similarity measures in this paper. [sent-87, score-0.751]

35 We will compare their performance for the task of question recommendation in the experiment section. [sent-88, score-0.696]

36 The similarity between two texts Di and Dj is the cosine similarity in the vector space model: $\cos(D_i, D_j) = \frac{D_i^{T} D_j}{\lVert D_i \rVert \, \lVert D_j \rVert}$. This method is used in most information retrieval systems as it is both efficient and effective. [sent-92, score-0.514]
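The cosine measure above can be illustrated with a minimal sketch. The smoothed IDF weighting and the toy documents are assumptions made for illustration; the source does not state which TF-IDF variant was used.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build sparse TF-IDF vectors (dicts) for a list of tokenized documents."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption; the source does not give its weighting).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical information need texts after preprocessing.
docs = [
    "upgrade computer ram gaming".split(),
    "computer ram motherboard upgrade".split(),
    "cheap hotel paris travel".split(),
]
vecs = tfidf_vectors(docs)
sim_related = cosine(vecs[0], vecs[1])    # shares three terms
sim_unrelated = cosine(vecs[0], vecs[2])  # shares no terms
```

Texts with no words in common always score zero here, which is the weakness on short texts that motivates the knowledge-based and LDA-based measures.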

37 We also found that in CQA data short contents in the question body cannot provide any information about the users’ information needs. [sent-95, score-0.497]

38 Based on the above two reasons, in the test data sets we do not include the questions whose information need parts contain only a few noninformative words. [sent-96, score-0.622]

39 These knowledge-based similarity measures were derived from word semantic similarity by making use of WordNet. [sent-100, score-0.473]

40 , 2006) to derive a text-to-text similarity metric mcs for two given texts Di and Dj : $\mathrm{mcs}(D_i, D_j) = \frac{\sum_{w \in D_i} \mathrm{maxSim}(w, D_j) \cdot \mathrm{idf}(w)}{\sum_{w \in D_i} \mathrm{idf}(w)} + \frac{\sum_{w \in D_j} \mathrm{maxSim}(w, D_i) \cdot \mathrm{idf}(w)}{\sum_{w \in D_j} \mathrm{idf}(w)}$. For each word w in Di, maxSim(w, Dj) computes the maximum semantic similarity between w and any word in Dj . [sent-103, score-0.53]
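A minimal sketch of the mcs metric follows, assuming it is the sum of two directional terms, each an idf-weighted average of maxSim. The word similarity function, the similarity table and the idf values are hypothetical toy stand-ins; the paper derives word similarity from WordNet.

```python
def mcs(di, dj, word_sim, idf):
    """Sum of two directional, idf-weighted maximum-similarity scores."""
    def directed(src, tgt):
        # For each source word, take its best match in the target text.
        num = sum(max(word_sim(w, v) for v in tgt) * idf.get(w, 1.0) for w in src)
        den = sum(idf.get(w, 1.0) for w in src)
        return num / den if den else 0.0

    if not di or not dj:
        return 0.0
    return directed(di, dj) + directed(dj, di)

# Hypothetical word similarity: 1.0 for identical words, a hand-set
# value for listed pairs, 0.0 otherwise (stands in for a WordNet measure).
TOY_SIM = {frozenset(("ram", "memory")): 0.8}

def toy_word_sim(a, b):
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)

# Hypothetical idf values.
idf = {"ram": 2.0, "memory": 2.0, "upgrade": 1.0, "buy": 1.0}
score = mcs(["upgrade", "ram"], ["buy", "memory"], toy_word_sim, idf)
```

Under this reading the score ranges from 0 to 2, with identical texts scoring 2. If the intended formula averages the two directions, halve the result.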

41 (2010) presented probabilistic topic model based methods to measure the similarity between question and candidate answers. [sent-110, score-0.674]

42 A passage D in the retrieved documents (document collection) is represented as a mixture of fixed topics, with topic z getting weight in passage D and each topic is a distribution over a finite vocabulary of words, with word w having a probability in topic z. [sent-114, score-0.39]

43 , 2010) to measure the similarity between short information need texts. [sent-117, score-0.428]

44 It is often the case that the query question does not have a question body part. [sent-122, score-0.868]

45 So we need a model to predict the information need part based on the query question in order to recommend questions based on the similarity of their information needs. [sent-123, score-1.446]

46 In our collected CQA archive, question title and information need pairs can be considered as a type of parallel corpus, which is used for estimating word-to-word translation probabilities. [sent-159, score-0.659]

47 More specifically, we estimated the IBM-4 model by GIZA++ with the question part as the source language and information need part as the target language. [sent-160, score-0.505]
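To show how word-to-word translation probabilities can be estimated from title and information need pairs, here is a minimal sketch using IBM Model 1 with EM. This is a much simpler model than the IBM-4 model trained with GIZA++ in the paper, and it omits the NULL word. The training pairs and the `predict_need` helper are hypothetical.

```python
from collections import defaultdict

def train_ibm1(pairs, iterations=10):
    """Estimate t(need_word | title_word) with IBM Model 1 EM.

    pairs: list of (title_tokens, need_tokens) tuples.
    Returns a dict keyed by (need_word, title_word).
    """
    need_vocab = {w for _, need in pairs for w in need}
    # Uniform initialisation over co-occurring word pairs only.
    t = {(w, s): 1.0 / len(need_vocab)
         for title, need in pairs for w in need for s in title}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: distribute each need word over the title words.
        for title, need in pairs:
            for w in need:
                norm = sum(t[(w, s)] for s in title)
                for s in title:
                    frac = t[(w, s)] / norm
                    count[(w, s)] += frac
                    total[s] += frac
        # M-step: renormalise per title word.
        t = {(w, s): c / total[s] for (w, s), c in count.items()}
    return t

def predict_need(title, t, top_n=3):
    """Rank need words by their average translation probability (hypothetical helper)."""
    scores = defaultdict(float)
    for (w, s), p in t.items():
        if s in title:
            scores[w] += p / len(title)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Hypothetical (question title, information need) training pairs.
pairs = [
    (["upgrade", "computer"], ["ram", "motherboard"]),
    (["upgrade", "phone"], ["ram", "battery"]),
    (["computer", "gaming"], ["motherboard", "graphics"]),
]
t = train_ibm1(pairs)
top_for_upgrade = predict_need(["upgrade"], t, top_n=1)
```

Words that consistently co-occur with a title word across pairs gain probability mass over the EM iterations, which is the property the prediction step relies on.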

48 1 Text Preprocessing The questions posted on community QA sites often contain spelling or grammar errors. [sent-162, score-0.453]

49 In this paper, we first use an open-source tool, afterthedeadline, to automatically correct the spelling errors in the question and information need texts. [sent-167, score-0.57]

50 Stop word removal and lemmatization are applied to all the raw texts before they are fed into machine translation model training, LDA model estimation and similarity calculation. [sent-172, score-0.391]

51 2 Construction of Training and Testing Sets We made use of the questions crawled from Yahoo! [sent-174, score-0.397]

52 More specifically, we obtained 2 million questions under two categories at Yahoo! [sent-176, score-0.397]

53 Depending on whether the best answers have been chosen by the asker, questions from Yahoo! [sent-188, score-0.494]

54 From each of the above two categories, we randomly selected 200 resolved questions to construct two testing data sets: ‘Test t’ (‘travel’), and ‘Test c’ (‘computers&internet’). [sent-190, score-0.429]

55 In order to measure the information need similarity in our experiment we selected only those questions whose information needs part contained at least 3 informative words after stop word removal. [sent-191, score-0.862]
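The selection rule above (at least 3 informative words after stop word removal) can be sketched as a simple filter. The stop word list and tokenization here are hypothetical and far smaller than a real setup; lemmatization is omitted.

```python
# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "the", "i", "my", "to", "do", "of", "in",
              "is", "for", "and", "what"}

def informative_words(text):
    """Lowercase, strip surrounding punctuation, and drop stop words."""
    tokens = [tok.strip(".,?!\"'").lower() for tok in text.split()]
    return [tok for tok in tokens if tok and tok not in STOP_WORDS]

def has_enough_information(need_text, min_words=3):
    """Keep a question only if its information need has enough content words."""
    return len(informative_words(need_text)) >= min_words

kept = has_enough_information(
    "I need to upgrade my computer for gaming and video editing")
dropped = has_enough_information("What do I do?")
```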

56 The rest of the questions ‘Train t’ and ‘Train c’ under the two categories are left for estimating the LDA topic models and the translation models. [sent-192, score-0.592]

57 3 Experimental Setup For each question (query question) in ‘Test t’ or ‘Test c’, we used the words in the question title part as the main search query and the other words in the information need part as search query expansion to retrieve candidate recommended questions from Yahoo! [sent-195, score-1.802]

58 We obtained an average of 154 resolved questions under the ‘travel’ or ‘computers&internet’ category, and three assessors were involved in the manual judgments. [sent-197, score-0.496]

59 Given a question returned by a recommendation method, two assessors are asked to label it with ‘good’ or ‘bad’. [sent-198, score-0.763]

60 If a recommended question is considered to express the same or similar information need, the assessor will label it ‘good’; otherwise, the assessor will label it as ‘bad’. [sent-201, score-0.739]

61 Three measures for evaluating the recommendation performance are utilized. [sent-202, score-0.394]

62 In MRR the reciprocal rank of a query question is the multiplicative inverse of the rank of the first ‘good’ recommended question. [sent-205, score-0.751]

63 The top five prediction accuracy for a query question is the number of ‘good’ recommended questions out of the top five ranked questions and the top ten accuracy is calculated out of the top ten ranked questions. [sent-206, score-1.778]
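The evaluation measures described above can be sketched as follows, assuming each query question has a ranked list of assessor labels (‘good’ or ‘bad’). The label lists are hypothetical. The top-k measure returns a count, following the wording above; dividing by k gives a fraction.

```python
def reciprocal_rank(labels):
    """1 / rank of the first 'good' recommendation, or 0.0 if there is none."""
    for rank, label in enumerate(labels, start=1):
        if label == "good":
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    """Average reciprocal rank over all query questions."""
    return sum(reciprocal_rank(labels) for labels in runs) / len(runs)

def good_in_top_k(labels, k):
    """Number of 'good' recommendations among the top k ranked questions."""
    return sum(1 for label in labels[:k] if label == "good")

# Hypothetical ranked judgments for two query questions.
runs = [
    ["bad", "good", "good", "bad", "bad"],
    ["good", "bad", "bad", "bad", "good"],
]
mrr = mean_reciprocal_rank(runs)        # (1/2 + 1/1) / 2
top5_first = good_in_top_k(runs[0], 5)  # two 'good' labels in the top five
```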

64 4 Similarity Measure The first experiment conducted question recommendation based on their information need parts. [sent-208, score-0.846]

65 Different text similarity methods described in section 3 were used to measure the similarity between the information need texts. [sent-209, score-0.6]

66 We treated each question including the question title and the information need part as a single document of a sequence of words. [sent-213, score-0.934]

67 The results in table 2 show that TFIDF and LDA1 methods perform better for recommending questions than the others. [sent-216, score-0.495]

68 After further analysis of the questions recommended by both methods, we discovered [sent-217, score-0.629]

69 that the ordering of the recommended questions from TFIDF and LDA1 is quite different. (Table 4: Question recommendation results by LDA measuring the similarity between information needs.) [sent-224, score-1.286]

70 TFIDF similarity method prefers texts with more common words, while the LDA1 method can find the relation between the non-common words between short texts based on a series of third-party topics. [sent-225, score-0.378]

71 The LDA1 method outperforms the TFIDF method in two ways: (1) the top recommended questions’ information needs share less common words with the query question’s; (2) the top recommended questions span wider topics. [sent-226, score-1.128]

72 The questions highly recommended by LDA1 can suggest more useful topics to the user. [sent-227, score-0.683]

73 That is to say, we are able to recommend questions to the users by measuring their information needs. [sent-232, score-0.574]

74 The first two recommended questions for Q1 and Q2 using LDA1 method are shown in table 4. [sent-233, score-0.629]

75 5 Information Need Prediction There are some retrieved questions whose information need parts are empty or become empty or almost empty (one or two words left) after the preprocessing step. [sent-251, score-0.825]

76 The average number of such retrieved questions for each query question is 10 in our experiment. [sent-252, score-0.943]

77 The similarity ranking scores of these questions are quite low or zero in the previous experiment. [sent-253, score-0.607]

78 In this experiment, we will apply information need prediction to the questions whose information needs are missing in order to find out whether we improve the recommendation task. [sent-254, score-1.064]

79 The question and information need pairs in both ‘Train t’ and ‘Train c’ training sets were used to train two IBM-4 translation models with the GIZA++ toolkit. [sent-255, score-0.582]

80 This has always been a tough question: not using self-translated words can reduce retrieval performance as the information need parts need the terms to represent the semantic meanings; using self-translated words does not take advantage of the translation approach. [sent-261, score-0.446]

81 The predicted information need words for the retrieved questions are shown in Table 5. [sent-263, score-0.616]

82 In Q1, the information need behind question “recommend website for custom built computer parts” may imply that the users need to know some information about building computer parts such as “ram” and “motherboard” for a different purpose such as “gaming”. [sent-264, score-0.746]

83 We also did a small scale comparison between the generated information needs against the real questions whose information need parts are not empty. [sent-266, score-0.695]

84 This reflects that there are some other users asking similar questions with the same or other interests. [sent-270, score-0.445]

85 For example, Q5, Q6, and Q7 in table 5 were retrieved as recommendation candidates for the query question in Table 1. [sent-276, score-0.887]

86 All of the three questions were good recommendation candidates, but only Q6 ranked fifth while Q5 and Q7 were out of the top 30 by LDA1 method. [sent-277, score-0.774]

87 Moreover, in a small number of cases bad recommendation questions received higher scores and jeopardized the performance. [sent-278, score-0.768]

88 For example, for query question “How can you add subtitles to videos? [sent-279, score-0.516]

89 ”, a retrieved question “How would iadd a music file to a video clip. [sent-292, score-0.51]

90 ” was highly recommended by TFIDF approach as predicted information need contained ‘youtube’, ‘video’, ‘music’, ‘download’, . [sent-296, score-0.382]

91 Thus, we can improve the performance of question recommendation by predicting information needs. [sent-307, score-0.73]

92 6 Conclusions In this paper we addressed the problem of recommending questions from large archives of community question answering data based on users’ information needs. [sent-308, score-1.143]

93 We also utilized a translation model and an LDA topic model to predict the information need given only the user’s query question. [sent-309, score-0.428]

94 Different information need similarity measures were compared to prove that it is possible to satisfy user’s information need by recommending questions from large archives of community QA. [sent-310, score-1.25]

95 Experiments showed that the proposed translation based language model for question information need prediction further enhanced the performance of question recommendation methods. [sent-319, score-1.349]

96 Question answering from frequently-asked question files: Experiences with the FAQ Finder system. [sent-347, score-0.422]

97 Searching questions by identifying question topic and question focus. [sent-372, score-1.186]

98 Finding similar questions in large question and answer archives. [sent-388, score-0.796]

99 A syntactic tree matching approach to finding similar questions in community-based qa services. [sent-448, score-0.438]

100 Exploiting salient patterns for question detection and question retrieval in community-based question answering. [sent-452, score-1.125]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('questions', 0.397), ('question', 0.355), ('recommendation', 0.341), ('tfidf', 0.238), ('recommended', 0.232), ('similarity', 0.21), ('cqa', 0.201), ('lda', 0.147), ('archives', 0.136), ('query', 0.122), ('dj', 0.118), ('need', 0.116), ('iz', 0.109), ('jeon', 0.109), ('recommending', 0.098), ('answers', 0.097), ('di', 0.082), ('topic', 0.079), ('pr', 0.078), ('yahoo', 0.077), ('translation', 0.077), ('prediction', 0.071), ('retrieved', 0.069), ('answering', 0.067), ('assessors', 0.067), ('jiwoon', 0.067), ('jz', 0.067), ('texts', 0.065), ('recommend', 0.062), ('retrieval', 0.06), ('assessor', 0.059), ('user', 0.057), ('community', 0.056), ('topics', 0.054), ('measures', 0.053), ('music', 0.053), ('travel', 0.049), ('mrr', 0.049), ('users', 0.048), ('connaway', 0.045), ('iwdf', 0.045), ('joon', 0.045), ('mcs', 0.045), ('satisficing', 0.045), ('silipigni', 0.045), ('xpr', 0.045), ('celikyilmaz', 0.045), ('kl', 0.044), ('answer', 0.044), ('parts', 0.043), ('passage', 0.042), ('reciprocal', 0.042), ('cao', 0.042), ('duan', 0.042), ('qa', 0.041), ('huizhong', 0.039), ('yong', 0.039), ('subtitles', 0.039), ('chandra', 0.039), ('faq', 0.039), ('jcn', 0.039), ('lynn', 0.039), ('yunbo', 0.039), ('needs', 0.039), ('estimating', 0.039), ('title', 0.038), ('short', 0.038), ('uk', 0.037), ('kai', 0.036), ('top', 0.036), ('body', 0.036), ('document', 0.036), ('idf', 0.035), ('semantically', 0.034), ('archive', 0.034), ('burke', 0.034), ('youtube', 0.034), ('wz', 0.034), ('empty', 0.034), ('information', 0.034), ('preprocessed', 0.034), ('bruce', 0.034), ('sigir', 0.034), ('calculating', 0.033), ('video', 0.033), ('measuring', 0.033), ('allocation', 0.033), ('whose', 0.032), ('preprocessing', 0.032), ('resolved', 0.032), ('griffiths', 0.032), ('och', 0.031), ('mihalcea', 0.031), ('kd', 0.031), ('retrieve', 0.031), ('measure', 0.03), ('york', 0.03), ('interests', 0.03), ('received', 0.03), ('ten', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999887 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

Author: Shuguang Li ; Suresh Manandhar

2 0.31672758 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives

Author: Guangyou Zhou ; Li Cai ; Jun Zhao ; Kang Liu

Abstract: Community-based question answer (Q&A) has become an important issue due to the popularity of Q&A archives on the web. This paper is concerned with the problem of question retrieval. Question retrieval in Q&A archives aims to find historical questions that are semantically equivalent or relevant to the queried questions. In this paper, we propose a novel phrase-based translation model for question retrieval. Compared to the traditional word-based translation models, the phrase-based translation model is more effective because it captures contextual information in modeling the translation of phrases as a whole, rather than translating single words in isolation. Experiments conducted on real Q&A data demonstrate that our proposed phrase-based translation model significantly outperforms the state-of-the-art word-based translation model.

3 0.20801064 25 acl-2011-A Simple Measure to Assess Non-response

Author: Anselmo Penas ; Alvaro Rodrigo

Abstract: There are several tasks where it is preferable not to respond than to respond incorrectly. This idea is not new, but despite several previous attempts there isn’t a commonly accepted measure to assess non-response. We study here an extension of the accuracy measure with this feature and a very easy to understand interpretation. The measure proposed (c@1) has a good balance of discrimination power, stability and sensitivity properties. We also show how this measure is able to reward systems that maintain the same number of correct answers and at the same time decrease the number of incorrect ones, by leaving some questions unanswered. This measure is well suited for tasks such as Reading Comprehension tests, where multiple choices per question are given, but only one is correct.

4 0.17338407 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

Author: Bo Pang ; Ravi Kumar

Abstract: Web search is an information-seeking activity. Often times, this amounts to a user seeking answers to a question. However, queries, which encode user’s information need, are typically not expressed as full-length natural language sentences, in particular, as questions. Rather, they consist of one or more text fragments. As humans become more search-engine-savvy, do natural-language questions still have a role to play in web search? Through a systematic, large-scale study, we find to our surprise that as time goes by, web users are more likely to use questions to express their search intent.

5 0.15793854 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models

Author: Ivan Vulic ; Wim De Smet ; Marie-Francine Moens

Abstract: A topic model outputs a set of multinomial distributions over words for each topic. In this paper, we investigate the value of bilingual topic models, i.e., a bilingual Latent Dirichlet Allocation model for finding translations of terms in comparable corpora without using any linguistic resources. Experiments on a document-aligned English-Italian Wikipedia corpus confirm that the developed methods which only use knowledge from word-topic distributions outperform methods based on similarity measures in the original word-document space. The best results, obtained by combining knowledge from word-topic distributions with similarity measures in the original space, are also reported.

6 0.13852355 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities

7 0.13194706 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

8 0.12353618 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments

9 0.11026888 256 acl-2011-Query Weighting for Ranking Model Adaptation

10 0.10585297 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

11 0.10438706 109 acl-2011-Effective Measures of Domain Similarity for Parsing

12 0.10417107 182 acl-2011-Joint Annotation of Search Queries

13 0.097792529 178 acl-2011-Interactive Topic Modeling

14 0.097682297 117 acl-2011-Entity Set Expansion using Topic information

15 0.097412206 177 acl-2011-Interactive Group Suggesting for Twitter

16 0.093080372 52 acl-2011-Automatic Labelling of Topic Models

17 0.088109128 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

18 0.088096142 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

19 0.087928034 204 acl-2011-Learning Word Vectors for Sentiment Analysis

20 0.084956288 257 acl-2011-Question Detection in Spoken Conversations Using Textual Conversations

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.211), (1, 0.057), (2, -0.021), (3, 0.17), (4, -0.09), (5, -0.131), (6, -0.077), (7, -0.016), (8, 0.087), (9, -0.019), (10, 0.038), (11, -0.004), (12, 0.064), (13, -0.056), (14, 0.043), (15, 0.001), (16, 0.006), (17, -0.079), (18, -0.024), (19, -0.014), (20, 0.094), (21, 0.08), (22, -0.08), (23, 0.071), (24, -0.025), (25, -0.07), (26, -0.062), (27, 0.014), (28, -0.005), (29, 0.013), (30, -0.036), (31, -0.083), (32, -0.029), (33, -0.024), (34, 0.009), (35, -0.034), (36, -0.044), (37, -0.115), (38, 0.202), (39, -0.018), (40, -0.279), (41, -0.07), (42, -0.084), (43, 0.06), (44, 0.088), (45, -0.229), (46, 0.016), (47, -0.095), (48, -0.191), (49, -0.107)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97804415 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

Author: Shuguang Li ; Suresh Manandhar

2 0.88432735 25 acl-2011-A Simple Measure to Assess Non-response

Author: Anselmo Penas ; Alvaro Rodrigo

3 0.77019638 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives

Author: Guangyou Zhou ; Li Cai ; Jun Zhao ; Kang Liu

4 0.52983433 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

Author: Hajime Morita ; Tetsuya Sakai ; Manabu Okumura

Abstract: We propose a new method for query-oriented extractive multi-document summarization. To enrich the information need representation of a given query, we build a co-occurrence graph to obtain words that augment the original query terms. We then formulate the summarization problem as a Maximum Coverage Problem with Knapsack Constraints based on word pairs rather than single words. Our experiments with the NTCIR ACLIA question answering test collections show that our method achieves a pyramid F3-score of up to 0.313, a 36% improvement over a baseline using Maximal Marginal Relevance.

5 0.51284617 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes

Author: Bo Pang ; Ravi Kumar

6 0.50593656 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments

7 0.49496731 200 acl-2011-Learning Dependency-Based Compositional Semantics

8 0.45566872 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

9 0.42665821 19 acl-2011-A Mobile Touchable Application for Online Topic Graph Extraction and Exploration of Web Content

10 0.42375919 120 acl-2011-Even the Abstract have Color: Consensus in Word-Colour Associations

11 0.40732104 109 acl-2011-Effective Measures of Domain Similarity for Parsing

12 0.3983106 161 acl-2011-Identifying Word Translations from Comparable Corpora Using Latent Topic Models

13 0.38920498 18 acl-2011-A Latent Topic Extracting Method based on Events in a Document and its Application

14 0.38742381 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing

15 0.3829802 89 acl-2011-Creative Language Retrieval: A Robust Hybrid of Information Retrieval and Linguistic Creativity

16 0.379906 305 acl-2011-Topical Keyphrase Extraction from Twitter

17 0.36201516 248 acl-2011-Predicting Clicks in a Vocabulary Learning System

18 0.3593992 256 acl-2011-Query Weighting for Ranking Model Adaptation

19 0.35382482 115 acl-2011-Engkoo: Mining the Web for Language Learning

20 0.34664482 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.027), (17, 0.063), (26, 0.044), (37, 0.069), (39, 0.047), (41, 0.047), (55, 0.066), (59, 0.039), (72, 0.027), (76, 0.204), (91, 0.036), (96, 0.225), (98, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89852035 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing

Author: Gisle Ytrestøl

Abstract: This paper describes a backtracking strategy for an incremental deterministic transition-based parser for HPSG. The method could theoretically be implemented on any other transition-based parser with some adjustments. In this paper, the algorithm is evaluated on CuteForce, an efficient deterministic shift-reduce HPSG parser. The backtracking strategy may serve to improve existing parsers, or to assess if a deterministic parser would benefit from backtracking as a strategy to improve parsing.

2 0.89478505 255 acl-2011-Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering

Author: Hajime Morita ; Tetsuya Sakai ; Manabu Okumura

3 0.85923719 175 acl-2011-Integrating history-length interpolation and classes in language modeling

Author: Hinrich Schutze

Abstract: Building on earlier work that integrates different factors in language modeling, we view (i) backing off to a shorter history and (ii) class-based generalization as two complementary mechanisms of using a larger equivalence class for prediction when the default equivalence class is too small for reliable estimation. This view entails that the classes in a language model should be learned from rare events only and should be preferably applied to rare events. We construct such a model and show that both training on rare events and preferable application to rare events improve perplexity when compared to a simple direct interpolation of class-based with standard language models.

same-paper 4 0.85278994 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

Author: Shuguang Li ; Suresh Manandhar

5 0.78363872 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. Bottom-up and top-down parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

6 0.78172481 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

7 0.78050292 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

8 0.7784586 207 acl-2011-Learning to Win by Reading Manuals in a Monte-Carlo Framework

9 0.7777797 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

10 0.77646279 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

11 0.77543771 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

12 0.77506119 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

13 0.77344531 163 acl-2011-Improved Modeling of Out-Of-Vocabulary Words Using Morphological Classes

14 0.77287054 193 acl-2011-Language-independent compound splitting with morphological operations

15 0.77277148 187 acl-2011-Jointly Learning to Extract and Compress

16 0.77237707 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation

17 0.77206898 177 acl-2011-Interactive Group Suggesting for Twitter

18 0.77125752 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

19 0.77095687 258 acl-2011-Ranking Class Labels Using Query Sessions

20 0.77019805 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization