emnlp emnlp2012 emnlp2012-41 knowledge-graph by maker-knowledge-mining

41 emnlp-2012-Entity based QA Retrieval


Source: pdf

Author: Amit Singh

Abstract: Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. Experiments conducted on large-scale real data show that the proposed techniques are promising.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Entity based Q&A retrieval. Amit Singh, IBM Research Bangalore, India. amising3@in. [sent-1, score-0.063]

2 Abstract Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. [sent-3, score-0.131]

3 State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. [sent-4, score-0.169]

4 While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. [sent-5, score-0.074]

5 Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. [sent-6, score-0.191]

6 In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). [sent-8, score-0.096]

7 We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. [sent-9, score-0.313]

8 1 Introduction. Over the past few years, community-based question answering (CQA) portals like Naver, Yahoo! [sent-11, score-0.155]

9 These portals foster collaborative creation of content by allowing the users both to submit questions to be answered and to answer questions asked by other users. [sent-14, score-0.24]

10 These portals aim to provide highly focused access to this information by directly returning pertinent question and answer (Q&A) pairs in response to users' questions, instead of a long list of ranked URLs. [sent-15, score-0.247]

11 This is in marked contrast to the usual search paradigm: rather than using the question to search a database of potential answers, here the question is used to search the database of previous questions, which in turn are associated with answers. [sent-16, score-0.17]

12 This involves addressing the word mismatch problem between the user's question and the question-answer pairs in the archive. [sent-17, score-0.16]

13 Researchers have proposed the use of translation models (Berger and Lafferty, 1999; Jeon et al. [sent-19, score-0.096]

14 As a principled approach to capturing semantic word relations, statistical translation language models are built by using the IBM model 1 (Brown et al. [sent-22, score-0.096]

15 , 1993) and have been shown to outperform traditional document language models on the Q&A retrieval task. [sent-23, score-0.107]

16 The basic idea is to estimate the likelihood of translating a document (footnote 1) to a query by exploiting the dependencies that exist between query words and document words. [sent-24, score-0.268]

17 For example, a document containing the word Wheezing may well answer a question containing the term Asthma. [sent-25, score-0.184]

18 They learn these dependencies (encoded as translation probabilities) between words using parallel monolingual corpora created from the Q&A pairs. [sent-26, score-0.139]
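
A minimal sketch of this word-based translation language model, assuming a query-likelihood setup in which each query word is generated either from the collection model or by "translating" a document word; the function names, the mixture weight lam, and the toy numbers are illustrative, not values from the paper:

```python
from collections import Counter

def translation_lm_score(query, doc, T, collection_lm, lam=0.8):
    """P(q|d) under a word-based translation LM: for each query word,
    mix the collection model with translations from the document words."""
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 1.0
    for q_word in query:
        # Probability that some document word translates into q_word.
        p_trans = sum(T.get((q_word, d_word), 0.0) * count / doc_len
                      for d_word, count in doc_counts.items())
        # Smooth with the collection model to avoid zero probabilities.
        score *= (1 - lam) * collection_lm.get(q_word, 1e-9) + lam * p_trans
    return score

# Toy run: "wheezing" in the document supports the query term "asthma".
T = {("asthma", "wheezing"): 0.3}
doc = ["my", "child", "is", "wheezing", "at", "night"]
print(translation_lm_score(["asthma"], doc, T, {"asthma": 0.001}))
```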

19 While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus (Lee et al. [sent-27, score-0.074]

20 , 2008). (Figure 1: Need for entity based expansions.) [sent-30, score-0.184]

21 Also, these models only capture shallow semantics between words via co-occurrence statistics, while some of the more explicit relationships between words and entities are freely available externally. [sent-31, score-0.16]

22 , 2007) is another very common criticism leveled at translation models, as it results in noisy and generic translations. [sent-33, score-0.096]

23 Specifically, the word Blizzard can refer to an American game development company that develops the World of Warcraft game, or it can refer to a severe snowstorm. [sent-35, score-0.066]

24 Expanding the query without taking into account the gaming context established by the word WOW (acronym for World of Warcraft) would lead to topic drift. [sent-36, score-0.112]

25 In this paper we argue that the solution to all the above problems lies in a unified model in which entities are first-class citizens. [sent-38, score-0.105]

26 The guiding hypothesis is that an entity-based representation is less ambiguous than a word-based representation of the user's question, and allows for a more semantically accurate expansion if the relationship between entities and words can be estimated reliably. [sent-39, score-0.412]

27 We propose the Entity based Translation Language Model (ETLM) for Q&A retrieval, which accommodates semantic information shared between entities and words. [sent-41, score-0.168]

28 Specifically, it provides for context-aware expansions of the query by exploiting entity annotations on both the document and the query side. [sent-43, score-0.517]

29 Entity annotations also provide a means to handle the “many-to-one” (Moore, 2004) translation limitation in the IBM model, due to which each word in the target document can be generated by at most one word in the question (footnote 2). [sent-44, score-0.205]

30 For the same reasons, it also alleviates another related limitation by enabling translation between contiguous words across the query and documents (Moore, 2004). [sent-45, score-0.208]

31 We learn relationships between entities and terms by proposing new ways of organizing monolingual parallel corpora and simultaneously leveraging external resources like Wikipedia, from which one can derive these relationships reliably. [sent-47, score-0.258]

32 This helps alleviate the noise problem, described above, associated with learning translation models on the Q&A archive. [sent-48, score-0.096]

33 An important point to note is that our technique has merits independent of the choice of the entity catalog. [sent-49, score-0.145]

34 In this work we use Wikipedia, as it is a popular choice due to its large and ever-expanding coverage and its ability to keep up with world events on a timely basis. (Footnote 2: entity mentions can be of more than unit word length.) [sent-50, score-0.104]

35 We provide a detailed evaluation of the impact of modelling assumptions and model components on retrieval performance, on large-scale real data from Yahoo! Answers comprising ∼5 million Q&A pairs. [sent-52, score-0.09]

36 This is followed by Section 3, which gives the details of the entity annotators and their performance. [sent-54, score-0.178]

37 Section 4 describes our experiments with the retrieval method used for Q&A retrieval. [sent-55, score-0.063]

38 Here di refers to the i-th Q&A pair, consisting of a question qi and its answer ai. [sent-62, score-0.178]

39 Given the user question quser, the task of Q&A retrieval is to rank the di according to score(quser, di). [sent-63, score-0.237]

40 Offline processing: Using the entity catalog E, we learn the entity annotation models EAoffline and EAonline for annotation of entities in the Q&A corpus and in the query, respectively. [sent-65, score-0.633]

41 For each di ∈ D, we then annotate references to entities in Wikipedia using EAoffline, resulting in the annotated Q&A corpus C. [sent-67, score-0.143]

42 We then compute relationships between entities and words using C and E. [sent-68, score-0.16]

43 Online processing: At runtime, annotate the user query quser with entities using EAonline to create an enriched question q. [sent-70, score-0.434]

44 Issue this query over the annotated corpus C and rank the candidates as per the ETLM model described below. [sent-71, score-0.112]
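
The online flow above can be summarized in a few lines. This is a hedged sketch: `ea_online` and `etlm_score` are hypothetical stand-ins for the paper's EAonline annotator and ETLM scoring function.

```python
def qa_retrieval(q_user, corpus_C, ea_online, etlm_score, top_k=10):
    """Annotate the user question with entities, then rank the offline
    annotated Q&A pairs in C by their ETLM score against it."""
    q = ea_online(q_user)  # enriched question: token spans plus entity links
    ranked = sorted(corpus_C, key=lambda d: etlm_score(q, d), reverse=True)
    return ranked[:top_k]
```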

45 1 ETLM Model. Let the annotated query q (and, similarly, the annotated Q&A pair d) be composed of a sequence of token spans Tq (respectively Td). [sent-73, score-0.198]

46 Each token span tq (similarly td) corresponds to a sequence of contiguous words occurring in the running text. [sent-74, score-0.555]

47 These tq’s can correspond to entity mentions, phrases or words. [sent-75, score-0.145]

48 Let eq denote the token spans that are annotated and neq those that are not (Tq = eq ∪ neq). [sent-76, score-0.306]

49 For example, in the query “What is Quadratic Formula?”, where “What” and “is” are neq token spans and “Quadratic Formula” is an eq span, [sent-77, score-0.112]

50 the token span “Quadratic Formula” is linked to an entity corresponding to Quadratic Equation (footnote 3), while all other token spans are marked as neq. [sent-78, score-0.445]

51 ∀tq ∈ q; tq ∈ EU, where EU is the universal set of token spans. [sent-85, score-0.465]

52 This is because when the document was created, each and every td ∈ d had a sense attached to it. [sent-88, score-0.239]

53 (Footnote 3: en.wikipedia.org/wiki/Quadratic equation. Footnote 4: this is not a restriction, as the model is valid for neq consisting of more than one word.) [sent-93, score-0.21]

54 T(tq|td) in Equation 1 denotes the probability that a token span tq is the translation of token span td. [sent-95, score-0.689]

55 The key task is to estimate Pml(tq|C), T(tq|td) and Pml(td|d); tq ∈ eq ∪ neq and td ∈ ed ∪ ned. [sent-97, score-0.877]
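
A compact sketch of how these quantities could combine at ranking time, assuming the usual linearly smoothed query-likelihood form; the mixture weight `lam` and the span representation are assumptions, not the paper's exact Equation 1:

```python
import math

def etlm_log_score(q_spans, d_spans, T, p_ml_collection, lam=0.8):
    """Score an annotated Q&A pair d for an annotated query q: each query
    token span (entity mention, phrase, or word) is generated either from
    the collection model Pml(tq|C) or by translating some span of d."""
    d_len = len(d_spans)
    log_score = 0.0
    for tq in q_spans:
        # Sum over document spans: T(tq|td) * Pml(td|d), with Pml(td|d)
        # taken as the relative frequency of td in d.
        p_trans = sum(T.get((tq, td), 0.0) / d_len for td in d_spans)
        p = (1 - lam) * p_ml_collection.get(tq, 1e-9) + lam * p_trans
        log_score += math.log(p)
    return log_score
```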

56 As the name suggests, ETLMqa is estimated from the Q&A data (C and D), while we leverage the entity catalog (in our case, Wikipedia) for ETLMwiki. [sent-99, score-0.236]

57 , 2008) we pool the questions and answers from D to create a master parallel corpus P = (q1, a1), . [sent-102, score-0.199]

58 We then derive 2 different parallel corpora from P and P∗ as follows. Pentity: we remove all non-linked tokens ne from P∗, thereby reducing it to a parallel corpus over e. [sent-108, score-0.269]

59 This is used for learning translation probabilities T(e|e′) between two entities e and e′ in E. [sent-111, score-0.096]

60 Phybrid: this is a hybrid of Pentity and P, wherein one part of the Q&A pair consists of only ne while the other consists of only e. [sent-112, score-0.183]

61 To handle entities e, we introduce special IDs in the ne space. [sent-114, score-0.288]
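
A sketch of these corpus constructions, under the assumption that each annotated side is a list of tagged spans, ('e', id) for linked entities and ('ne', word) for non-linked tokens; the representation and the exact Phybrid pairing are illustrative:

```python
def derive_parallel_corpora(annotated_pairs):
    """Build the master corpus P plus the entity-only and hybrid corpora
    from annotated (question, answer) span lists."""
    P, P_entity, P_hybrid = [], [], []
    for q, a in annotated_pairs:
        P.append((q, a))  # master parallel corpus over all spans
        q_e = [t for t in q if t[0] == 'e']
        a_e = [t for t in a if t[0] == 'e']
        P_entity.append((q_e, a_e))  # entity-only sides: feeds T(e|e')
        q_ne = [t for t in q if t[0] == 'ne']
        a_ne = [t for t in a if t[0] == 'ne']
        # Hybrid: one side keeps only words, the other only entity IDs.
        P_hybrid.append((q_ne, a_e))
        P_hybrid.append((q_e, a_ne))
    return P, P_entity, P_hybrid
```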

62 Thus our universal token span set is given. (Footnote 6: the subscript of q and d has been dropped, as the translation probability is learnt agnostic to it, due to pooling.) [sent-115, score-0.268]

63 This is done so that T(tq|td) is learnt from P, Pentity and Phybrid, without any modification to the corresponding translation algorithm (Brown et al. [sent-117, score-0.123]

64 When calculating T(e|e′), we redistribute the probability mass spread over all the ne to e, as given by Equations 2 and 3. [sent-123, score-0.183]

65 , 2009) to measure semantic relationships between entities and words using Wikipedia. [sent-129, score-0.16]

66 We use co-citation information in Wikipedia to detect relatedness between entities (T(e|e′)) and co-occurrence counts to estimate T(ne|ne′), as follows: [sent-131, score-0.105]

67 T(e|e′) = co(e, e′) / Σe′′ co(e′′, e′) (6); T(ne|ne′) = cf(ne, ne′) / Σne′′ cf(ne′′, ne′) (7); T(ne|e) = (tf_ne,d(e) + 1) / (|d(e)| + |V|) (8); T(e|ne) = (tf_ne,d(e) + 1) / (Σe′∈E tf_ne,d(e′) + |E|) (9). Here d(e) represents the page corresponding to entity e. [sent-132, score-0.172]

68 cf(ne, ne′) is the number of context windows of fixed size containing both ne and ne′ in Wikipedia. [sent-134, score-0.183]

69 tf_t,d(e) is the frequency of t in d(e); co(e, e′) indicates the number of entities in Wikipedia that have a hyperlink to both e and e′. [sent-136, score-0.105]
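
A sketch of Equations 6-9 computed from pre-collected Wikipedia counts; the dictionary layout (co-citation counts `co`, window co-occurrence counts `cf`, page term frequencies `tf`, page lengths, vocabulary and entity-set sizes) is an assumption:

```python
def t_e_given_e(e, e_prime, co, co_in_marginal):
    """Eq. 6: co-citation count of (e, e'), normalized over everything
    co-cited with e'."""
    denom = co_in_marginal.get(e_prime, 0)
    return co.get((e, e_prime), 0) / denom if denom else 0.0

def t_ne_given_ne(ne, ne_prime, cf, cf_marginal):
    """Eq. 7: fixed-window co-occurrence of the two words in Wikipedia."""
    denom = cf_marginal.get(ne_prime, 0)
    return cf.get((ne, ne_prime), 0) / denom if denom else 0.0

def t_ne_given_e(ne, e, tf, page_len, vocab_size):
    """Eq. 8: add-one smoothed frequency of word ne in the page d(e)."""
    return (tf.get((ne, e), 0) + 1) / (page_len.get(e, 0) + vocab_size)

def t_e_given_ne(e, ne, tf, tf_ne_total, num_entities):
    """Eq. 9: how strongly the word ne selects entity e across all
    entity pages, with add-one smoothing."""
    return (tf.get((ne, e), 0) + 1) / (tf_ne_total.get(ne, 0) + num_entities)
```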

70 5 Self translation probability. To make sure the self-translation probability is not underestimated, i.e. [sent-141, score-0.234]

71 Linear interpolation is often the technique of choice in language modelling for combining models to exploit complementary features of the component models. [sent-147, score-0.064]

72 The mixture translation model Tcombo(e|e′) over M component models is given by Equation 10. [sent-151, score-0.096]
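
A sketch of the Equation 10 mixture, with a simple guard so that self-translation is not underestimated; the weights, the floor value, and the guard mechanism are illustrative assumptions, not the paper's exact scheme:

```python
def t_combo(tq, td, component_tables, weights, self_floor=0.5):
    """Linearly interpolate M component translation tables; if the spans
    are identical, keep the self-translation probability above a floor."""
    p = sum(w * T.get((tq, td), 0.0)
            for w, T in zip(weights, component_tables))
    if tq == td:
        p = max(p, self_floor)  # protect self-translation mass
    return p

# Usage: mix a Q&A-trained table with a Wikipedia-derived one.
# t_combo("asthma", "asthma", [T_qa, T_wiki], [0.6, 0.4])
```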

73 3 Entity Annotation In this section we describe our entity annotation system. [sent-157, score-0.182]

74 Recently there has been a lot of work addressing the problem of annotating text with links to Wikipedia entities (Mihalcea and Csomai, 2007; Bunescu and Pasca, 2006; Milne and Witten, 2008; Kulkarni et al. [sent-158, score-0.177]

75 We adopt a similar approach, wherein we first find the best disambiguation (BESTDISAMBIGUATION) for a given mention and then decide to prune it (PRUNE), via the dummy mapping NA (similar to “no assignment” (Kulkarni et al. [sent-161, score-0.21]

76 1 BESTDISAMBIGUATION. As defined earlier, e ∈ E represents an entity corresponding to the URN of a Wikipedia article.

77 Let Em = {em,1, em,2, · · · , em,|Em|}, em,i ∈ E, represent the set of possible disambiguations for a mention m (m is an index over all mentions in the corpus). [sent-165, score-0.144]

78 Given a mention m, the task is to find the best disambiguation e from Wikipedia. [sent-166, score-0.172]

79 Let φ(m, em,j) represent the feature mapping between an entity mention m and the Wikipedia entity em,j, let ω⃗ be the corresponding weight vector, and let D(em,j) = ω⃗ · φ(m, em,j) represent the disambiguation score. [sent-168, score-0.462]

80 The task is to learn ω⃗ such that argmax over em,j of D(em,j) gives the best disambiguation for the mention m. [sent-169, score-0.257]

81 Note that Equation 11 implies a pairwise comparison between the correct disambiguation em,∗ and the other disambiguation candidates em,j such that j is not the index corresponding to ∗. [sent-172, score-0.138]
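
A minimal pairwise-ranking sketch for learning ω⃗; the perceptron-style update, learning rate, and data layout are assumptions standing in for whatever learner realizes Equation 11:

```python
import numpy as np

def train_disambiguation_weights(training_mentions, n_features,
                                 epochs=5, lr=0.1):
    """Learn w so that D(e) = w . phi(m, e) ranks the gold entity above
    every other candidate. `training_mentions` is a list of
    (phi_gold, [phi_other, ...]) feature-vector pairs."""
    w = np.zeros(n_features)
    for _ in range(epochs):
        for phi_gold, others in training_mentions:
            for phi_j in others:
                if w.dot(phi_gold) <= w.dot(phi_j):  # violated pair
                    w += lr * (phi_gold - phi_j)
    return w
```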

82 2 PRUNE The disambiguation phase produces one candidate disambiguation per mention. [sent-174, score-0.138]

83 To discard any meaningless annotations, a simple strategy similar to LOCAL (Kulkarni et al. [sent-175, score-0.065]

84 3 FEATUREMAP φ(m, em,j) Sense probability prior (SP): It represents the prior probability that a mention name s points to a specific entity in Wikipedia. [sent-179, score-0.287]

85 For example, without any other information, the mention name “tree” will more likely refer to the entity woody plant than to the less common entity Tree (data structure). [sent-180, score-0.287]

86 Context specific features: these capture the textual similarity between the weighted word vectors corresponding to the context of the mention (a window around the mention) and the textual description associated with the entity (its Wikipedia page). [sent-187, score-0.248]
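
A sketch of the two feature families, assuming an anchor-text count table for the sense prior and a raw bag-of-words cosine for the context feature (the paper uses weighted vectors; the weighting here is an assumption):

```python
import math
from collections import Counter

def sense_prior(mention, entity, anchor_counts):
    """SP: fraction of Wikipedia anchors with this surface form that
    point to `entity`; anchor_counts[mention] is a Counter of targets."""
    targets = anchor_counts.get(mention, Counter())
    total = sum(targets.values())
    return targets[entity] / total if total else 0.0

def context_similarity(context_words, entity_page_words):
    """Cosine similarity between the mention's context window and the
    entity's Wikipedia page text."""
    a, b = Counter(context_words), Counter(entity_page_words)
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0
```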

87 Let EAonline and EAoffline represent the configurations for annotating the user question and the corpus, respectively. [sent-188, score-0.173]

88 For EAonline, the user question represents the document from which the context specific features are computed. [sent-189, score-0.18]

89 For EAoffline, the question and the (best) answer are concatenated to represent the document. [sent-190, score-0.085]

90 Based on the “one sense per discourse” assumption, one additional heuristic is used in EAoffline: for the same Q&A pair, if the same name mention is repeated multiple times across the question and the answer, then the one with the maximum D(em,∗) > ρna is annotated for all instances. [sent-191, score-0.282]
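
A sketch of that heuristic; the data layout (surface form mapped to scored candidate senses within one Q&A pair) is an assumption:

```python
def one_sense_per_discourse(mention_candidates, rho_na):
    """Within a single Q&A pair, propagate the one highest-scoring sense
    of each repeated surface form to all of its instances, keeping it
    only if it clears the NA threshold rho_na."""
    resolved = {}
    for surface, candidates in mention_candidates.items():
        best_entity, best_score = max(candidates, key=lambda c: c[1])
        if best_score > rho_na:
            resolved[surface] = best_entity  # same sense for every instance
    return resolved
```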

91 Volunteers were told to be as exhaustive as possible and tag all possible name mentions, even if only to mark them as “NA”. [sent-197, score-0.069]

92 3K) were made in the question, of which 551 were assigned to NA. [sent-207, score-0.085]

93 We do a linear scan of the data to identify entity mentions by first tokenizing and then identifying token sequences that maximally match an entity ID in the entity name dictionary (constructed using Wikipedia anchor text and redirect pages). [sent-209, score-0.6]
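
A sketch of that linear maximal-match scan; the dictionary interface and the mention-length cap are assumptions:

```python
def find_mentions(tokens, name_dict, max_len=5):
    """Left-to-right scan that greedily matches the longest token
    sequence present in the entity name dictionary."""
    mentions, i = [], 0
    while i < len(tokens):
        match = None
        for j in range(min(len(tokens), i + max_len), i, -1):  # longest first
            surface = " ".join(tokens[i:j])
            if surface in name_dict:
                match = (i, j, name_dict[surface])  # (start, end, entity id)
                break
        if match:
            mentions.append(match)
            i = match[1]  # continue after the matched span
        else:
            i += 1
    return mentions
```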

94 (Footnote: en.wikipedia.org/wiki/Tree (data structure)) (Figure 3: Precision vs. Recall) annotation set; 2) EAoffline′ is measured only on annotations made in the question. [sent-213, score-0.102]

95 This is done to compare it with EAonline; 3) EAoffline∗ is similar to (2); the only difference is that for (2) the entire Q&A pair is the context, while here only the question part is the context. [sent-214, score-0.085]

96 1 Dataset. We crawled a dataset of ∼5 million questions and answers from Yahoo! [sent-222, score-0.11]

97 In our retrieval experiments we used 339 queries (average length 5. [sent-226, score-0.099]

98 We pooled the top 25 Q&A pairs from retrieval results generated by varying the retrieval algorithms and the search field. [sent-231, score-0.126]

99 Performance of all the translation based models is better than VSM and OKAPI, thereby confirming the importance of addressing the lexical gap. [sent-266, score-0.134]

100 Using high confidence annotations for… (table residue: MAP, MRR, R-Prec, Prec@5 and Prec@10, each with a %chg column, for VSM, OKAPI, and the translation models) [sent-267, score-0.065]
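
For reference, hedged sketches of two of the reported rank metrics (MRR and Prec@k), computed from per-query 0/1 relevance lists; the data layout is an assumption:

```python
def mean_reciprocal_rank(per_query_relevance):
    """MRR: average over queries of 1/rank of the first relevant result."""
    total = 0.0
    for rels in per_query_relevance:
        total += next((1.0 / (i + 1) for i, r in enumerate(rels) if r), 0.0)
    return total / len(per_query_relevance)

def precision_at_k(rels, k):
    """Prec@k for one query's ranked 0/1 relevance list."""
    return sum(rels[:k]) / k
```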


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tq', 0.465), ('eaoffline', 0.297), ('pml', 0.27), ('etlm', 0.243), ('td', 0.195), ('ne', 0.183), ('eaonline', 0.162), ('neq', 0.162), ('entity', 0.145), ('chg', 0.135), ('etlmqa', 0.135), ('wikipedia', 0.12), ('query', 0.112), ('etlmwiki', 0.108), ('entities', 0.105), ('mention', 0.103), ('translation', 0.096), ('ctm', 0.093), ('em', 0.085), ('question', 0.085), ('ezq', 0.081), ('okapi', 0.081), ('pentity', 0.081), ('quser', 0.081), ('tcombo', 0.081), ('warcraft', 0.081), ('answers', 0.071), ('portals', 0.07), ('volunteers', 0.07), ('disambiguation', 0.069), ('na', 0.067), ('annotations', 0.065), ('vsm', 0.063), ('retrieval', 0.063), ('xue', 0.06), ('ep', 0.055), ('answer', 0.055), ('relationships', 0.055), ('agnostic', 0.055), ('eq', 0.055), ('bestdisambiguation', 0.054), ('etlmcombo', 0.054), ('phybrid', 0.054), ('tlm', 0.054), ('translm', 0.054), ('token', 0.052), ('catalog', 0.052), ('kulkarni', 0.052), ('eu', 0.052), ('user', 0.051), ('equation', 0.048), ('archives', 0.046), ('dependant', 0.046), ('jeon', 0.046), ('sp', 0.046), ('document', 0.044), ('parallel', 0.043), ('qn', 0.042), ('self', 0.042), ('mentions', 0.041), ('expansion', 0.04), ('questions', 0.039), ('name', 0.039), ('pthe', 0.039), ('expansions', 0.039), ('prec', 0.039), ('prune', 0.038), ('addressing', 0.038), ('span', 0.038), ('di', 0.038), ('annotation', 0.037), ('users', 0.037), ('expanding', 0.037), ('ibm', 0.037), ('configurations', 0.037), ('interpolation', 0.037), ('berger', 0.036), ('outgoing', 0.036), ('bl', 0.036), ('lets', 0.036), ('mrr', 0.036), ('queries', 0.036), ('links', 0.034), ('spans', 0.034), ('lafferty', 0.034), ('annotators', 0.033), ('anchor', 0.033), ('game', 0.033), ('xm', 0.033), ('ej', 0.031), ('outlines', 0.031), ('exhaustive', 0.03), ('yahoo', 0.03), ('singh', 0.029), ('availability', 0.028), ('page', 0.027), ('learnt', 0.027), ('modelling', 0.027), ('xj', 0.027), ('popular', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 41 emnlp-2012-Entity based QA Retrieval

Author: Amit Singh

Abstract: Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. Experiments conducted on large-scale real data show that the proposed techniques are promising.

2 0.14035179 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Entity linking systems link noun-phrase mentions in text to their corresponding Wikipedia articles. However, NLP applications would gain from the ability to detect and type all entities mentioned in text, including the long tail of entities not prominent enough to have their own Wikipedia articles. In this paper we show that once the Wikipedia entities mentioned in a corpus of textual assertions are linked, this can further enable the detection and fine-grained typing of the unlinkable entities. Our proposed method for detecting unlinkable entities achieves 24% greater accuracy than a Named Entity Recognition baseline, and our method for fine-grained typing is able to propagate over 1,000 types from linked Wikipedia entities to unlinkable entities. Detection and typing of unlinkable entities can increase yield for NLP applications such as typed question answering.

3 0.12303295 19 emnlp-2012-An Entity-Topic Model for Entity Linking

Author: Xianpei Han ; Le Sun

Abstract: Entity Linking (EL) has received considerable attention in recent years. Given many name mentions in a document, the goal of EL is to predict their referent entities in a knowledge base. Traditionally, there have been two distinct directions of EL research: one focusing on the effects of a mention's context compatibility, assuming that “the referent entity of a mention is reflected by its context”; the other dealing with the effects of a document's topic coherence, assuming that “a mention's referent entity should be coherent with the document's main topics”. In this paper, we propose a generative model called the entity-topic model, to effectively join the above two complementary directions together. By jointly modeling and exploiting the context compatibility, the topic coherence and the correlation between them, our model can accurately link all mentions in a document using both the local information (including the words and the mentions in a document) and the global knowledge (including the topic knowledge, the entity context knowledge and the entity name knowledge). Experimental results demonstrate the effectiveness of the proposed model.

4 0.11748607 97 emnlp-2012-Natural Language Questions for the Web of Data

Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum

Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the question translation and the resulting query answering.

5 0.11184468 84 emnlp-2012-Linking Named Entities to Any Database

Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates

Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.

6 0.10588618 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

7 0.10140036 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

8 0.093266025 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion

9 0.083758228 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

10 0.078706227 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

11 0.076345749 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

12 0.076012067 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

13 0.075309314 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

14 0.075008824 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

15 0.074121825 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

16 0.063424997 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation

17 0.059381314 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

18 0.058647148 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

19 0.056240153 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic

20 0.053877827 110 emnlp-2012-Reading The Web with Learned Syntactic-Semantic Inference Rules


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.216), (1, 0.143), (2, -0.037), (3, -0.021), (4, -0.088), (5, -0.137), (6, 0.039), (7, 0.1), (8, -0.056), (9, -0.075), (10, 0.102), (11, -0.048), (12, 0.029), (13, -0.075), (14, -0.01), (15, 0.035), (16, 0.224), (17, 0.116), (18, -0.052), (19, 0.023), (20, 0.11), (21, 0.026), (22, -0.089), (23, 0.086), (24, 0.077), (25, 0.137), (26, -0.032), (27, -0.032), (28, -0.134), (29, -0.036), (30, 0.135), (31, 0.007), (32, 0.153), (33, 0.025), (34, -0.017), (35, 0.024), (36, -0.006), (37, -0.012), (38, -0.156), (39, 0.051), (40, 0.014), (41, -0.066), (42, 0.021), (43, -0.017), (44, -0.053), (45, -0.02), (46, 0.022), (47, -0.097), (48, -0.006), (49, 0.089)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95457971 41 emnlp-2012-Entity based QA Retrieval

Author: Amit Singh

Abstract: Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. Experiments conducted on large-scale real data show that the proposed techniques are promising.

2 0.61543596 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

Author: Thomas Lin ; Mausam ; Oren Etzioni

Abstract: Entity linking systems link noun-phrase mentions in text to their corresponding Wikipedia articles. However, NLP applications would gain from the ability to detect and type all entities mentioned in text, including the long tail of entities not prominent enough to have their own Wikipedia articles. In this paper we show that once the Wikipedia entities mentioned in a corpus of textual assertions are linked, this can further enable the detection and fine-grained typing of the unlinkable entities. Our proposed method for detecting unlinkable entities achieves 24% greater accuracy than a Named Entity Recognition baseline, and our method for fine-grained typing is able to propagate over 1,000 types from linked Wikipedia entities to unlinkable entities. Detection and typing of unlinkable entities can increase yield for NLP applications such as typed question answering.

3 0.56713599 97 emnlp-2012-Natural Language Questions for the Web of Data

Author: Mohamed Yahya ; Klaus Berberich ; Shady Elbassuoni ; Maya Ramanath ; Volker Tresp ; Gerhard Weikum

Abstract: The Linked Data initiative comprises structured databases in the Semantic-Web data model RDF. Exploring this heterogeneous data by structured query languages is tedious and error-prone even for skilled users. To ease the task, this paper presents a methodology for translating natural language questions into structured SPARQL queries over linked-data sources. Our method is based on an integer linear program to solve several disambiguation tasks jointly: the segmentation of questions into phrases; the mapping of phrases to semantic entities, classes, and relations; and the construction of SPARQL triple patterns. Our solution harnesses the rich type system provided by knowledge bases in the web of linked data, to constrain our semantic-coherence objective function. We present experiments on both the question translation and the resulting query answering.

4 0.54058957 19 emnlp-2012-An Entity-Topic Model for Entity Linking

Author: Xianpei Han ; Le Sun

Abstract: Entity Linking (EL) has received considerable attention in recent years. Given many name mentions in a document, the goal of EL is to predict their referent entities in a knowledge base. Traditionally, there have been two distinct directions of EL research: one focusing on the effects of a mention's context compatibility, assuming that “the referent entity of a mention is reflected by its context”; the other dealing with the effects of a document's topic coherence, assuming that “a mention's referent entity should be coherent with the document's main topics”. In this paper, we propose a generative model called the entity-topic model, to effectively join the above two complementary directions together. By jointly modeling and exploiting the context compatibility, the topic coherence and the correlation between them, our model can accurately link all mentions in a document using both the local information (including the words and the mentions in a document) and the global knowledge (including the topic knowledge, the entity context knowledge and the entity name knowledge). Experimental results demonstrate the effectiveness of the proposed model.

5 0.51713759 78 emnlp-2012-Learning Lexicon Models from Search Logs for Query Expansion

Author: Jianfeng Gao ; Shasha Xie ; Xiaodong He ; Alnur Ali

Abstract: This paper explores log-based query expansion (QE) models for Web search. Three lexicon models are proposed to bridge the lexical gap between Web documents and user queries. These models are trained on pairs of user queries and titles of clicked documents. Evaluations on a real world data set show that the lexicon models, integrated into a ranker-based QE system, not only significantly improve the document retrieval performance but also outperform two state-of-the-art log-based QE methods.

6 0.50549418 84 emnlp-2012-Linking Named Entities to Any Database

7 0.45998448 76 emnlp-2012-Learning-based Multi-Sieve Co-reference Resolution with Knowledge

8 0.45904455 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

9 0.44788316 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games

10 0.39402005 47 emnlp-2012-Explore Person Specific Evidence in Web Person Name Disambiguation

11 0.34092653 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

12 0.32966402 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models

13 0.31745073 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

14 0.28317931 114 emnlp-2012-Revisiting the Predictability of Language: Response Completion in Social Media

15 0.28253976 86 emnlp-2012-Locally Training the Log-Linear Model for SMT

16 0.28200537 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM

17 0.27629679 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

18 0.26839826 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

19 0.25407749 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

20 0.24760652 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.012), (16, 0.018), (25, 0.017), (34, 0.065), (60, 0.623), (63, 0.035), (65, 0.021), (70, 0.011), (74, 0.021), (76, 0.039), (86, 0.017), (95, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.99075603 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation

Author: Wenbin Jiang ; Fandong Meng ; Qun Liu ; Yajuan Lu

Abstract: In this paper we first describe the technology of automatic annotation transformation, which is based on the annotation adaptation algorithm (Jiang et al., 2009). It can automatically transform a human-annotated corpus from one annotation guideline to another. We then propose two optimization strategies, iterative training and predict-self reestimation, to further improve the accuracy of annotation guideline transformation. Experiments on Chinese word segmentation show that the iterative training strategy together with predict-self reestimation brings significant improvement over the simple annotation transformation baseline, and leads to classifiers with significantly higher accuracy and several times faster processing than annotation adaptation does. On the Penn Chinese Treebank 5.0, it achieves an F-measure of 98.43%, significantly outperforming previous works although using a single classifier with only local features.

2 0.9889428 58 emnlp-2012-Generalizing Sub-sentential Paraphrase Acquisition across Original Signal Type of Text Pairs

Author: Aurelien Max ; Houda Bouamor ; Anne Vilnat

Abstract: This paper describes a study of the impact of the original signal (text, speech, visual scene, event) of a text pair on the task of both manual and automatic sub-sentential paraphrase acquisition. A corpus of 2,500 annotated sentences in English and French is described, and performance on this corpus is reported for an efficient system combination exploiting a large set of features for paraphrase recognition. A detailed, quantified typology of the sub-sentential paraphrases found in our corpus is given.

3 0.9886899 84 emnlp-2012-Linking Named Entities to Any Database

Author: Avirup Sil ; Ernest Cronin ; Penghai Nie ; Yinfei Yang ; Ana-Maria Popescu ; Alexander Yates

Abstract: Existing techniques for disambiguating named entities in text mostly focus on Wikipedia as a target catalog of entities. Yet for many types of entities, such as restaurants and cult movies, relational databases exist that contain far more extensive information than Wikipedia. This paper introduces a new task, called Open-Database Named-Entity Disambiguation (Open-DB NED), in which a system must be able to resolve named entities to symbols in an arbitrary database, without requiring labeled data for each new database. We introduce two techniques for Open-DB NED, one based on distant supervision and the other based on domain adaptation. In experiments on two domains, one with poor coverage by Wikipedia and the other with near-perfect coverage, our Open-DB NED strategies outperform a state-of-the-art Wikipedia NED system by over 25% in accuracy.

same-paper 4 0.98835188 41 emnlp-2012-Entity based QA Retrieval

Author: Amit Singh

Abstract: Bridging the lexical gap between the user's question and the question-answer pairs in Q&A archives has been a major challenge for Q&A retrieval. State-of-the-art approaches address this issue by implicitly expanding the queries with additional words using statistical translation models. While useful, the effectiveness of these models is highly dependent on the availability of a quality corpus, in the absence of which they are troubled by noise issues. Moreover, these models perform word-based expansion in a context-agnostic manner, resulting in translations that might be mixed and fairly general. This results in degraded retrieval performance. In this work we address the above issues by extending the lexical word-based translation model to incorporate semantic concepts (entities). We explore strategies to learn the translation probabilities between words and the concepts using the Q&A archives and a popular entity catalog. Experiments conducted on large-scale real data show that the proposed techniques are promising.

5 0.96524036 61 emnlp-2012-Grounded Models of Semantic Representation

Author: Carina Silberer ; Mirella Lapata

Abstract: A popular tradition of studying semantic representation has been driven by the assumption that word meaning can be learned from the linguistic environment, despite ample evidence suggesting that language is grounded in perception and action. In this paper we present a comparative study of models that represent word meaning based on linguistic and perceptual data. Linguistic information is approximated by naturally occurring corpora and sensorimotor experience by feature norms (i.e., attributes native speakers consider important in describing the meaning of a word). The models differ in terms of the mechanisms by which they integrate the two modalities. Experimental results show that a closer correspondence to human data can be obtained by uncovering latent information shared among the textual and perceptual modalities rather than arriving at semantic knowledge by concatenating the two.

6 0.93530214 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

7 0.88246042 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

8 0.85286599 39 emnlp-2012-Enlarging Paraphrase Collections through Generalization and Instantiation

9 0.83407968 137 emnlp-2012-Why Question Answering using Sentiment Analysis and Word Classes

10 0.82930464 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing

11 0.82614839 138 emnlp-2012-Wiki-ly Supervised Part-of-Speech Tagging

12 0.82490015 92 emnlp-2012-Multi-Domain Learning: When Do Domains Matter?

13 0.8230443 19 emnlp-2012-An Entity-Topic Model for Entity Linking

14 0.81729591 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

15 0.79923356 135 emnlp-2012-Using Discourse Information for Paraphrase Extraction

16 0.79470545 108 emnlp-2012-Probabilistic Finite State Machines for Regression-based MT Evaluation

17 0.79321039 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents

18 0.79172021 72 emnlp-2012-Joint Inference for Event Timeline Construction

19 0.79170299 13 emnlp-2012-A Unified Approach to Transliteration-based Text Input with Online Spelling Correction

20 0.78580499 128 emnlp-2012-Translation Model Based Cross-Lingual Language Model Adaptation: from Word Models to Phrase Models