acl acl2010 acl2010-28 knowledge-graph by maker-knowledge-mining

28 acl-2010-An Entity-Level Approach to Information Extraction

Source: pdf

Author: Aria Haghighi ; Dan Klein

Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. [sent-3, score-0.475]

2 Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. [sent-4, score-0.402]

3 On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%. [sent-5, score-0.404]

4 1 Introduction: Template-filling information extraction (IE) systems must merge information across multiple sentences to identify all role fillers of interest. [sent-6, score-0.222]

5 For instance, in the MUC4 terrorism event extraction task, the entity filling the individual perpetrator role often occurs multiple times, variously as proper, nominal, or pronominal mentions. [sent-7, score-0.579]

6 However, most template-filling systems (Freitag and McCallum, 2000; Patwardhan and Riloff, 2007) assign roles to individual textual mentions using only local context as evidence, leaving aggregation for post-processing. [sent-8, score-0.558]

7 While prior work has acknowledged that coreference resolution and discourse analysis are integral to accurate role identification, to our knowledge no model has been proposed which jointly models these phenomena. [sent-9, score-0.403]

8 Our model jointly merges surface mentions into underlying entities (coreference resolution) and assigns roles to those discovered entities. [sent-11, score-0.821]

9 In the generative process proposed here, document entities are generated for each template role, along with a set of non-template entities. [sent-12, score-0.47]

10 These entities then generate mentions in a process sensitive to both lexical and structural properties of the mention. [sent-13, score-0.586]

11 Figure 1: Example of the corporate acquisitions role-filling task. [sent-20, score-0.273]

12 In (a), an example template specifying the entities playing each domain role. [sent-21, score-0.353]

13 In (b), an example document with coreferent mentions sharing the same role label. [sent-22, score-0.628]

14 Note that pronoun mentions provide direct clues to entity roles. [sent-23, score-0.655]

15 can naturally incorporate unannotated data, which further increases accuracy. [sent-24, score-0.13]

16 2 Problem Setting: Figure 1(a) shows an example template-filling task from the corporate acquisitions domain (Freitag, 1998). [sent-25, score-0.273]

17 We have a template of K roles (PURCHASER, AMOUNT, etc. [sent-26, score-0.345]

18 ) and we must identify which entity (if any) fills each role (CSR Limited, etc. [sent-27, score-0.412]

19 Often such problems are modeled at the mention level, directly labeling individual mentions as in Figure 1(b). [sent-29, score-0.818]

20 Indeed, in this data set, the mention-level perspective is evident in the gold annotations, which ignore pronominal references. [sent-30, score-0.146]

21 However, roles in this domain appear in several locations throughout the document, with pronominal mentions often carrying the critical information for template filling. [sent-31, score-0.845]

22 Therefore, Section 3 presents a model in which entities are explicitly modeled, naturally merging information across all mention types and explicitly representing latent structure very much like the entity-level template structure from Figure 1(a). [sent-32, score-0.853]

23 Also we ignore the status field as it doesn’t apply to entities and its meaning is not consistent. [sent-34, score-0.185]

24 Conference Short Papers, pages 291–295, © 2010 Association for Computational Linguistics. [Figure labels: Purchaser Role Role Entity Parameters Other Entity Parameters] Figure 2: Graphical model depiction of our generative model described in Section 3. [sent-37, score-0.134]

25 3 Model: We describe our generative model for a document, which has many similarities to the coreference-only model of Haghighi and Klein (2010), but which integrally models template role-fillers. [sent-39, score-0.34]

26 Mentions: A mention is an observed textual reference to a latent real-world entity. [sent-41, score-0.466]

27 Mentions are associated with nodes in a parse tree and are typically realized as NPs. [sent-42, score-0.113]

28 There are three basic forms of mentions: proper (NAM), nominal (NOM), and pronominal (PRO). [sent-43, score-0.218]

29 Each mention M is represented as a collection of key-value pairs. [sent-44, score-0.383]

30 Mention types are trivially determined from the POS tag of the mention head. [sent-48, score-0.435]

31 Entities: An entity is a specific individual or object in the world. [sent-50, score-0.221]

32 Where a mention has a single word for each property, an entity has a list of signature words. [sent-52, score-0.604]

33 Formally, entities are mappings from properties r ∈ R to lists Lr of “canonical” words which that entity uses for that property. [sent-53, score-0.453]

34 Our model performs role-filling by assuming that each entity is drawn from an underlying role. [sent-55, score-0.37]

35 These roles include the K template roles as well as ‘junk’ roles to represent entities which do not fill a template role (see Section 5. [sent-56, score-1.198]

36 Each role R is represented as a mapping between properties r and pairs of multinomials (θr, fr). [sent-58, score-0.287]

37 θr is a unigram distribution of words for property r that are semantically licensed for the role (e. [sent-59, score-0.298]

38 fr is a “fertility” distribution over the integers that characterizes entity list lengths. [sent-62, score-0.296]

39 Together, these distributions control the lists Lr for entities which instantiate the role. [sent-63, score-0.218]
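
To make the role of these two distributions concrete, here is a minimal sketch of how one word list Lr might be drawn for a single property. It assumes that the list length comes from the fertility distribution and that words are then drawn independently from θr; the independence assumption, the function name `sample_entity_list`, and all numeric values are illustrative and not taken from the paper.

```python
import random


def sample_entity_list(theta, fertility, rng):
    """Draw one word list L_r for a single property r.

    theta:     dict mapping word -> probability (role unigram distribution)
    fertility: dict mapping list length -> probability
    rng:       a random.Random instance, so runs are reproducible
    """
    lengths = list(fertility)
    # First choose how many words the entity keeps for this property.
    n = rng.choices(lengths, weights=[fertility[k] for k in lengths])[0]
    words = list(theta)
    # Assumption: words are drawn independently, with replacement.
    return rng.choices(words, weights=[theta[w] for w in words], k=n)


rng = random.Random(0)
theta_purchaser = {"company": 0.5, "firm": 0.3, "group": 0.2}  # toy values
fertility_purchaser = {1: 0.6, 2: 0.3, 3: 0.1}                  # toy values
nominal_heads = sample_entity_list(theta_purchaser, fertility_purchaser, rng)
```

A full entity would repeat this draw once per property r, giving one list per property.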

40 We temporarily assume that all mentions belong to a template role-filling entity; we lift this restriction in Section 5. [sent-65, score-0.598]

41 First, a semantic component generates a sequence of entities E = (E1, . [sent-67, score-0.191]

42 , EK), where each Ei is generated from a corresponding role Ri. [sent-70, score-0.191]

43 , RK) to denote the vector of template role parameters. [sent-74, score-0.397]

44 Note that this work assumes that there is a one-to-one mapping between entities and roles; in particular, at most one entity can fill each role. [sent-75, score-0.399]

45 Once entities have been generated, a discourse component generates which entities will be evoked in each of the n mention positions. [sent-77, score-0.721]

46 We represent these choices using entity indicators denoted by Z = (Z1, . [sent-78, score-0.249]

47 , K indicating the entity number (and thereby the role) underlying the ith mention position. [sent-86, score-0.672]

48 Finally, a mention generation component renders each mention conditioned on the underlying entity and role. [sent-87, score-1.128]

49 The antecedent position is selected according to the distribution $P(j' \mid j) \propto \exp\{-\gamma \, \mathrm{TREEDIST}(j', j)\}$, where $\mathrm{TREEDIST}(j', j)$ represents the tree distance between the parse nodes for $M_j$ and $M_{j'}$. [sent-92, score-0.139]

50 (Footnote 2) There is one exception: the sizes of the proper and nominal head property lists are jointly generated, but their word lists are still independently populated. [sent-93, score-0.353]

51 (Footnote 4) Sentence parse trees are merged into a right-branching document parse tree. [sent-95, score-0.141]

52 Mass is restricted to antecedent mention positions $j'$ which occur earlier in the same sentence or in the previous sentence. [sent-97, score-0.436]
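
The antecedent distribution described above can be sketched as a normalized exponential over tree distances. This is only an illustration: the function name `antecedent_distribution`, the toy distances, and the value of γ are assumptions (the value used in the paper is truncated in this extract), and the caller is assumed to pass only the permitted candidate positions.

```python
import math


def antecedent_distribution(tree_dist, gamma):
    """Normalize exp(-gamma * distance) over the candidate antecedents.

    tree_dist: dict mapping candidate position -> tree distance to the
               current mention (only permitted candidates are included)
    gamma:     non-negative decay rate (hypothetical value in this sketch)
    """
    weights = {jp: math.exp(-gamma * d) for jp, d in tree_dist.items()}
    total = sum(weights.values())
    return {jp: w / total for jp, w in weights.items()}


# Three candidate antecedents; position 2 is closest in the parse tree.
dists = {0: 6, 1: 4, 2: 2}
probs = antecedent_distribution(dists, gamma=0.5)
```

With any positive γ, closer candidates receive more probability mass, and γ = 0 reduces to a uniform choice among the candidates.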

53 3 Mention Generation: Once the entity indicator has been drawn, we generate words associated with the mention conditioned on the underlying entity E and role R. [sent-99, score-1.194]

54 For each mention property r associated with the mention, a word w is drawn utilizing E’s word list Lr as well as the multinomials (fr, θr) from role R. [sent-100, score-0.771]

55 The word w is drawn according to $P(w \mid E, R) = (1 - \alpha_r)\,\frac{\mathbb{1}[w \in L_r]}{\mathrm{len}(L_r)} + \alpha_r P(w \mid \theta_r)$. For each property r, there is a hyper-parameter αr which interpolates between selecting a word uniformly from the entity list Lr and drawing from the underlying role distribution θr. [sent-101, score-0.637]

56 Intuitively, a small αr indicates that an entity prefers to re-use a small number of words for property r. [sent-102, score-0.292]

57 This is typically the case for proper and nominal heads as well as modifiers. [sent-103, score-0.139]

58 At the other extreme, setting αr to 1 indicates the property isn’t particular to the entity itself, but rather always drawn from the underlying role distribution. [sent-104, score-0.638]
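
The interpolation between the entity list and the role distribution can be written out as a short sketch. It assumes the entity list is non-empty and holds distinct words, so that a uniform draw from the list gives each member probability 1/len(Lr); the function name `word_prob` and the toy numbers are illustrative only.

```python
def word_prob(w, entity_list, theta, alpha):
    """Mixture probability of word w for one mention property.

    entity_list: the entity's word list L_r (assumed non-empty, distinct words)
    theta:       dict mapping word -> probability under the role distribution
    alpha:       interpolation weight in [0, 1]
    """
    if not entity_list:
        raise ValueError("entity list must be non-empty in this sketch")
    # Uniform choice from the entity list, zero for words outside it.
    from_list = (1.0 if w in entity_list else 0.0) / len(entity_list)
    # Back-off draw from the role-level unigram distribution.
    from_role = theta.get(w, 0.0)
    return (1.0 - alpha) * from_list + alpha * from_role


L = ["CSR", "Limited"]                                  # toy entity list
theta = {"CSR": 0.1, "Limited": 0.1, "company": 0.8}    # toy role distribution
p_csr = word_prob("CSR", L, theta, alpha=0.2)
p_company = word_prob("company", L, theta, alpha=0.2)
```

With a small alpha, the words already on the entity list dominate; with alpha set to 1 the list is ignored and only the role distribution matters.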

59 4 Learning and Inference: Since we will make use of unannotated data (see Section 5), we utilize a variational EM algorithm to learn parameters R and φ. [sent-106, score-0.215]

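60 We approximate it using a surrogate variational distribution of the following factored form: $Q(E, Z) = \left(\prod_{i=1}^{K} q_i(E_i)\right) \left(\prod_{j=1}^{n} r_j(Z_j)\right)$. Each rj(Zj) is a distribution over the entity indicator for mention Mj, which approximates the true posterior of Zj. [sent-109, score-0.986]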

61 jY=n1rj(Zj) Each rj (Zj) is a distribution over the entity indicator for mention Mj, which approximates the true posterior of Zj. [sent-109, score-0.986]

62 Similarly, qi (Ei) approximates the posterior over entity Ei which is associated with role Ri. [sent-110, score-0.631]

63 As is standard, we iteratively update each component distribution to minimize KL-divergence, fixing all other distributions: $q_i \leftarrow \arg\min_{q_i} \mathrm{KL}\big(Q(E, Z) \,\|\, P(E, Z \mid M, R, \phi)\big) \propto \exp\{\mathbb{E}_{Q/q_i} \ln P(E, Z \mid M, R, \phi)\}$. (Footnote 5) The sole parameter γ is fixed at 0. [sent-111, score-0.269]

64 Table 1: Results on corporate acquisition tasks with given role mention boundaries. [sent-122, score-0.705]

65 We report mention role accuracy and entity role accuracy (correctly labeling all entity mentions). [sent-123, score-1.25]

66 The update for variational entity distribution is given by: $\ln q_i(e_i) \propto \mathbb{E}_{Q/q_i} \ln P(E, Z, M \mid R, \phi) \propto \mathbb{E}_{\{r_j\}} \ln \Big[ P(e_i \mid R_i) \prod_{j: Z_j = i} P(M_j \mid e_i, R_i) \Big] = \ln P(e_i \mid R_i) + \sum_{j} r_j(i) \ln P(M_j \mid e_i, R_i)$. It is intractable to enumerate all possible entities ei (each consisting of several sets of words). [sent-125, score-0.994]
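
As a rough illustration of this update, the sketch below scores a small, finite set of candidate entities with the final form of the expression (log prior plus posterior-weighted mention log-likelihoods) and then normalizes. Restricting attention to a finite candidate set stands in for the sampling the authors describe; the function name `update_entity_factor` and every number are assumptions made for the example.

```python
import math


def update_entity_factor(log_prior, log_lik, r_i):
    """Return the normalized factor q_i over a finite set of candidates.

    log_prior: list, ln P(e | R_i) for each candidate entity e
    log_lik:   list of lists, log_lik[c][j] = ln P(M_j | e_c, R_i)
    r_i:       list, r_j(i) for each mention j (posterior that Z_j = i)
    """
    scores = [
        lp + sum(r * ll for r, ll in zip(r_i, row))
        for lp, row in zip(log_prior, log_lik)
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(scores)
    unnorm = [math.exp(s - top) for s in scores]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Two candidate entities, two mentions; only mention 0 is assigned to role i.
log_prior = [math.log(0.5), math.log(0.5)]
log_lik = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
]
q = update_entity_factor(log_prior, log_lik, r_i=[1.0, 0.0])
```

Because mention 1 has zero posterior weight for this role, its likelihood terms drop out and the result depends only on the prior and mention 0.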

67 We obtain entity samples by sampling mention entity indicators according to rj. [sent-127, score-0.853]

68 For a given sample, we assume that Ei consists of the non-pronominal head words and modifiers of mentions such that Zj has sampled value i. [sent-128, score-0.444]

69 During the E-Step, we perform 5 iterations of updating each variational factor, which results in an approximate posterior distribution. [sent-129, score-0.17]

70 The role parameters Ri are computed from the qi(ei) and rj (z) distributions, and the global role prior φ from the non-pronominal components of rj (z). [sent-131, score-0.78]

71 5 Experiments: We present results on the corporate acquisitions task, which consists of 600 annotated documents split into a 300/300 train/test split. [sent-132, score-0.302]

72 documents, proper and (usually) nominal mentions are annotated with roles, while pronouns are not. [sent-135, score-0.502]

73 , 2006), and extract mention properties from parse trees and the Stanford Dependency Extractor (de Marneffe et al. [sent-137, score-0.478]

74 1 Gold Role Boundaries: We first consider the simplified task where role mention boundaries are given. [sent-140, score-0.619]

75 We map each labeled token span in training and test data to a parse tree node that shares the same head. [sent-141, score-0.114]

76 In this setting, the role-filling task is a collective classification problem, since we know each mention is filling some role. [sent-142, score-0.414]

77 It uses features as similar as possible to the generative model (and more), including the head word, typed dependencies of the head, various tree features, governing word, and several conjunctions of these features as well as coarser versions of lexicalized features. [sent-144, score-0.225]

78 The primary difficulty in classification is the disambiguation amongst the acquired, seller, and purchaser roles, which have similar internal structure, and differ primarily in their semantic contexts. [sent-147, score-0.152]

79 Our entity-centered model, JOINT in Table 1, has no latent variables at training time in this setting, since each role maps to a unique entity. [sent-148, score-0.247]

80 7 During development, we noted that often the most direct evidence of the role of an entity was associated with pronoun usage (see the first “it” in Figure 1). [sent-151, score-0.481]

81 Training our model with pronominal mentions, whose roles are latent variables at training time, improves accuracy to 5. [sent-152, score-0.334]

82 Full Task: We now consider the more difficult setting where role mention boundaries are not provided at test time. [sent-155, score-0.656]

83 In this setting, we automatically extract mentions from a parse tree using a heuristic approach. (Footnote 7) We use the mode of the variational posteriors rj(Zj) to make predictions (see Section 4). [sent-156, score-0.841]
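
Predicting with the mode of the variational posteriors rj(Zj) amounts to taking, for each mention, the role with the highest posterior weight. A minimal sketch follows; the function name `predict_roles`, the role ordering, and the posterior values are all made up for illustration.

```python
def predict_roles(posteriors, roles):
    """Pick, for each mention, the role with the largest posterior weight.

    posteriors: list of lists, posteriors[j][i] = r_j(i)
    roles:      list of role names, aligned with the inner index i
    """
    predictions = []
    for r_j in posteriors:
        best = max(range(len(r_j)), key=r_j.__getitem__)
        predictions.append(roles[best])
    return predictions


roles = ["PURCHASER", "ACQUIRED", "SELLER"]
posteriors = [
    [0.7, 0.2, 0.1],  # mention 0
    [0.1, 0.3, 0.6],  # mention 1
]
predicted = predict_roles(posteriors, roles)
```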

84 Systems must determine which mentions are template role-fillers as well as label them. [sent-165, score-0.598]

85 ROLE ID only evaluates the binary decision of whether a mention is a template role-filler or not. [sent-166, score-0.589]

86 Our BEST system, see Section 5, adds extra unannotated data to our JOINT+PRO system. [sent-168, score-0.1]

87 Our mention extraction procedure yields 95% recall over annotated role mentions and 45% precision. [sent-170, score-1.06]

88 9 Using extracted mentions as input, our task is to label some subset of the mentions with template roles. [sent-171, score-0.99]

89 Since systems can label mentions as non-role bearing, only recall is critical to mention extraction. [sent-172, score-0.775]

90 The baseline then classifies mentions which pass this first phase as before. [sent-174, score-0.425]

91 We add ‘junk’ roles to our model to flexibly model entities that do not correspond to annotated template roles. [sent-175, score-0.554]

92 During training, extracted mentions which are not matched in the labeled data have posteriors which are constrained to be amongst the ‘junk’ roles. [sent-176, score-0.523]

93 We first evaluate role identification (ROLE ID in Table 2), the task of identifying mentions which play some role in the template. [sent-177, score-0.774]

94 On the task of identifying and correctly labeling role mentions, our model outperforms INDEP as well (OVERALL in Table 2). [sent-182, score-0.265]

95 As our model is generative, it is straightforward to utilize totally unannotated data. [sent-183, score-0.131]

96 We added 700 fully unannotated documents from the mergers and acquisitions portion of the Reuters-21578 corpus. [sent-184, score-0.299]

97 (Footnote 9) Following Patwardhan and Riloff (2009), we match extracted mentions to labeled spans if the head of the mention matches the labeled span. [sent-190, score-0.883]

98 6 Conclusion: We have presented a joint generative model of coreference resolution and role-filling information extraction. [sent-192, score-0.279]

99 This model makes role decisions at the entity, rather than at the mention level. [sent-193, score-0.605]

100 This approach naturally aggregates information across multiple mentions, incorporates unannotated data, and yields strong performance. [sent-194, score-0.193]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mentions', 0.392), ('mention', 0.383), ('entity', 0.221), ('template', 0.206), ('rj', 0.199), ('role', 0.191), ('zj', 0.171), ('mj', 0.148), ('entities', 0.147), ('ln', 0.144), ('acquisitions', 0.142), ('ei', 0.142), ('roles', 0.139), ('corporate', 0.131), ('variational', 0.115), ('indep', 0.114), ('pronominal', 0.108), ('pro', 0.106), ('patwardhan', 0.104), ('unannotated', 0.1), ('qi', 0.099), ('junk', 0.098), ('purchaser', 0.098), ('freitag', 0.088), ('lr', 0.082), ('coreference', 0.076), ('haghighi', 0.073), ('generative', 0.072), ('property', 0.071), ('underlying', 0.068), ('eqz', 0.065), ('lnp', 0.065), ('rz', 0.065), ('treedist', 0.065), ('yields', 0.063), ('nominal', 0.063), ('resolution', 0.061), ('latent', 0.056), ('posterior', 0.055), ('indicator', 0.054), ('amongst', 0.054), ('antecedent', 0.053), ('dayne', 0.052), ('head', 0.052), ('drawn', 0.05), ('multinomials', 0.049), ('posteriors', 0.049), ('parse', 0.048), ('proper', 0.047), ('properties', 0.047), ('ez', 0.046), ('exp', 0.046), ('document', 0.045), ('update', 0.045), ('boundaries', 0.045), ('component', 0.044), ('jointly', 0.044), ('labeling', 0.043), ('rke', 0.042), ('pronoun', 0.042), ('ley', 0.041), ('uc', 0.041), ('acquired', 0.04), ('fr', 0.039), ('joint', 0.039), ('approximates', 0.038), ('rp', 0.038), ('tree', 0.038), ('lists', 0.038), ('ignore', 0.038), ('setting', 0.037), ('berkeley', 0.037), ('ri', 0.036), ('distribution', 0.036), ('klein', 0.036), ('marneffe', 0.035), ('classifies', 0.033), ('distributions', 0.033), ('typed', 0.032), ('petrov', 0.032), ('riloff', 0.032), ('omit', 0.032), ('fill', 0.031), ('extraction', 0.031), ('model', 0.031), ('filling', 0.031), ('concrete', 0.03), ('naturally', 0.03), ('conditioned', 0.029), ('heads', 0.029), ('documents', 0.029), ('mergers', 0.028), ('csr', 0.028), ('quires', 0.028), ('surrogate', 0.028), ('variously', 0.028), ('labeled', 0.028), ('indicators', 0.028), ('textual', 0.027), ('associated', 0.027)]
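
The page does not say how the similarity values in the lists below are derived from these weights. One common choice, shown here purely as an assumption, is cosine similarity between sparse tf-idf vectors; the function name `cosine` and the second vector are hypothetical, while the first vector reuses three weights from the list above.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


this_paper = {"mentions": 0.392, "mention": 0.383, "entity": 0.221}
other_paper = {"coreference": 0.5, "mention": 0.4}  # hypothetical weights
similarity = cosine(this_paper, other_paper)
```

Papers that share no weighted terms score 0 under this measure, and a paper compared with itself scores 1 up to rounding.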

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

2 0.24104145 233 acl-2010-The Same-Head Heuristic for Coreference

Author: Micha Elsner ; Eugene Charniak

Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent, but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some same-head NPs using syntactic features, improving precision.

3 0.22882025 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features

Author: Cosmin Bejan ; Sanda Harabagiu

Abstract: This paper examines how a new class of nonparametric Bayesian models can be effectively applied to an open-domain event coreference task. Designed with the purpose of clustering complex linguistic objects, these models consider a potentially infinite number of features and categorical outcomes. The evaluation performed for solving both within- and cross-document event coreference shows significant improvements of the models when compared against two baselines for this task.

4 0.21456851 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

Author: Marta Recasens ; Eduard Hovy

Abstract: This paper explores the effect that different corpus configurations have on the performance of a coreference resolution system, as measured by MUC, B3, and CEAF. By varying separately three parameters (language, annotation scheme, and preprocessing information) and applying the same coreference resolution system, the strong bonds between system and corpus are demonstrated. The experiments reveal problems in coreference resolution evaluation relating to task definition, coding schemes, and features. They also expose systematic biases in the coreference evaluation metrics. We show that system comparison is only possible when corpus parameters are in exact agreement.

5 0.14105923 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

Author: Peng Li ; Jing Jiang ; Yinglin Wang

Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.

6 0.13292567 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

7 0.12047432 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

8 0.11506322 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

9 0.10020723 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

10 0.10018893 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

11 0.09062323 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

12 0.088747621 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

13 0.080123156 73 acl-2010-Coreference Resolution with Reconcile

14 0.080051728 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue

15 0.07892888 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

16 0.075172096 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

17 0.07219816 238 acl-2010-Towards Open-Domain Semantic Role Labeling

18 0.071185432 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

19 0.07093136 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

20 0.0701041 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.202), (1, 0.079), (2, 0.046), (3, -0.146), (4, -0.092), (5, 0.215), (6, 0.002), (7, -0.037), (8, 0.069), (9, 0.045), (10, -0.028), (11, -0.085), (12, 0.011), (13, -0.119), (14, 0.064), (15, -0.03), (16, 0.003), (17, 0.012), (18, -0.002), (19, -0.063), (20, -0.013), (21, -0.017), (22, -0.075), (23, -0.021), (24, -0.036), (25, -0.037), (26, 0.002), (27, 0.032), (28, 0.112), (29, -0.116), (30, 0.157), (31, 0.045), (32, 0.039), (33, 0.039), (34, 0.009), (35, 0.011), (36, 0.01), (37, 0.063), (38, 0.075), (39, -0.033), (40, -0.047), (41, 0.084), (42, -0.111), (43, -0.016), (44, -0.049), (45, 0.075), (46, 0.08), (47, 0.065), (48, 0.124), (49, 0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96529704 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

2 0.75409943 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features

Author: Cosmin Bejan ; Sanda Harabagiu

3 0.70305121 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

Author: Marta Recasens ; Eduard Hovy

4 0.65448076 233 acl-2010-The Same-Head Heuristic for Coreference

Author: Micha Elsner ; Eugene Charniak

5 0.57056457 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

6 0.54888457 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

7 0.54773915 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

8 0.51315343 73 acl-2010-Coreference Resolution with Reconcile

9 0.50378162 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

10 0.43088728 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

11 0.4256618 248 acl-2010-Unsupervised Ontology Induction from Text

12 0.42393216 63 acl-2010-Comparable Entity Mining from Comparative Questions

13 0.41039363 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

14 0.40014133 139 acl-2010-Identifying Generic Noun Phrases

15 0.3844125 111 acl-2010-Extracting Sequences from the Web

16 0.37597582 32 acl-2010-Arabic Named Entity Recognition: Using Features Extracted from Noisy Data

17 0.3718808 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion

18 0.36244529 165 acl-2010-Learning Script Knowledge with Web Experiments

19 0.35644302 162 acl-2010-Learning Common Grammar from Multilingual Corpus

20 0.35178742 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.539), (42, 0.024), (59, 0.07), (73, 0.029), (78, 0.03), (83, 0.095), (84, 0.018), (98, 0.095)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97063124 224 acl-2010-Talking NPCs in a Virtual Game World

Author: Tina Kluwer ; Peter Adolphs ; Feiyu Xu ; Hans Uszkoreit ; Xiwen Cheng

Abstract: This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between nonplayer characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domain-specific question answering, task-specific and domain-specific dialog with robust chatbot-like chitchat.

2 0.95310915 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

same-paper 3 0.94627637 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

4 0.86933714 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.

5 0.817496 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outperform state-of-the-art systems either quantitatively or statistically significantly.

6 0.80600554 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

7 0.80097359 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

8 0.62771749 71 acl-2010-Convolution Kernel over Packed Parse Forest

9 0.62355316 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

10 0.61683643 169 acl-2010-Learning to Translate with Source and Target Syntax

11 0.61664557 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar

12 0.61653066 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

13 0.60379958 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

14 0.59825236 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

15 0.58780813 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

16 0.58743894 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

17 0.58084393 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression

18 0.5744251 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

19 0.57143098 121 acl-2010-Generating Entailment Rules from FrameNet

20 0.56997091 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection