acl acl2010 acl2010-101 knowledge-graph by maker-knowledge-mining

101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields


Source: pdf

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Entity-based local coherence modelling using topological fields. Jackie Chi Kit Cheung and Gerald Penn, Department of Computer Science, University of Toronto, Toronto, ON, M5S 3G4, Canada. {jcheung,gpenn}@cs. [sent-1, score-1.208]

2 In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. [sent-4, score-1.112]

3 First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. [sent-5, score-1.532]

4 Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly. [sent-6, score-1.206]

5 1 Introduction One type of coherence modelling that has captured recent research interest is local coherence modelling, which measures the coherence of a document by examining the similarity between neighbouring text spans. [sent-7, score-1.022]

6 The entity-based approach, in particular, considers the occurrences of noun phrase entities in a document (Barzilay and Lapata, 2008). [sent-8, score-0.238]

7 Local coherence modelling has been shown to be useful for tasks like natural language generation and summarization (Barzilay and Lee, 2004) and genre classification (Barzilay and Lapata, 2008). [sent-9, score-0.36]

8 Previous work on English, a language with relatively fixed word order, has identified factors that contribute to local coherence, such as the grammatical roles associated with the entities. [sent-10, score-0.27]

9 For instance, freer-word-order languages exhibit word order patterns which are dependent on discourse factors relating to information structure, in addition to the grammatical roles of nominal arguments of the main verb. [sent-12, score-0.221]

10 We thus expect word order information to be particularly important in these languages in discourse analysis, which includes coherence modelling. [sent-13, score-0.311]

11 We instead use topological fields, a model of clausal structure which is indicative of information structure in German, but shallow enough to be automatically parsed at high accuracy. [sent-19, score-0.752]

12 We test the hypothesis that they would provide a good complement or alternative to grammatical roles in local coherence modelling. [sent-20, score-0.542]

13 We show that they are superior to grammatical roles in a sentence ordering experiment, and in fact outperform simple word-order information as well. [sent-21, score-0.307]

14 We further show that these differences are particularly large when manual syntactic and grammatical role annotations are not available. [sent-22, score-0.226]

15 Figure 1: The clausal and topological field structure of a German sentence. [sent-26, score-0.866]

16 We then embed these topological field annotations into a natural language generation system to show the utility of local coherence information in an applied setting. [sent-29, score-1.18]

17 We add contextual features using topological field transitions to the model of Filippova and Strube (2007b) and achieve a slight improvement over their model in a constituent ordering task, though not statistically significantly. [sent-30, score-1.054]

18 We conclude by discussing possible reasons for the utility of topological fields in local coherence modelling. [sent-31, score-1.108]

19 1 German Topological Field Parsing Topological fields are sequences of one or more contiguous phrases found in an enclosing syntactic region, which is the clause in the case of the German topological field model (Höhle, 1983). [sent-33, score-1.008]

20 Topological fields are a useful abstraction of word order, because while Germanic word order is relatively free with respect to grammatical functions, the order of the topological fields is strict and unvarying. [sent-36, score-1.033]

21 The other topological fields are defined in relation to these two brackets, and contain all other parts of the clause such as verbal arguments, adjuncts, and discourse cues. [sent-39, score-0.892]

22 The NF (Nachfeld or “post-field”) contains prosodically heavy elements such as postposed prepositional phrases or relative clauses, and occasionally postposed noun phrases. [sent-44, score-0.319]

23 2 The Role of the Vorfeld One of the reasons that we use topological fields for local coherence modelling is the role that the VF plays in signalling the information structure of German clauses, as it often contains the topic of the sentence. [sent-46, score-1.261]

24 Table 1: a) An example of a document from TüBa-D/Z, b) an abbreviated entity grid representation of it, and c) the feature vector representation of the abbreviated entity grid for transitions of length two (nom: nominative, acc: accusative, oth: dative, oblique, and other arguments). [sent-78, score-0.666]

25 Filippova and Strube (2007c) also examine the role of the VF in local coherence and natural language generation, focusing on the correlation between VFs and sentential topics. [sent-80, score-0.444]

26 3 Using Entity Grids to Model Local Coherence Barzilay and Lapata (2008) introduce the entity grid as a method of representing the coherence of a document. [sent-84, score-0.485]

27 Entity grids indicate the location of the occurrences of an entity in a document, which is important for coherence modelling because mentions of an entity tend to appear in clusters of neighbouring or nearby sentences in coherent documents. [sent-85, score-0.675]

28 In Barzilay and Lapata (2008), an entity grid is constructed for each document, and is represented as a matrix in which each row represents a sentence, and each column represents an entity. [sent-87, score-0.213]

29 The cell is marked by the presence or absence of the entity, and can also be augmented with other information about the entity in this sentence, such as the grammatical role of the noun phrase representing that entity in that sentence, or the topological field in which the noun phrase appears. [sent-89, score-1.501]

30 An entity grid representation which incorporates the syntactic role of the noun phrase in which the entity appears is also shown (not all entities are listed for brevity). [sent-91, score-0.643]

31 We tabulate the transitions of entities between different syntactic positions (or their nonoccurrence) in sentences, and convert the frequencies of transitions into a feature vector representation of transition probabilities in the document. [sent-92, score-0.273]
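The tabulation just described is easy to make concrete. The following is a minimal Python sketch, not the authors' code: the toy grid, the label set, and the function name are all invented for illustration, and real grids would be built from TüBa-D/Z annotations using the entity representations discussed below.

```python
from collections import Counter
from itertools import product

# Toy entity grid: one column per entity, one cell per sentence.
# "s" = subject, "o" = object, "-" = entity absent (illustrative labels).
GRID = {
    "frauen":  ["s", "s", "-"],
    "projekt": ["-", "o", "o"],
}
LABELS = ["s", "o", "-"]

def transition_features(grid, labels, n=2):
    """Feature vector: probability of each length-n label transition,
    counted over every column of the grid."""
    counts, total = Counter(), 0
    for column in grid.values():
        for i in range(len(column) - n + 1):
            counts[tuple(column[i:i + n])] += 1
            total += 1
    return {t: counts[t] / total for t in product(labels, repeat=n)}

print(transition_features(GRID, LABELS))
# ('s', 's'), ('s', '-'), ('-', 'o') and ('o', 'o') each get 0.25 here.
```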

32 This model of local coherence was investigated for German by Filippova and Strube (2007a). [sent-94, score-0.36]

33 In contrast, our work focuses on improving performance by annotating entities with additional linguistic information, such as topological fields, and is geared towards natural language generation systems where perfect information is available. [sent-98, score-0.74]

34 Similar models of local coherence include various Centering Theory accounts of local coherence (Kibble and Power, 2004; Poesio et al. [sent-99, score-0.72]

35 1 Method We test a version of the entity grid representation augmented with topological fields in a sentence ordering experiment corresponding to Experiment 1 of Barzilay and Lapata (2008). [sent-104, score-1.15]

36 We use TüBa-D/Z (Telljohann et al., 2004), which contains manual coreference, grammatical role and topological field information. [sent-109, score-0.944]

37 Representation: when marking the presence of an entity in a sentence, what information about the entity is marked (topological field, grammatical role, or none). [sent-118, score-0.46]

38 2 Entity Representations The main goal of this study is to compare word order, grammatical role and topological field information, which is encoded into the entity grid at each occurrence of an entity. [sent-128, score-1.129]

39 Here, we describe the variants of the entity representations that we compare. [sent-129, score-0.207]

40 Baseline Representations We implement several baseline representations against which we test our topological field-enhanced model. [sent-130, score-0.667]

41 The simplest baseline representation marks the mere appearance of an entity without any additional information, which we refer to as default. [sent-131, score-0.235]

42 The two versions of clausal order we tried are order 1/2/3+, which marks a noun phrase as the first, the second, or the third or later to appear in a clause, and order 1/2+, which marks a noun phrase as the first, or the second or later to appear in a clause. [sent-135, score-0.553]

43 Since noun phrases can be embedded in other noun phrases, overlaps can occur. [sent-136, score-0.233]

44 In this case, the dominating noun phrase takes the smallest order number among its dominated noun phrases. [sent-137, score-0.229]
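The two clausal-order baselines and the overlap rule can be sketched as follows. This is an illustrative reading of the description above; the list-of-NPs input format and function names are assumptions, not the paper's code.

```python
def order_label(position, bins=3):
    """order 1/2/3+ when bins=3, order 1/2+ when bins=2."""
    return str(position) if position < bins else f"{bins}+"

def clausal_order_labels(nps):
    """nps: [(np_id, dominated_np_ids), ...] in surface order within a
    clause. A dominating NP takes the smallest order number among the
    NPs it dominates, per the overlap rule above."""
    pos = {np_id: i + 1 for i, (np_id, _) in enumerate(nps)}
    for np_id, dominated in nps:
        for d in dominated:
            if d in pos:
                pos[np_id] = min(pos[np_id], pos[d])
    return {np_id: order_label(p) for np_id, p in pos.items()}
```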

45 The third class of baseline representations we employ marks an entity by its grammatical role in the clause. [sent-138, score-0.405]

46 Because German distinguishes more grammatical roles morphologically than English, we experiment with various granularities of role labelling. [sent-140, score-0.314]

47 cases distinguishes five types of entities corresponding to the four morphological cases of German in addition to another category for noun phrases which are not complements of the main verb. [sent-142, score-0.272]

48 Topological Field-Based These representations mark the topological field in which an entity appears. [sent-143, score-0.925]

49 vf marks the noun phrase as belonging to a VF (and not in a PP) or not. [sent-146, score-0.433]

50 topf/pp distinguishes entities in the topological fields VF, MF, and NF, contains a separate category for PP, and a category for all other noun phrases. [sent-148, score-0.951]

51 Combined We tried a representation which combines grammatical role and topological field into a single representation, subj/obj×vf, which takes the Cartesian product of subj/obj and vf above. [sent-151, score-1.231]

52 Thus, we devised additional entity representations to account for these aspects of German. [sent-155, score-0.207]

53 A noun phrase is marked as TOPIC if it is in VF as in vfpp, or if it is the first noun phrase in MF and also the first NP in the clause. [sent-157, score-0.297]

54 While this representation may appear to be very similar to simply distinguishing the first entity in a clause as for order 1/2+ in that TOPIC would correspond to the first entity in the clause, they are in fact distinct. [sent-160, score-0.421]

55 Due to issues related to coordination, appositive constructions, and fragments which do not receive a topology of fields, the first entity in a clause is labelled the TOPIC only 80. [sent-161, score-0.249]

56 The following set of decisions represents how a noun phrase is marked: If the first NP in the clause is a pronoun in an MF field and is the subject, we mark it as TOPIC. [sent-165, score-0.352]
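A partial sketch of this marking, encoding only the two rules quoted here (the VF rule from the representation definition and the clause-initial MF rule just stated); the attribute names on the hypothetical np record are assumptions:

```python
def topic_label(np):
    # VF noun phrases outside a PP are topics (as in vfpp).
    if np["field"] == "VF" and not np["in_pp"]:
        return "TOPIC"
    # A subject pronoun that is the first NP in the clause and sits in
    # the MF is also marked as the topic, per the decision quoted above.
    if (np["field"] == "MF" and np["first_np_in_clause"]
            and np["is_pronoun"] and np["is_subject"]):
        return "TOPIC"
    return "NONTOPIC"
```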

57 Thus, we test the robustness of the entity representations in a permutation detection experiment using manual and automatic annotations of topological fields and grammatical roles. [sent-170, score-1.37]

58 We employ the following two systems for extracting topological fields and grammatical roles. [sent-178, score-0.889]

59 To parse topological fields, we use the Berkeley parser of Petrov and Klein (2007), which has been shown to perform well at this task (Cheung and Penn, 2009). [sent-179, score-0.604]

60 35% F1 on topological fields and clausal nodes without gold POS tags on the section of T ¨uBa-D/Z it was tested on. [sent-181, score-0.896]

61 First, we tried extracting grammatical roles from the parse trees which we obtained from the Berkeley parser, as this information is present in the edge labels that can be recovered from the parse. [sent-183, score-0.207]

62 Morphological case is distinct from grammatical role, as noun phrases can function as adjuncts in possessive constructions and prepositional phrases. [Table 3: Annotation Accuracy (%); Grammatical role: 83.] [sent-185, score-0.278]

63 +PP means that prepositional objects are treated as a separate category from topological fields. [sent-190, score-0.712]

64 However, we can approximate the grammatical role of an entity using the morphological case. [sent-193, score-0.37]

65 We follow the annotation conventions of TüBa-D/Z in not assigning a grammatical role when the noun phrase is a prepositional object. [sent-194, score-0.384]

66 We also do not assign a grammatical role when the noun phrase is in the genitive case, as genitive objects are very rare in German and are far outnumbered by the possessive genitive construction. [sent-195, score-0.474]
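A small sketch of this case-based approximation, assuming the conventions just stated; the case codes and role labels are placeholders, not the paper's exact inventory:

```python
def role_from_case(case, in_pp):
    """Approximate a grammatical role from morphological case.
    Prepositional objects and genitive NPs get no role, per the
    TüBa-D/Z conventions described above."""
    if in_pp or case == "gen":
        return None
    return {"nom": "subject", "acc": "object", "dat": "other"}.get(case, "other")
```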

67 The top four performing entity representations are all topological field-based, and they outperform grammatical role-based and simple clausal order-based models. [sent-198, score-1.1]

68 These results indicate that the information that topological fields provide about clause structure, appositives, right dislocation, etc. [sent-199, score-0.853]

69 which is not captured by simple clausal order is important for coherence modelling. [sent-200, score-0.42]

70 The representations incorporating linguistics-based heuristics do not outperform purely topological field-based models. [sent-201, score-0.667]

71 Surprisingly, the VF-based models fare quite poorly, performing worse than not adding any annotations, despite the fact that topological field-based models in general perform well. [sent-202, score-0.604]

72 The automatic topological field annotations are more accurate than the automatic grammatical role annotations (Table 3), which may partly explain why grammatical role-based models suffer more when using automatic annotations. [sent-204, score-1.189]

73 Note, however, that the models based on automatic topological field annotations outperform even the grammatical role-based models using manual annotation (at marginal significance, p < 0. [sent-205, score-0.953]

74 The topological field annotations are accurate enough that automatic annotations produce no decrease in performance. [sent-207, score-0.246]

75 These results show the upper bound of entity-based local coherence modelling with perfect coreference information. [sent-208, score-0.555]

76 In our experiments, we create an entity for every single noun phrase node that we encounter, then merge the entities that are linked by coreference. [sent-224, score-0.345]

77 Filippova and Strube (2007a) convert the annotations of TüBa-D/Z into a dependency format, then extract entities from the noun phrases found there. [sent-225, score-0.271]

78 They may thus annotate fewer entities, as there ... (Footnote 1: Barzilay and Lapata (2008) use the coreference system of Ng and Cardie (2002) to obtain coreference annotations.) [sent-226, score-0.222]

79 experiment with various entity representations using manual and automatic annotations of topological fields and grammatical roles on a subset of the corpus used by Filippova and Strube (2007a). [sent-230, score-1.267]

80 The relative rankings of different entity representations in this experiment are similar to the rankings of the previous experiment, with topological field-based models outperforming grammatical role and clausal order models. [sent-233, score-1.193]

81 Various coherence models have been tested in corpus-based NLG settings. [sent-237, score-0.272]

82 Karamanis et al. (2009) compare several versions of Centering Theory-based metrics of coherence on corpora by examining how highly the original ordering found in the corpus is ranked compared to other possible orderings of propositions. [sent-239, score-0.479]

83 We embed entity topological field transitions into their probabilistic model, and show that the added coherence component slightly improves the performance of the baseline NLG system in generating constituent orderings in a German corpus, though not to a statistically significant degree. [sent-242, score-1.389]

84 The baseline generation system already incorporates topological field information into the constituent ordering process. [sent-248, score-0.999]

85 In the first VF selection step, MAXENT simply produces a probability of each constituent being a VF, and the constituent with the highest probability is selected. [sent-259, score-0.24]

86 The final ordering is achieved by first randomizing the order of the constituents in a clause (besides the first one, which is selected to be the VF), then sorting them according to the precedence probabilities. [sent-261, score-0.309]

87 Specifically, a constituent A is put before a constituent B if MAXENT2(A,B) > 0.5.
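The two-step procedure reads naturally as select-then-sort. Below is a minimal sketch under stated assumptions: maxent_vf and maxent_prec are invented stubs standing in for the trained maximum entropy models, and constituents are plain dicts rather than the system's real data structures.

```python
import random
from functools import cmp_to_key

def maxent_vf(c):
    """Stub for the first model: P(constituent c fills the VF)."""
    return c["p_vf"]

def maxent_prec(a, b):
    """Stub for the second model: P(a precedes b)."""
    return a["w"] / (a["w"] + b["w"])

def order_clause(constituents):
    # Step 1: the constituent most likely to be the VF goes first.
    vf = max(constituents, key=maxent_vf)
    rest = [c for c in constituents if c is not vf]
    # Step 2: randomize the remainder, then sort by pairwise
    # precedence (A before B iff P(A precedes B) > 0.5).
    random.shuffle(rest)
    rest.sort(key=cmp_to_key(lambda a, b: -1 if maxent_prec(a, b) > 0.5 else 1))
    return [vf] + rest
```

Note that a pairwise comparator like this need not be transitive, so the initial shuffle is not cosmetic: different random starting orders can lead the sort to different final orderings.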

88 We incorporate local coherence information into the model by adding entity transition features which we found to be useful in the sentence ordering experiment in Section 3 above. [sent-273, score-0.72]

89 Specifically, we add features indicating the topological fields in which entities occur in the previous sentences. [sent-274, score-0.816]

90 Because this corpus does not come with general coreference information except for the coreference chain of the biographee, we use the semantic classes instead. [sent-276, score-0.222]

91 An example of a feature may be biog-last2, which takes on a value such as ‘v−’, meaning that this constituent refers to the biographee, and that the biographee occurs in the VF two clauses ago (v), but does not appear in the previous clause (−). [sent-278, score-0.379]
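A minimal sketch of how such a contextual feature value could be assembled; the function name and the use of ASCII '-' for absence are assumptions based on the example just given:

```python
def entity_context_feature(fields):
    """fields: topological fields of the entity's mentions in the
    preceding clauses, oldest first, with None marking absence.
    ["v", None] -> "v-", as in the biog-last2 example above."""
    return "".join(f if f is not None else "-" for f in fields)

assert entity_context_feature(["v", None]) == "v-"
```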

92 Table 5: Results of adding coherence features into a natural language generation system. [sent-286, score-0.308]

93 We suggest that the lack of coreference information for all entities in the article may have reduced the benefit of the coherence component. [sent-295, score-0.451]

94 5 Conclusions We have shown that topological fields are a useful source of information for local coherence modelling. [sent-301, score-1.108]

95 In a sentence-order permutation detection task, models which use topological field information outperform both grammatical role-based models and models based on simple clausal order, with the best performing model achieving a relative error reduction of 40. [sent-302, score-1.007]

96 Applying our local coherence model in another setting, we have embedded topological field transitions of entities into an NLG system which orders constituents in German clauses. [sent-304, score-1.252]

97 We suggest that the utility of topological fields in local coherence modelling comes from the interaction between word order and information structure in freer-word-order languages. [sent-306, score-1.16]

98 Crucially, topological fields take into account issues such as coordination, appositives, sentential fragments and differences in clause types, which word order alone does not. [sent-307, score-0.88]

99 Further refinement of the topological field annotations to take advantage of the fact that they do not correspond neatly to any single information status such as topic or focus could provide additional performance gains. [sent-309, score-0.828]

100 Extending the entity-grid coherence model to semantically related entities. [sent-367, score-0.272]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('topological', 0.604), ('coherence', 0.272), ('vf', 0.262), ('filippova', 0.191), ('clausal', 0.148), ('fields', 0.144), ('entity', 0.144), ('grammatical', 0.141), ('barzilay', 0.127), ('ordering', 0.125), ('strube', 0.122), ('constituent', 0.12), ('field', 0.114), ('coreference', 0.111), ('german', 0.111), ('clause', 0.105), ('biographee', 0.101), ('vorfeld', 0.101), ('centering', 0.096), ('noun', 0.096), ('local', 0.088), ('mf', 0.084), ('lapata', 0.08), ('grid', 0.069), ('entities', 0.068), ('frauen', 0.067), ('annotations', 0.066), ('representations', 0.063), ('nlg', 0.061), ('transitions', 0.061), ('salience', 0.059), ('role', 0.057), ('nf', 0.057), ('transition', 0.055), ('clauses', 0.053), ('prepositional', 0.053), ('modelling', 0.052), ('auch', 0.05), ('postposed', 0.05), ('vfs', 0.05), ('cheung', 0.048), ('constituents', 0.045), ('kendall', 0.044), ('orderings', 0.044), ('topic', 0.044), ('roles', 0.041), ('phrases', 0.041), ('discourse', 0.039), ('distinguishes', 0.039), ('marks', 0.038), ('genitive', 0.038), ('versions', 0.038), ('phrase', 0.037), ('poesio', 0.037), ('document', 0.037), ('generation', 0.036), ('experiment', 0.036), ('pp', 0.036), ('oth', 0.036), ('acc', 0.034), ('precedence', 0.034), ('accidents', 0.034), ('addressation', 0.034), ('earthquakes', 0.034), ('grids', 0.034), ('oftopological', 0.034), ('opic', 0.034), ('vfpp', 0.034), ('werden', 0.034), ('perfect', 0.032), ('marked', 0.031), ('statistically', 0.03), ('passage', 0.029), ('prosodically', 0.029), ('neighbouring', 0.029), ('bracket', 0.029), ('karamanis', 0.029), ('telljohann', 0.029), ('objects', 0.029), ('anaphora', 0.029), ('structuring', 0.029), ('manual', 0.028), ('representation', 0.028), ('morphological', 0.028), ('sentential', 0.027), ('permuted', 0.027), ('versley', 0.027), ('subordinating', 0.027), ('die', 0.027), ('dipper', 0.027), ('sgall', 0.027), ('tau', 0.027), ('resolution', 0.026), ('treated', 0.026), ('tried', 0.025), ('abbreviated', 0.025), ('kibble', 0.025), ('appositives', 0.025), ('women', 0.025), ('fault', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999994 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

2 0.1893803 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

Author: Emily Pitler ; Annie Louis ; Ani Nenkova

Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference informa- tion, and summarization specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.

3 0.14232129 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

Author: Vincent Ng

Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.

4 0.10174 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.

5 0.10018893 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.

6 0.096357629 130 acl-2010-Hard Constraints for Grammatical Function Labelling

7 0.096048027 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

8 0.094155967 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

9 0.094004601 73 acl-2010-Coreference Resolution with Reconcile

10 0.085131928 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

11 0.083288819 233 acl-2010-The Same-Head Heuristic for Coreference

12 0.082871281 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

13 0.081186257 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information

14 0.069619291 132 acl-2010-Hierarchical Joint Learning: Improving Joint Parsing and Named Entity Recognition with Non-Jointly Labeled Data

15 0.068801023 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue

16 0.068477087 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

17 0.067107067 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

18 0.067101181 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

19 0.066497475 4 acl-2010-A Cognitive Cost Model of Annotations Based on Eye-Tracking Data

20 0.065132678 39 acl-2010-Automatic Generation of Story Highlights


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.192), (1, 0.071), (2, 0.009), (3, -0.104), (4, -0.077), (5, 0.137), (6, 0.02), (7, -0.076), (8, 0.022), (9, 0.058), (10, 0.035), (11, -0.024), (12, 0.011), (13, 0.018), (14, 0.043), (15, 0.024), (16, 0.076), (17, 0.024), (18, 0.031), (19, 0.004), (20, 0.045), (21, -0.017), (22, -0.04), (23, 0.012), (24, 0.013), (25, 0.029), (26, -0.007), (27, 0.018), (28, -0.03), (29, -0.024), (30, 0.033), (31, -0.054), (32, -0.017), (33, -0.017), (34, 0.048), (35, -0.02), (36, 0.002), (37, -0.051), (38, -0.137), (39, 0.019), (40, 0.011), (41, 0.067), (42, -0.139), (43, -0.128), (44, -0.041), (45, -0.03), (46, 0.059), (47, -0.019), (48, 0.057), (49, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93360847 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

2 0.60502023 38 acl-2010-Automatic Evaluation of Linguistic Quality in Multi-Document Summarization

Author: Emily Pitler ; Annie Louis ; Ani Nenkova

Abstract: To date, few attempts have been made to develop and validate methods for automatic evaluation of linguistic quality in text summarization. We present the first systematic assessment of several diverse classes of metrics designed to capture various aspects of well-written text. We train and test linguistic quality models on consecutive years of NIST evaluation data in order to show the generality of results. For grammaticality, the best results come from a set of syntactic features. Focus, coherence and referential clarity are best evaluated by a class of features measuring local coherence on the basis of cosine similarity between sentences, coreference informa- tion, and summarization specific features. Our best results are 90% accuracy for pairwise comparisons of competing systems over a test set of several inputs and 70% for ranking summaries of a specific input.

3 0.5995878 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.

4 0.57044953 233 acl-2010-The Same-Head Heuristic for Coreference

Author: Micha Elsner ; Eugene Charniak

Abstract: We investigate coreference relationships between NPs with the same head noun. It is relatively common in unsupervised work to assume that such pairs are coreferent– but this is not always true, especially if realistic mention detection is used. We describe the distribution of noncoreferent same-head pairs in news text, and present an unsupervised generative model which learns not to link some samehead NPs using syntactic features, improving precision.

5 0.56760162 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

Author: Vincent Ng

Abstract: The research focus of computational coreference resolution has exhibited a shift from heuristic approaches to machine learning approaches in the past decade. This paper surveys the major milestones in supervised coreference research since its inception fifteen years ago.

6 0.55265641 72 acl-2010-Coreference Resolution across Corpora: Languages, Coding Schemes, and Preprocessing Information

7 0.51365656 149 acl-2010-Incorporating Extra-Linguistic Information into Reference Resolution in Collaborative Task Dialogue

8 0.51000208 39 acl-2010-Automatic Generation of Story Highlights

9 0.50958323 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

10 0.5093531 196 acl-2010-Plot Induction and Evolutionary Search for Story Generation

11 0.49990281 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

12 0.47533333 246 acl-2010-Unsupervised Discourse Segmentation of Documents with Inherently Parallel Structure

13 0.47320566 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

14 0.4719511 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

15 0.47114331 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

16 0.47067538 73 acl-2010-Coreference Resolution with Reconcile

17 0.46625328 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

18 0.46044007 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

19 0.45549282 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

20 0.45391929 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.01), (14, 0.015), (25, 0.108), (39, 0.015), (42, 0.036), (44, 0.014), (52, 0.204), (59, 0.073), (73, 0.049), (78, 0.053), (80, 0.017), (83, 0.166), (84, 0.033), (98, 0.111)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88028026 193 acl-2010-Personalising Speech-To-Speech Translation in the EMIME Project

Author: Mikko Kurimo ; William Byrne ; John Dines ; Philip N. Garner ; Matthew Gibson ; Yong Guan ; Teemu Hirsimaki ; Reima Karhila ; Simon King ; Hui Liang ; Keiichiro Oura ; Lakshmi Saheer ; Matt Shannon ; Sayaki Shiota ; Jilei Tian

Abstract: In the EMIME project we have studied unsupervised cross-lingual speaker adaptation. We have employed an HMM statistical framework for both speech recognition and synthesis which provides transformation mechanisms to adapt the synthesized voice in TTS (text-to-speech) using the recognized voice in ASR (automatic speech recognition). An important application for this research is personalised speech-to-speech translation that will use the voice of the speaker in the input language to utter the translated sentences in the output language. In mobile environments this enhances the users’ interaction across language barriers by making the output speech sound more like the original speaker’s way of speaking, even if she or he could not speak the output language.

same-paper 2 0.84820271 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

Author: Jackie Chi Kit Cheung ; Gerald Penn

Abstract: One goal of natural language generation is to produce coherent text that presents information in a logical order. In this paper, we show that topological fields, which model high-level clausal structure, are an important component of local coherence in German. First, we show in a sentence ordering experiment that topological field information improves the entity grid model of Barzilay and Lapata (2008) more than grammatical role and simple clausal order information do, particularly when manual annotations of this information are not available. Then, we incorporate the model enhanced with topological fields into a natural language generation system that generates constituent orders for German text, and show that the added coherence component improves performance slightly, though not statistically significantly.

3 0.84307349 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification

Author: Ainur Yessenalina ; Yejin Choi ; Claire Cardie

Abstract: One ofthe central challenges in sentimentbased text categorization is that not every portion of a document is equally informative for inferring the overall sentiment of the document. Previous research has shown that enriching the sentiment labels with human annotators’ “rationales” can produce substantial improvements in categorization performance (Zaidan et al., 2007). We explore methods to automatically generate annotator rationales for document-level sentiment classification. Rather unexpectedly, we find the automatically generated rationales just as helpful as human rationales.

4 0.79799354 154 acl-2010-Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

Author: Dong Yang ; Paul Dixon ; Sadaoki Furui

Abstract: This paper presents a joint optimization method of a two-step conditional random field (CRF) model for machine transliteration and a fast decoding algorithm for the proposed method. Our method lies in the category of direct orthographical mapping (DOM) between two languages without using any intermediate phonemic mapping. In the two-step CRF model, the first CRF segments an input word into chunks and the second one converts each chunk into one unit in the target language. In this paper, we propose a method to jointly optimize the two-step CRFs and also a fast algorithm to realize it. Our experiments show that the proposed method outper- forms the well-known joint source channel model (JSCM) and our proposed fast algorithm decreases the decoding time significantly. Furthermore, combination of the proposed method and the JSCM gives further improvement, which outperforms state-of-the-art results in terms of top-1 accuracy.

5 0.74855971 1 acl-2010-"Ask Not What Textual Entailment Can Do for You..."

Author: Mark Sammons ; V.G.Vinod Vydiswaran ; Dan Roth

Abstract: We challenge the NLP community to participate in a large-scale, distributed effort to design and build resources for developing and evaluating solutions to new and existing NLP tasks in the context of Recognizing Textual Entailment. We argue that the single global label with which RTE examples are annotated is insufficient to effectively evaluate RTE system performance; to promote research on smaller, related NLP tasks, we believe more detailed annotation and evaluation are needed, and that this effort will benefit not just RTE researchers, but the NLP community as a whole. We use insights from successful RTE systems to propose a model for identifying and annotating textual infer- ence phenomena in textual entailment examples, and we present the results of a pilot annotation study that show this model is feasible and the results immediately useful.

6 0.74223459 71 acl-2010-Convolution Kernel over Packed Parse Forest

7 0.74202448 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

8 0.72796863 33 acl-2010-Assessing the Role of Discourse References in Entailment Inference

9 0.72705907 73 acl-2010-Coreference Resolution with Reconcile

10 0.72436273 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

11 0.71986449 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information

12 0.71945661 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features

13 0.71482867 233 acl-2010-The Same-Head Heuristic for Coreference

14 0.7143873 158 acl-2010-Latent Variable Models of Selectional Preference

15 0.71367371 134 acl-2010-Hierarchical Sequential Learning for Extracting Opinions and Their Attributes

16 0.71359742 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

17 0.71284992 219 acl-2010-Supervised Noun Phrase Coreference Research: The First Fifteen Years

18 0.71124029 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

19 0.70966822 230 acl-2010-The Manually Annotated Sub-Corpus: A Community Resource for and by the People

20 0.70940667 112 acl-2010-Extracting Social Networks from Literary Fiction