acl acl2013 acl2013-21 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder
Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. (*Ravi Kondadadi is now affiliated with Nuance Communications, Inc.)
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. [sent-3, score-0.352]
2 Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. [sent-4, score-0.247]
3 We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. [sent-10, score-0.369]
4 The metrics for generated weather and biography texts fall within acceptable ranges. [sent-11, score-0.63]
5 However, document planning is arguably one of the most crucial components of an NLG system and is responsible for making the texts express the desired communicative goal in a coherent structure. [sent-18, score-0.286]
6 If the document planning stage fails, the communicative goal of the generated text will not be met even if the other two stages are perfect. [sent-19, score-0.29]
7 A template structure contains “gaps” that are filled to generate the output. [sent-34, score-0.162]
8 The idea is to create a large bank of templates from the historical data and select the right template subject to constraints. [sent-35, score-0.344]
9 Experiments with different variants of our system (for biography and weather subject matter domains) demonstrate that our system generates reasonable texts. [sent-37, score-0.568]
10 We present a unique text evaluation metric called syntactic variability to measure the linguistic variation of generated texts. [sent-41, score-0.179]
11 This metric applies to the document collection level and is based on computing the number of unique template sequences among all the generated texts. [sent-42, score-0.333]
12 We argue that this metric is useful for evaluating template-based systems and for any type of text generation for domains where linguistic variability is favored (e. [sent-44, score-0.261]
13 The main contributions of this paper are (1) a statistical NLG system that combines document and sentence planning and surface realization into one single process; and (2) a new metric, syntactic variability, to measure the syntactic and morphological variability of the generated texts. [sent-47, score-0.602]
14 We believe this is the first work to propose an automatic metric to measure linguistic variability of generated texts in NLG. [sent-48, score-0.234]
15 2 Background Typically, knowledge-based NLG systems are implemented by rules and, as mentioned above, have a pipelined architecture for the document and sentence planning stages and surface realization (Hovy, 1993; Moore and Paris, 1993). [sent-53, score-0.32]
16 However, document planning is arguably the most important task (Sripada et al. [sent-54, score-0.201]
17 It follows that approaches to document planning are rule-based as well and, concomitantly, are usually domain specific. [sent-56, score-0.245]
18 (2011) proposed document planning based on an ontology knowledge base to generate football summaries. [sent-58, score-0.229]
19 For example, Duboue and McKeown (2003) proposed a statistical approach to extract content selection rules for biography descriptions. [sent-66, score-0.303]
20 However, while the system consolidated both sentence planning and surface realization with this approach (described in more detail in Section 3), the document plan was given via the input data and sequencing information was present in training documents. [sent-73, score-0.352]
21 For the present research, we introduce a similar method that leverages the distributions of document–level features in the training corpus to incorporate a statistical document planning component. [sent-74, score-0.201]
22 3 Methodology In order to generate text for a given domain, our system runs input data through a statistical ranking model to select a sequence of templates that best fit the input data (E). [sent-76, score-0.264]
23 In order to build the ranking model, our system takes historical data (corpus) for the domain through four components: (A) preprocessing; (B) “conceptual unit” creation; (C) collecting statistics; and (D) ranking model build- ing (summarized in Figure 1). [sent-77, score-0.212]
24 Preprocessing involves uncovering the underlying semantic structure of the corpus and using this as a foundation for template creation (Lu et al. [sent-83, score-0.162]
25 We developed the named-entity tagger for the weather domain ourselves. [sent-89, score-0.279]
26 To tag entities in the biography domain, we used OpenCalais (www. [sent-90, score-0.309]
27 For example, in the biography in (1), the conceptual meaning (semantic predicates and domain-specific entities) of sentences (a-b) is represented in (c-d). [sent-93, score-0.368]
28 The outputs of the preprocessing stage are the template bank and predicate information for each template in the corpus. [sent-107, score-0.381]
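To make the preprocessing step concrete, here is a minimal Python sketch of template creation: tagged entity strings are replaced by slot names to form a template with "gaps", and the sentence's semantic predicates are stored alongside it. The tagger output and predicate set are illustrative stand-ins (the paper uses its own weather tagger, OpenCalais, and Discourse Representation Structures); the example strings reuse the paper's running biography example.

```python
# Illustrative templatization: entities become [TAG] gaps; predicates are kept
# so templates can later be compared and clustered.
from dataclasses import dataclass

@dataclass
class Template:
    text: str              # sentence with entity "gaps"
    tags: frozenset        # named-entity tags required to fill it
    predicates: frozenset  # semantic predicates associated with the sentence

def templatize(sentence: str, entities: dict, predicates: set) -> Template:
    """entities maps surface strings to tags, e.g. {"James Smithton": "PERSON"}."""
    text = sentence
    for surface, tag in entities.items():
        text = text.replace(surface, f"[{tag}]")
    return Template(text, frozenset(entities.values()), frozenset(predicates))

t = templatize(
    "James Smithton was appointed CFO of Fordway Internation.",
    {"James Smithton": "PERSON", "CFO": "TITLE", "Fordway Internation": "COMPANY"},
    {"appoint"},
)
print(t.text)  # [PERSON] was appointed [TITLE] of [COMPANY].
```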
29 This is a semiautomatic process where we use the predicate information for each template to compute similarity between templates. [sent-110, score-0.162]
30 We associate each template in the corpus with a corresponding CuId. [sent-114, score-0.162]
31 For example, in (2), using the templates in (1e-f), the identified named entities are assigned to a clustered CuId (2a-b). [sent-115, score-0.168]
32 At this stage, we will have a set of conceptual units with corresponding template collections (see Howald et al. [sent-122, score-0.23]
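A minimal sketch of the clustering step, assuming templates are compared by their predicate sets: each template is binarized over the predicate vocabulary and grouped by k-means. The featurization and the choice of k are assumptions for illustration; in the paper this is semi-automatic, with human review of the cluster assignments.

```python
# Group templates into conceptual units (CuIds) by k-means over predicates.
from sklearn.cluster import KMeans
from sklearn.preprocessing import MultiLabelBinarizer

def assign_cuids(templates, k: int):
    """templates: objects with a .predicates set (as in the sketch above).
    Returns one cluster id (CuId) per template, pending human review."""
    mlb = MultiLabelBinarizer()
    X = mlb.fit_transform([t.predicates for t in templates])
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
```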
33 However, to contrast our work with that of Duboue and McKeown, which focused on content selection, we focus on learning templates from the semantic representations for the complete generation system (covering content selection, aggregation, sentence and document planning). [sent-128, score-0.374]
34 4 Building a ranking model The core component of our system is a statistical model that ranks a set of templates for a given position (sentence 1, sentence 2, ...). [sent-135, score-0.234]
35 To generate the training data, we first filter out the templates that have named entity tags not specified in the input data. [sent-141, score-0.2]
36 We then rank templates according to the Levenshtein edit distance (Levenshtein, 1966) from the template corresponding to the current sentence in the training document (using only the top 10 ranked templates in training to keep processing tractable). [sent-143, score-0.459]
37 Similarity between the most likely template in CuId and current template: Edit distance between the current template and the most likely template for the current CuId. [sent-156, score-0.486]
38 Similarity between the most likely template in CuId given position and current template: Edit distance between the current template and the most likely template for the current CuId at the current position. [sent-157, score-0.486]
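A sketch of how these two edit-distance features could be computed. The Levenshtein function is the standard dynamic program; treating the "most likely" template per CuId (overall and per position) as a precomputed lookup table is an assumption about data layout, not the paper's actual code.

```python
# Levenshtein edit distance plus the two CuId similarity features above.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def similarity_features(candidate: str, cuid, position, most_likely) -> dict:
    """most_likely[(cuid, None)] and most_likely[(cuid, position)] hold the
    most likely template text for a CuId overall and at a given position
    (a hypothetical precomputed table)."""
    return {
        "sim_most_likely_in_cuid":
            levenshtein(candidate, most_likely[(cuid, None)]),
        "sim_most_likely_in_cuid_at_pos":
            levenshtein(candidate, most_likely[(cuid, position)]),
    }
```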
39 5 Generation At generation time, our system has a set of input data, a semantically organized template bank (collection of templates organized by CuId), and a model trained on the documents for a given domain. [sent-160, score-0.445]
40 We first filter out those templates that contain a named entity tag not present in the input data. [sent-161, score-0.2]
41 The template with the highest overall score is selected and filled with matching entity tags from the input data and appended to the generated text. [sent-163, score-0.266]
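A minimal sketch of this generation step under the template representation assumed above: drop templates whose required entity tags are not covered by the input data, pick the highest-scoring survivor, and fill its gaps. The `score` callable stands in for the trained ranking SVM's decision function.

```python
# One sentence of generation: filter, rank, fill.
import re

def generate_sentence(input_data: dict, templates, score) -> str:
    """input_data maps entity tags to values, e.g. {"PERSON": "James Smithton"}."""
    # Keep only templates whose entity tags are all present in the input data.
    candidates = [t for t in templates if t.tags <= input_data.keys()]
    best = max(candidates, key=score)
    # Fill each [TAG] gap with the matching value from the input data.
    return re.sub(r"\[(\w+)\]", lambda m: input_data[m.group(1)], best.text)
```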
42 Before generating the next sentence, we track those entities used in the initial sentence generation and decide either to remove those entities from the input data or to keep them available for one or more additional sentence generations. [sent-164, score-0.283]
43 For example, in the biography discourses, the name of the person may occur only once in the input data, but it may be useful for creating good texts to have that person’s name available for subsequent generations. [sent-165, score-0.388]
44 To illustrate in (3), if we remove James Smithton from the input data after the initial generation, an irrelevant sentence (d) is generated as the input data will only have one company after the removal of James Smithton and the model will only select a template with one company. [sent-166, score-0.306]
45 If we keep James Smithton, then the generations in (a-b) are more cohesive. [sent-167, score-0.227]
46 For example, some entities are unique to a text and should not be made available for subsequent generations as doing so would lead to unwanted redundancies (e. [sent-180, score-0.263]
47 , mentioning the name of the current company in a biography discourse more than once as in (3)) and some entities are general and should be retained. [sent-182, score-0.368]
48 Our system can monitor data usage from the historical data, and we can set parameters (based on the distribution of entities) on that usage to ensure coherent generations for a given domain. [sent-183, score-0.335]
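A sketch of what such a retention parameter could look like: whether an entity stays available for later sentences is decided from how often entities of its type recur per document in the historical data. The rate statistic and threshold value are illustrative assumptions.

```python
# Decide whether an entity remains available for subsequent generations.
def keep_entity(tag: str, mention_rates: dict, threshold: float = 1.0) -> bool:
    """mention_rates[tag]: average mentions per document for entities with this
    tag in the training corpus (a hypothetical corpus statistic). Keep the
    entity if its type tends to recur (e.g. PERSON in biographies); drop it
    if it is typically one-off (e.g. COMPANY, as in example (3))."""
    return mention_rates.get(tag, 0.0) > threshold
```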
49 Then, the results of both automatic and human evaluations of our system’s generations against the original and baseline texts are considered as a means of determining performance. [sent-189, score-0.325]
50 For all experiments reported in this section, the baseline system selects the most frequent conceptual unit at the given position, chooses the most likely template for the conceptual unit, and fills the template with input data. [sent-190, score-0.518]
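A sketch of that baseline, assuming per-document CuId sequences from the training corpus and a precomputed most-likely template per CuId; `fill` is the same gap-filling step sketched in the generation example above.

```python
# Baseline: most frequent CuId at the position, its most likely template, filled.
from collections import Counter

def baseline_sentence(position, training_sequences, best_template, fill, input_data):
    """training_sequences: list of per-document CuId sequences from the corpus;
    best_template: hypothetical map from CuId to its most likely template."""
    cuids_at_pos = [seq[position] for seq in training_sequences if len(seq) > position]
    top_cuid = Counter(cuids_at_pos).most_common(1)[0][0]
    return fill(best_template[top_cuid], input_data)
```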
51 1 Data We ran our system on two different domains: corporate officer and director biographies, and offshore oil rig weather reports from the SUMTIME-METEO corpus (Reiter et al. [sent-193, score-0.353]
52 The biography domain includes 1150 texts ranging from 3-17 sentences and the weather domain includes 1045 weather reports ranging from 1-6 sentences. [sent-195, score-0.886]
53 (4) provides generation comparisons for the system (DocSys), baseline (DocBase), and original (DocOrig) randomly selected text snippets from each domain. [sent-197, score-0.121]
54 The generated texts range from closely similar to slightly shorter than the originals - not an uncommon (Belz and Reiter, 2006), but not necessarily detrimental, observation for NLG systems (van Deemter et al. [sent-198, score-0.19]
55 However, we provide no comparison between our system and SUMTIME-METEO as our system utilized the generated forecasts from SUMTIME-METEO’s system as the historical data. [sent-211, score-0.233]
56 We cannot compare with other statistical generation systems like Belz (2007) as they focused only on the part of the forecasts that predicts wind characteristics, whereas our system generates the complete forecasts. [sent-212, score-0.15]
57 The DocSys and DocBase generations are largely grammatical and coherent on the surface with some variance, but there are graded semantic variations (e. [sent-215, score-0.255]
58 Both automatic and human evaluations are required in NLG to determine the impact of these variances on the understandability of the texts in general (non-experts) and as they are representative of particular subject matter domains (experts). [sent-220, score-0.18]
59 These metrics only evaluate the text on a document level but fail to identify “syntactic repetitiveness” across documents in a document collection. [sent-230, score-0.126]
60 In order to compute this metric, each document should be represented as a sequence of templates by associating each sentence in the document with a template in the template bank. [sent-233, score-0.58]
61 Syntactic variability is defined as the percentage of unique template sequences across all generated documents. [sent-234, score-0.325]
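This definition translates directly into code; the only assumption below is a `template_id` function mapping each generated sentence back to its template in the bank.

```python
# Syntactic variability: percentage of unique template sequences in a collection.
def syntactic_variability(documents, template_id) -> float:
    """documents: list of generated documents, each a list of sentences;
    template_id: maps a sentence to the id of its template in the template bank."""
    sequences = [tuple(template_id(s) for s in doc) for doc in documents]
    return 100.0 * len(set(sequences)) / len(sequences)
```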
62 As indicated in Figure 2, the BLEU-4 scores are low for all DocSys and DocBase generations (as compared to DocOrig) for each domain. [sent-237, score-0.227]
63 Both BLEU–4 and METEOR measure the similarity of the generated text to the original text, but fail to penalize repetitiveness across texts, which is addressed by the syntactic variability metric. [sent-246, score-0.189]
64 There is no statistically significant difference between DocSys and DocBase generations for METEOR and BLEU–4. [sent-247, score-0.252]
65 However, there is a statistically significant difference in the syntactic variability metric for both domains (weather - χ2=137. [sent-248, score-0.195]
66 0001) - the variability of the DocSys generations is greater than that of the DocBase generations, which shows that texts generated by our system are more variable than the baseline texts. [sent-256, score-0.447]
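The paper does not spell out the test setup here, but one plausible reading is a chi-square test over a 2x2 table of unique versus repeated template sequences for the two systems; the sketch below follows that assumption.

```python
# Chi-square comparison of syntactic variability between two collections.
from scipy.stats import chi2_contingency

def compare_variability(unique_sys, total_sys, unique_base, total_base):
    """Counts of unique vs. repeated template sequences for DocSys and DocBase;
    the 2x2 contingency layout is an illustrative assumption."""
    table = [[unique_sys, total_sys - unique_sys],
             [unique_base, total_base - unique_base]]
    chi2, p, dof, _ = chi2_contingency(table)
    return chi2, p
```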
67 For the text–understandability task, 40 documents were chosen at random from the DocOrig test set along with the corresponding 40 DocSys and DocBase generations (240 documents total/120 for each domain). [sent-284, score-0.227]
68 8 judgments per document were solicited from the crowd (1920 total judgments, 69. [sent-285, score-0.144]
69 51 average agreement) and are summarized in Figures 3 and 4 (biography and weather respectively). [sent-286, score-0.235]
70 If the system is performing well and the ranking model is actually contributing to increased performance, the accepted trend should be that the DocOrig texts are more fluent and preferred compared to both the DocSys and DocBase systems. [sent-287, score-0.216]
71 Focusing on fluency ratings, it is expected that the DocOrig generations will have the highest fluency (as they are human generated). [sent-290, score-0.301]
72 Figure 3, which shows the biography text evaluations, demonstrates this acceptable distribution of performances. [sent-292, score-0.304]
73 For the weather discourses, as evident from Figure 4, the acceptable trend holds between the DocSys and DocBase generations, and the DocSys generation fluency is actually slightly higher than DocOrig. [sent-293, score-0.455]
74 This is possibly because the DocOrig texts are from a particular subject matter - weather forecasts for offshore oil rigs in the U. [sent-294, score-0.346]
75 In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. [sent-298, score-0.26]
76 For the sentence preference task, equivalent sentences across the 120 documents were chosen at random (80 sentences from biography and 74 sentences from weather). [sent-333, score-0.299]
77 Similar to the text–understandability task, an acceptable performance pattern should include the DocOrig texts being preferred to both DocSys and DocBase generations and the DocSys generation preferred to the DocBase. [sent-336, score-0.454]
78 The biography domain illustrates this trend (Figure 5: Biography Sentence Evaluations). [sent-338, score-0.317]
79 In contrast, for the weather domain, sentences from the DocBase system were preferred to our system's (Figure 6). [sent-340, score-0.29]
80 In terms of significance, there are no statistically significant differences between the systems for weather (DocOrig vs. [sent-343, score-0.26]
81 The trend differs from the fluency metric above in that the DocBase system outperforms the DocOrig generations by an almost statistically significant margin - the remaining comparisons follow the trend. [sent-359, score-0.4]
82 More problematic are the results of the biography evaluations. [sent-363, score-0.273]
83 Here there is a statistically significant difference between the DocSys and DocOrig and no statistically significant difference between the DocSys and DocBase generations (DocOrig vs. [sent-364, score-0.277]
84 4 Expert Human Evaluations We performed expert evaluations for the biography domain only as we do not have access to weather experts. [sent-384, score-0.643]
85 The four biography reviewers are journalists who write short biographies for news archives. [sent-385, score-0.273]
86 For the biography domain, evaluations of the texts were largely similar to the evaluations of the non-expert crowd (76. [sent-386, score-0.445]
87 For example, the disfluent ratings were highest for the DocBase generations and lowest for the DocOrig generations. [sent-389, score-0.254]
88 Also, the fluent ratings were highest for the DocOrig generations, and while the combined fluent and understandable ratings are higher for DocSys than for DocBase, the DocBase generations had a 10% higher fluent score (58. [sent-390, score-0.344]
89 Based on notes from the reviewers, the succinctness of the DocBase generations is preferred in some ways, as it is in keeping with certain editorial standards. [sent-393, score-0.252]
90 This is further reflected in the sentence preferences being 70% in favor of the DocBase generations as compared to the DocSys generations (all other sentence comparisons were consistent with the non-expert crowd). [sent-394, score-0.506]
91 The development time to adapt our system to new domains is small compared to other NLG systems: around a week to adapt the system to the weather and biography domains. [sent-401, score-0.595]
92 Most of the development time was spent on creating the domain-specific entity taggers for the weather domain. [sent-402, score-0.275]
93 The development time would be reduced to hours if the historical data for a domain were readily available with the corresponding input data. [sent-403, score-0.15]
94 Our system does consolidate many traditional components (macro- and micro-planning, lexical choice, and aggregation), but the system cannot be applied to domains with no historical data. [sent-405, score-0.165]
95 The quality and the linguistic variability of the generated text are directly proportional to the amount of historical data available. [sent-406, score-0.213]
96 We also presented a new automatic metric to evaluate generated texts at the document collection level to identify boilerplate texts. [sent-407, score-0.198]
97 This metric computes “syntactic repetitiveness” by counting the number of unique template sequences across the given document collection. [sent-408, score-0.297]
98 For example, most NLG pipelines have a separate component responsible for referring expression generation (Krahmer and van Deemter, 2012). [sent-410, score-0.124]
99 We believe that this is possible by identifying referring expressions in templates and adding features to the model to give higher scores to templates that contain relevant referring expressions. [sent-413, score-0.274]
100 Investigating content selection for language generation using machine learning. [sent-484, score-0.121]
wordName wordTfidf (topN-words)
[('docsys', 0.43), ('docbase', 0.39), ('docorig', 0.349), ('biography', 0.273), ('weather', 0.235), ('cuid', 0.229), ('generations', 0.227), ('nlg', 0.19), ('template', 0.162), ('planning', 0.138), ('templates', 0.104), ('variability', 0.099), ('generation', 0.091), ('smithton', 0.081), ('historical', 0.078), ('meteor', 0.072), ('conceptual', 0.068), ('cornwall', 0.067), ('document', 0.063), ('director', 0.061), ('texts', 0.055), ('understandability', 0.055), ('duboue', 0.054), ('howald', 0.054), ('repetitiveness', 0.054), ('reiter', 0.051), ('sales', 0.049), ('expert', 0.048), ('position', 0.044), ('metric', 0.044), ('domain', 0.044), ('evaluations', 0.043), ('belz', 0.041), ('cuids', 0.04), ('entity', 0.04), ('realization', 0.039), ('ehud', 0.039), ('fluent', 0.039), ('serving', 0.038), ('trend', 0.037), ('fluency', 0.037), ('generated', 0.036), ('entities', 0.036), ('cfo', 0.036), ('kondadadi', 0.036), ('sripada', 0.036), ('discourse', 0.033), ('referring', 0.033), ('deemter', 0.033), ('numerically', 0.033), ('person', 0.032), ('crowd', 0.031), ('bio', 0.031), ('barriers', 0.031), ('cold', 0.031), ('enlg', 0.031), ('acceptable', 0.031), ('experts', 0.031), ('system', 0.03), ('content', 0.03), ('ranking', 0.03), ('bank', 0.03), ('forecasts', 0.029), ('mckeown', 0.029), ('bleu', 0.029), ('input', 0.028), ('football', 0.028), ('sequences', 0.028), ('surface', 0.028), ('named', 0.028), ('predicates', 0.027), ('stage', 0.027), ('adaptable', 0.027), ('domains', 0.027), ('afnredq', 0.027), ('appointed', 0.027), ('bateman', 0.027), ('boxer', 0.027), ('disfluent', 0.027), ('fordway', 0.027), ('internation', 0.027), ('keyes', 0.027), ('offshore', 0.027), ('solicited', 0.027), ('somayajulu', 0.027), ('company', 0.026), ('hybrid', 0.026), ('stages', 0.026), ('sentence', 0.026), ('ne', 0.025), ('statistically', 0.025), ('preferred', 0.025), ('holds', 0.024), ('anja', 0.024), ('denkowski', 0.024), ('kamp', 0.024), ('kees', 0.024), ('krahmer', 0.024), ('originals', 0.024), ('judgments', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder
Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. (*Ravi Kondadadi is now affiliated with Nuance Communications, Inc.)
Author: Sina Zarriess ; Jonas Kuhn
Abstract: We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization. We present a data set of German articles annotated with deep syntax and referents, including some types of implicit referents. Our experiments compare several architectures varying the order of a set of trainable modules. The results suggest that a revision-based pipeline, with intermediate linearization, significantly outperforms standard pipelines or a parallel architecture.
3 0.11529034 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
Author: Kapila Ponnamperuma ; Advaith Siddharthan ; Cheng Zeng ; Chris Mellish ; Rene van der Wal
Abstract: The aim of the Tag2Blog system is to bring satellite tagged wild animals “to life” through narratives that place their movements in an ecological context. Our motivation is to use such automatically generated texts to enhance public engagement with a specific species reintroduction programme, although the protocols developed here can be applied to any animal or other movement study that involves signal data from tags. We are working with one of the largest nature conservation charities in Europe in this regard, focusing on a single species, the red kite. We describe a system that interprets a sequence of locational fixes obtained from a satellite tagged individual, and constructs a story around its use of the landscape.
4 0.092685014 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
Author: Lu Wang ; Claire Cardie
Abstract: We address the challenge of generating natural language abstractive summaries for spoken meetings in a domain-independent fashion. We apply Multiple-Sequence Alignment to induce abstract generation templates that can be used for different domains. An Overgenerateand-Rank strategy is utilized to produce and rank candidate abstracts. Experiments using in-domain and out-of-domain training on disparate corpora show that our system uniformly outperforms state-of-the-art supervised extract-based approaches. In addition, human judges rate our system summaries significantly higher than compared systems in fluency and overall quality.
5 0.089492925 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
Author: Gerasimos Lampouras ; Ion Androutsopoulos
Abstract: We present an ILP model of concept-totext generation. Unlike pipeline architectures, our model jointly considers the choices in content selection, lexicalization, and aggregation to avoid greedy decisions and produce more compact texts.
6 0.088341944 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
7 0.075497061 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data
8 0.073595509 303 acl-2013-Robust multilingual statistical morphological generation models
9 0.066016339 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
10 0.047548171 225 acl-2013-Learning to Order Natural Language Texts
11 0.046519876 172 acl-2013-Graph-based Local Coherence Modeling
12 0.045269798 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric
13 0.045139637 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
14 0.044609763 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis
15 0.0426322 135 acl-2013-English-to-Russian MT evaluation campaign
16 0.040107746 360 acl-2013-Translating Italian connectives into Italian Sign Language
17 0.040106781 160 acl-2013-Fine-grained Semantic Typing of Emerging Entities
18 0.039174087 355 acl-2013-TransDoop: A Map-Reduce based Crowdsourced Translation for Complex Domain
19 0.039172415 267 acl-2013-PARMA: A Predicate Argument Aligner
20 0.038603719 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
topicId topicWeight
[(0, 0.13), (1, 0.027), (2, 0.016), (3, -0.056), (4, -0.002), (5, 0.028), (6, 0.008), (7, -0.018), (8, -0.023), (9, 0.028), (10, -0.039), (11, 0.03), (12, -0.034), (13, 0.027), (14, -0.052), (15, -0.014), (16, 0.027), (17, -0.042), (18, -0.021), (19, -0.014), (20, -0.045), (21, -0.004), (22, 0.003), (23, 0.037), (24, -0.006), (25, 0.047), (26, 0.073), (27, -0.033), (28, -0.017), (29, 0.014), (30, -0.003), (31, -0.088), (32, 0.014), (33, 0.034), (34, -0.035), (35, 0.038), (36, -0.024), (37, 0.019), (38, -0.058), (39, -0.003), (40, 0.051), (41, 0.091), (42, 0.008), (43, -0.024), (44, -0.049), (45, 0.026), (46, 0.149), (47, -0.051), (48, -0.146), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.87032974 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder
Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. (*Ravi Kondadadi is now affiliated with Nuance Communications, Inc.)
Author: Sina Zarriess ; Jonas Kuhn
Abstract: We suggest a generation task that integrates discourse-level referring expression generation and sentence-level surface realization. We present a data set of German articles annotated with deep syntax and referents, including some types of implicit referents. Our experiments compare several architectures varying the order of a set of trainable modules. The results suggest that a revision-based pipeline, with intermediate linearization, significantly outperforms standard pipelines or a parallel architecture.
3 0.82969785 337 acl-2013-Tag2Blog: Narrative Generation from Satellite Tag Data
Author: Kapila Ponnamperuma ; Advaith Siddharthan ; Cheng Zeng ; Chris Mellish ; Rene van der Wal
Abstract: The aim of the Tag2Blog system is to bring satellite tagged wild animals “to life” through narratives that place their movements in an ecological context. Our motivation is to use such automatically generated texts to enhance public engagement with a specific species reintroduction programme, although the protocols developed here can be applied to any animal or other movement study that involves signal data from tags. We are working with one of the largest nature conservation charities in Europe in this regard, focusing on a single species, the red kite. We describe a system that interprets a sequence of locational fixes obtained from a satellite tagged individual, and constructs a story around its use of the landscape.
4 0.75905544 375 acl-2013-Using Integer Linear Programming in Concept-to-Text Generation to Produce More Compact Texts
Author: Gerasimos Lampouras ; Ion Androutsopoulos
Abstract: We present an ILP model of concept-totext generation. Unlike pipeline architectures, our model jointly considers the choices in content selection, lexicalization, and aggregation to avoid greedy decisions and produce more compact texts.
5 0.722224 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
Author: Nina Dethlefs ; Helen Hastie ; Heriberto Cuayahuitl ; Oliver Lemon
Abstract: Surface realisers in spoken dialogue systems need to be more responsive than conventional surface realisers. They need to be sensitive to the utterance context as well as robust to partial or changing generator inputs. We formulate surface realisation as a sequence labelling task and combine the use of conditional random fields (CRFs) with semantic trees. Due to their extended notion of context, CRFs are able to take the global utterance context into account and are less constrained by local features than other realisers. This leads to more natural and less repetitive surface realisation. It also allows generation from partial and modified inputs and is therefore applicable to incremental surface realisation. Results from a human rating study confirm that users are sensitive to this extended notion of context and assign ratings that are significantly higher (up to 14%) than those for taking only local context into account.
6 0.67359072 1 acl-2013-"Let Everything Turn Well in Your Wife": Generation of Adult Humor Using Lexical Constraints
7 0.59722543 303 acl-2013-Robust multilingual statistical morphological generation models
8 0.59516543 129 acl-2013-Domain-Independent Abstract Generation for Focused Meeting Summarization
9 0.5643239 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
10 0.56119168 65 acl-2013-BRAINSUP: Brainstorming Support for Creative Sentence Generation
11 0.52511668 371 acl-2013-Unsupervised joke generation from big data
12 0.48085588 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
13 0.47648779 268 acl-2013-PATHS: A System for Accessing Cultural Heritage Collections
14 0.46416742 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
15 0.45691821 364 acl-2013-Typesetting for Improved Readability using Lexical and Syntactic Information
16 0.44976103 203 acl-2013-Is word-to-phone mapping better than phone-phone mapping for handling English words?
17 0.43953079 228 acl-2013-Leveraging Domain-Independent Information in Semantic Parsing
18 0.42704192 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures
19 0.42534614 322 acl-2013-Simple, readable sub-sentences
20 0.4246406 14 acl-2013-A Novel Classifier Based on Quantum Computation
topicId topicWeight
[(0, 0.039), (6, 0.023), (11, 0.061), (15, 0.02), (24, 0.042), (26, 0.045), (34, 0.291), (35, 0.072), (42, 0.101), (48, 0.032), (70, 0.045), (88, 0.025), (90, 0.037), (95, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.74706054 21 acl-2013-A Statistical NLG Framework for Aggregated Planning and Realization
Author: Ravi Kondadadi ; Blake Howald ; Frank Schilder
Abstract: We present a hybrid natural language generation (NLG) system that consolidates macro and micro planning and surface realization tasks into one statistical learning process. Our novel approach is based on deriving a template bank automatically from a corpus of texts from a target domain. First, we identify domain specific entity tags and Discourse Representation Structures on a per sentence basis. Each sentence is then organized into semantically similar groups (representing a domain specific concept) by k-means clustering. After this semi-automatic processing (human review of cluster assignments), a number of corpus–level statistics are compiled and used as features by a ranking SVM to develop model weights from a training corpus. At generation time, a set of input data, the collection of semantically organized templates, and the model weights are used to select optimal templates. Our system is evaluated with automatic, non–expert crowdsourced and expert evaluation metrics. We also introduce a novel automatic metric, syntactic variability, that represents linguistic variation as a measure of unique template sequences across a collection of automatically generated documents. The metrics for generated weather and biography texts fall within acceptable ranges. In sum, we argue that our statistical approach to NLG reduces the need for complicated knowledge-based architectures and readily adapts to different domains with reduced development time. (*Ravi Kondadadi is now affiliated with Nuance Communications, Inc.)
2 0.66636658 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
Author: Lu Wang ; Hema Raghavan ; Vittorio Castelli ; Radu Florian ; Claire Cardie
Abstract: We consider the problem of using sentence compression techniques to facilitate queryfocused multi-document summarization. We present a sentence-compression-based framework for the task, and design a series of learning-based compression models built on parse trees. An innovative beam search decoder is proposed to efficiently find highly probable compressions. Under this framework, we show how to integrate various indicative metrics such as linguistic motivation and query relevance into the compression process by deriving a novel formulation of a compression scoring function. Our best model achieves statistically significant improvement over the state-of-the-art systems on several metrics (e.g. 8.0% and 5.4% improvements in ROUGE-2 respectively) for the DUC 2006 and 2007 summarization task. ,
3 0.56648678 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
Author: Xipeng Qiu ; Qi Zhang ; Xuanjing Huang
Abstract: The growing need for Chinese natural language processing (NLP) is largely in a range of research and commercial applications. However, most of the currently Chinese NLP tools or components still have a wide range of issues need to be further improved and developed. FudanNLP is an open source toolkit for Chinese natural language processing (NLP) , which uses statistics-based and rule-based methods to deal with Chinese NLP tasks, such as word segmentation, part-ofspeech tagging, named entity recognition, dependency parsing, time phrase recognition, anaphora resolution and so on.
4 0.52047527 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou
Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve such problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those intersentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline. 1
5 0.51687199 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
Author: Ji Ma ; Jingbo Zhu ; Tong Xiao ; Nan Yang
Abstract: In this paper, we combine easy-first dependency parsing and POS tagging algorithms with beam search and structured perceptron. We propose a simple variant of “early-update” to ensure valid update in the training process. The proposed solution can also be applied to combine beam search and structured perceptron with other systems that exhibit spurious ambiguity. On CTB, we achieve 94.01% tagging accuracy and 86.33% unlabeled attachment score with a relatively small beam width. On PTB, we also achieve state-of-the-art performance. 1
6 0.51568151 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
7 0.51141846 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
8 0.51120883 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
9 0.50838226 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
10 0.50805396 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
11 0.50791979 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
12 0.50735301 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
13 0.50691015 225 acl-2013-Learning to Order Natural Language Texts
14 0.50659424 80 acl-2013-Chinese Parsing Exploiting Characters
15 0.5065015 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
16 0.50640953 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
17 0.50633591 166 acl-2013-Generalized Reordering Rules for Improved SMT
18 0.50406229 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
19 0.50373048 172 acl-2013-Graph-based Local Coherence Modeling
20 0.50337827 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding