Preferences versus Adaptation during Referring Expression Generation

Martijn Goudbeek and Emiel Krahmer

Abstract: Current Referring Expression Generation algorithms rely on domain dependent preferences for both content selection and linguistic realization. We present two experiments showing that human speakers may opt for dispreferred properties and dispreferred modifier orderings when these were salient in a preceding interaction (without speakers being consciously aware of this). We discuss the impact of these findings for current generation algorithms.
1 Introduction

The generation of referring expressions is a core ingredient of most Natural Language Generation (NLG) systems (Reiter and Dale, 2000; Mellish et al.). These systems usually approach Referring Expression Generation (REG) as a two-step procedure, where first it is decided which properties to include (content selection), after which the selected properties are turned into a natural language referring expression (linguistic realization). The basic problem in both stages is one of choice: there are many ways in which one could refer to a target object, and multiple ways in which these could be realized in natural language. Typically, these choice problems are tackled by giving preference to some solutions over others.
The Incremental Algorithm (Dale and Reiter, 1995) is arguably unique in assuming a complete preference order of attributes, but other REG algorithms rely on preferences as well. The Graph-based algorithm (Krahmer et al., 2003), for example, searches for the cheapest description for a target, and distinguishes cheap attributes (such as color) from more expensive ones (orientation).
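As a rough sketch (the toy domain, attribute names, and preference order below are illustrative, not taken from the paper), the content-selection loop of such a preference-based algorithm can be written as:

```python
def incremental_algorithm(target, distractors, preference_order):
    """Select distinguishing properties for `target`, considering
    attributes in a fixed preference order (Dale & Reiter style)."""
    description = {}
    remaining = list(distractors)
    for attr in preference_order:
        value = target[attr]
        # Distractors ruled out by this attribute-value pair.
        ruled_out = [d for d in remaining if d.get(attr) != value]
        if ruled_out:                      # only add properties that help
            description[attr] = value
            remaining = [d for d in remaining if d not in ruled_out]
        if not remaining:                  # target is fully distinguished
            break
    return description

# Toy domain: a blue, left-facing fan among two distractors.
target = {"type": "fan", "color": "blue", "orientation": "left"}
distractors = [
    {"type": "fan",   "color": "red",  "orientation": "left"},
    {"type": "couch", "color": "blue", "orientation": "front"},
]
print(incremental_algorithm(target, distractors,
                            ["color", "orientation", "type"]))
```

Because color precedes orientation in the (assumed) preference order, the blue fan is always described by color first, which is exactly the deterministic behavior the experiments below put to the test.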
Realization of referring expressions has received less attention, yet recent studies on the ordering of modifiers (Shaw and Hatzivassiloglou, 1999; Malouf, 2000; Mitchell, 2009) also work from the assumption that some orderings (large red) are preferred over others (red large).
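A minimal illustration of such frequency-based modifier ordering (the bigram counts here are invented for illustration, not taken from any corpus):

```python
from itertools import permutations

# Hypothetical corpus counts for adjacent modifier pairs.
BIGRAM_COUNTS = {("large", "red"): 520, ("red", "large"): 13}

def best_ordering(modifiers):
    """Pick the permutation whose adjacent bigrams are most frequent."""
    def score(order):
        return sum(BIGRAM_COUNTS.get(pair, 0)
                   for pair in zip(order, order[1:]))
    return max(permutations(modifiers), key=score)

print(best_ordering(["red", "large"]))  # frequency prefers "large red"
```

A realizer built this way always emits the single most frequent ordering, which is the deterministic assumption Experiment II challenges.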
We argue that such preferences are less stable when referring expressions are generated in interactive settings, as would be required for applications such as spoken dialogue systems or interactive virtual characters. In these cases, we hypothesize that, besides domain preferences, the referring expressions that were produced earlier in the interaction are important as well.
It has been shown that if one dialogue participant refers to a couch as a sofa, the next speaker is more likely to use the word sofa as well (Branigan et al.). This kind of micro-planning or "lexical entrainment" (Brennan and Clark, 1996) can be seen as a specific form of "alignment" (Pickering and Garrod, 2004) between speaker and addressee. Pickering and Garrod argue that alignment may take place on all levels of interaction, and indeed it has been shown that participants also align their intonation patterns and syntactic structures.
However, as far as we know, experimental evidence for alignment on the level of content planning has never been given, and neither have alignment effects in modifier orderings during realization been shown. With the notable exceptions of Buschmeier et al. (2009), who study alignment in micro-planning, and Janarthanam and Lemon (2009), who study alignment in expertise levels, alignment has received little attention in NLG so far.
Experiment I studies the trade-off between adaptation and preferences during content selection, while Experiment II looks at this trade-off for modifier orderings during realization.
Both studies use a novel interactive reference production paradigm, applied to two domains (the Furniture and People domains of the TUNA data-set; Gatt et al., 2009) to see whether adaptation may be domain dependent. Finally, we contrast our findings with the performance of state-of-the-art REG algorithms, discussing how they could be adapted so as to account for the new data, effectively adding plasticity to the generation process.
2 Experiment I

Experiment I studies what speakers do when referring to a target that can be distinguished in a preferred (the blue fan) or a dispreferred way (the left-facing fan), when in the prior context either the first or the second variant was made salient.
Method

Participants. 26 students (2 male, mean age = 20 years, 11 months), all native speakers of Dutch without hearing or speech problems, participated for course credits.
Materials. Target pictures were taken from the TUNA corpus (Gatt et al., 2009). This corpus consists of two domains: one containing pictures of people (famous mathematicians), the other containing furniture items in different colors depicted from different orientations. From earlier studies with this corpus (Koolen et al., 2009) it is known that participants show a preference for certain attributes: color in the Furniture domain and glasses in the People domain, and disprefer other attributes (orientation of a furniture piece and wearing a tie, respectively).
Procedure. Trials consisted of four turns in an interactive reference understanding and production experiment: a prime, two fillers and the experimental description (see Figure 1). First, participants listened to a pre-recorded female voice referring to one of three objects and had to indicate which one was being referenced. In this subtask, references either used a preferred or a dispreferred attribute; both were distinguishing. Second, participants themselves described a filler picture, after which, third, they had to indicate which filler picture was being described.
The two filler turns always concerned stimuli from the alternative domain and were intended to prevent a too direct connection between the prime and the target.

[Figure 1: The 4 tasks per trial. A furniture trial is shown; people trials have an identical structure.]
Fourth, participants described the target object, which could always be distinguished from its distractors in a preferred (the blue fan) or a dispreferred (the left-facing fan) way.
[Figure 2: Proportions of preferred and dispreferred attributes in the Furniture domain.]

Note that attributes are primed, not values; a participant may have heard front facing in the prime turn, while the target has a different value for this attribute.
For the two domains, there were 20 preferred and 20 dispreferred trials, giving rise to 2 × (20 + 20) = 80 critical trials. These were presented in counter-balanced blocks, and within blocks each participant received a different random order. In addition, there were 80 filler trials (each following the same structure as outlined in Figure 1). During debriefing, none of the participants indicated they had been aware of the experiment's purpose.
Results. We use the proportion of attribute alignment as our dependent measure. Alignment occurs when a participant uses the same attribute in the target description as occurred in the prime.
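The dependent measure can be computed straightforwardly; the trial records below are invented for illustration:

```python
def alignment_proportion(trials):
    """Proportion of trials in which the attribute used for the target
    matches the attribute heard in the prime."""
    aligned = sum(1 for t in trials
                  if t["target_attribute"] == t["prime_attribute"])
    return aligned / len(trials)

# Four hypothetical dispreferred-prime trials in the Furniture domain.
trials = [
    {"prime_attribute": "orientation", "target_attribute": "orientation"},
    {"prime_attribute": "orientation", "target_attribute": "color"},
    {"prime_attribute": "orientation", "target_attribute": "orientation"},
    {"prime_attribute": "orientation", "target_attribute": "orientation"},
]
print(alignment_proportion(trials))  # 0.75
```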
Overspecified descriptions (cf. Engelhardt et al., 2006; Arnold, 2008), in which both the preferred and dispreferred attributes were mentioned by participants, were excluded from this measure. Overspecification occurred in 13% of the critical trials (and these were evenly distributed over the experimental conditions).
The use of the preferred and dispreferred attribute as a function of prime and domain is shown in Figures 2 and 3. In both domains, the preferred attribute is used much more frequently than the dispreferred attribute with the preferred primes, which serves as a manipulation check. As a test of our hypothesis that adaptation processes play an important role in attribute selection for referring expressions, we need to look at participants' expressions with the dispreferred primes (with the preferred primes, effects of adaptation and of preferences cannot be teased apart).
[Figure 3: Proportions of preferred and dispreferred attributes in the People domain.]

Current REG algorithms such as the Incremental Algorithm and the Graph-based algorithm predict that participants will always opt for the preferred attribute, and hence will not use the dispreferred attribute.
This is not what we observe: our participants used the dispreferred attribute at a rate significantly larger than zero when they had been exposed to it three turns earlier (tfurniture[25] = 6.…). Additionally, they used the dispreferred attribute significantly more when they had previously heard the dispreferred attribute rather than the preferred attribute. This difference is especially marked and significant in the Furniture domain (tfurniture[25] = 2.34), where participants opt for the dispreferred attribute in 54% of the trials, more frequently than they do for the preferred attribute (Fig. 2).
3 Experiment II

Experiment II uses the same paradigm as Experiment I to study whether speakers' preferences for modifier orderings can be changed by exposing them to dispreferred orderings.
Method

Participants. 28 students (ten male, mean age = 23 years and two months) participated for course credits. All were native speakers of Dutch, without hearing or speech problems.
Materials. The materials were identical to those used in Experiment I, except for their arrangement in the critical trials. In these trials, the participants could only identify the target picture using two attributes. In the Furniture domain these were color and size; in the People domain these were having a beard and wearing glasses.
In the prime turn (cf. Figure 1), these attributes were realized in a preferred way ("size first": e.g., the big red sofa, or "glasses first": the bespectacled and bearded man) or in a dispreferred way ("color first": the red big sofa, or "beard first": the bearded and bespectacled man).

[Figure 4: Proportions of preferred and dispreferred modifier orderings in the Furniture domain.]

Google counts for the original Dutch modifier orderings reveal that the ratio of preferred to dispreferred is in the order of 40:1 in the Furniture domain and 3:1 in the People domain.
Results. We use the proportion of modifier ordering alignments as our dependent measure, where alignment occurs when the participant's ordering coincides with the primed ordering. Figures 4 and 5 show the use of the preferred and dispreferred modifier ordering per prime and domain.
It can be seen that in the preferred prime conditions, participants produce the expected orderings, more or less in accordance with the Google counts. State-of-the-art realizers would always opt for the most frequent ordering of a given pair of modifiers and hence would never predict the dispreferred orderings to occur. Still, the use of the dispreferred modifier ordering occurred significantly more often than one would expect given this prediction (tfurniture[27] = 6.…).
To test our hypotheses concerning adaptation, we looked at the dispreferred realizations when speakers were exposed to dispreferred primes (compared to preferred primes). In both domains this resulted in an increase of the amount of dispreferred realizations, which was significant in the People domain (tpeople[27] = 1.…).
4 Discussion

Current state-of-the-art REG algorithms often rest upon the assumption that some attributes and some realizations are preferred over others. The two experiments described in this paper show that this assumption is incorrect when references are produced in an interactive setting.
[Figure 5: Proportions of preferred and dispreferred modifier orderings in the People domain.]

In both experiments, speakers were more likely to select a dispreferred attribute or produce a dispreferred modifier ordering when they had previously been exposed to these attributes or orderings, without being aware of this.
These findings fit in well with the adaptation and alignment models proposed by psycholinguists, but ours, as far as we know, is the first experimental evidence of alignment in attribute selection and in modifier ordering. To account for these findings, REG algorithms that function in an interactive setting should be made sensitive to the productions of dialogue partners.
For the Incremental Algorithm (Dale and Reiter, 1995), this could be achieved by augmenting the list of preferred attributes with a list of "previously mentioned" attributes.
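One possible (hypothetical) realization of this augmentation is to promote recently mentioned attributes to the front of the domain's preference order before content selection runs; the attribute names below are illustrative:

```python
def adapted_preference_order(domain_order, mentioned):
    """Move attributes mentioned earlier in the dialogue to the front
    of the domain's fixed preference order (one possible adaptation)."""
    recent = [a for a in mentioned if a in domain_order]
    rest = [a for a in domain_order if a not in recent]
    return recent + rest

# Furniture domain: color is normally preferred over orientation,
# but a prime that mentioned orientation promotes it.
print(adapted_preference_order(["color", "size", "orientation"],
                               ["orientation"]))
```

With the adapted order, an incremental selector would consider orientation before color, mirroring the priming effect observed in Experiment I.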
For the Graph-based algorithm (Krahmer et al., 2003), costs of properties could be based on two components: a relatively fixed domain component (preferred is cheaper) and a flexible interactive component (recently used is cheaper).
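A sketch of such a two-component cost function (the base costs and discount value are invented for illustration):

```python
# Illustrative fixed domain costs: preferred attributes are cheaper.
DOMAIN_COST = {"color": 1, "size": 2, "orientation": 3}

def property_cost(attr, recently_used, discount=2):
    """Two-component cost: a fixed domain preference plus a flexible
    interactive discount for attributes used recently in the dialogue."""
    cost = DOMAIN_COST[attr]
    if attr in recently_used:
        cost = max(0, cost - discount)
    return cost

# After a prime that used orientation, it becomes as cheap as color,
# so a cheapest-description search may now select it.
print(property_cost("orientation", recently_used={"orientation"}))  # 1
print(property_cost("color", recently_used={"orientation"}))        # 1
```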
Which approach would work best is an open, empirical question, but either way this would constitute an important step towards interactive REG.
Acknowledgments

The research reported in this paper forms part of the VICI project "Bridging the gap between psycholinguistics and computational linguistics: the case of referring expressions", funded by the Netherlands Organization for Scientific Research (NWO grant 277-70-007).
References

Robert Dale and Ehud Reiter. 1995. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2).

Paul E. Engelhardt, Karl G. D. Bailey, and Fernanda Ferreira. 2006. Do speakers and listeners observe the Gricean maxim of quantity? Journal of Memory and Language, 54(4).

Albert Gatt, Ielka van der Sluis, and Kees van Deemter. 2007. Evaluating algorithms for the generation of referring expressions using a balanced corpus. In Proceedings of ENLG.

Srinivasan Janarthanam and Oliver Lemon. 2009. Learning lexical alignment policies for generating referring expressions for spoken dialogue systems. In Proceedings of ENLG.

Robert Malouf. 2000. The order of prenominal adjectives in natural language generation. In Proceedings of ACL.
wordName wordTfidf (topN-words)
[('dispreferred', 0.691), ('preferred', 0.273), ('furniture', 0.195), ('orderings', 0.16), ('attribute', 0.151), ('referring', 0.148), ('trials', 0.125), ('reg', 0.117), ('modifier', 0.117), ('primes', 0.111), ('attributes', 0.11), ('participants', 0.102), ('gatt', 0.098), ('tfurniture', 0.089), ('tpeople', 0.089), ('color', 0.079), ('prime', 0.078), ('krahmer', 0.078), ('pickering', 0.078), ('sofa', 0.078), ('preferences', 0.077), ('interactive', 0.077), ('ordering', 0.075), ('alignment', 0.071), ('tilburg', 0.067), ('koolen', 0.067), ('speakers', 0.066), ('people', 0.065), ('expressions', 0.062), ('filler', 0.061), ('generation', 0.06), ('adaptation', 0.058), ('production', 0.057), ('participant', 0.053), ('reiter', 0.052), ('proportions', 0.052), ('opt', 0.052), ('enlg', 0.05), ('dale', 0.048), ('fan', 0.048), ('red', 0.046), ('exposed', 0.045), ('domain', 0.045), ('beard', 0.045), ('bearded', 0.045), ('bespectacled', 0.045), ('branigan', 0.045), ('buschmeier', 0.045), ('cheaper', 0.045), ('engelhardt', 0.045), ('glasses', 0.045), ('goudbeek', 0.045), ('istudies', 0.045), ('martijn', 0.045), ('uvt', 0.045), ('wearing', 0.045), ('netherlands', 0.042), ('domains', 0.042), ('materials', 0.039), ('realizations', 0.039), ('emiel', 0.039), ('brennan', 0.039), ('garrod', 0.039), ('tuna', 0.039), ('incremental', 0.038), ('orientation', 0.036), ('realization', 0.036), ('ehud', 0.036), ('gricean', 0.036), ('prenominal', 0.036), ('pictures', 0.036), ('janarthanam', 0.036), ('participated', 0.035), ('experiment', 0.035), ('picture', 0.034), ('primed', 0.034), ('shaw', 0.034), ('dialogue', 0.033), ('heard', 0.032), ('albert', 0.032), ('hearing', 0.03), ('findings', 0.03), ('speaker', 0.03), ('critical', 0.027), ('occurred', 0.027), ('months', 0.027), ('greece', 0.027), ('mellish', 0.027), ('facing', 0.027), ('nlg', 0.027), ('aware', 0.027), ('turns', 0.025), ('expression', 0.025), ('blue', 0.025), ('bridging', 0.025), ('referential', 0.025), ('preference', 0.025), 
('dependent', 0.024), ('target', 0.024), ('received', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation
Author: Martijn Goudbeek ; Emiel Krahmer
Abstract: Current Referring Expression Generation algorithms rely on domain dependent preferences for both content selection and linguistic realization. We present two experiments showing that human speakers may opt for dispreferred properties and dispreferred modifier orderings when these were salient in a preceding interaction (without speakers being consciously aware of this). We discuss the impact of these findings for current generation algorithms.
2 0.15047653 231 acl-2010-The Prevalence of Descriptive Referring Expressions in News and Narrative
Author: Raquel Hervas ; Mark Finlayson
Abstract: Generating referring expressions is a key step in Natural Language Generation. Researchers have focused almost exclusively on generating distinctive referring expressions, that is, referring expressions that uniquely identify their intended referent. While undoubtedly one of their most important functions, referring expressions can be more than distinctive. In particular, descriptive referring expressions those that provide additional information not required for distinction are critical to flu– – ent, efficient, well-written text. We present a corpus analysis in which approximately one-fifth of 7,207 referring expressions in 24,422 words ofnews and narrative are descriptive. These data show that if we are ever to fully master natural language generation, especially for the genres of news and narrative, researchers will need to devote more attention to understanding how to generate descriptive, and not just distinctive, referring expressions. 1 A Distinctive Focus Generating referring expressions is a key step in Natural Language Generation (NLG). From early treatments in seminal papers by Appelt (1985) and Reiter and Dale (1992) to the recent set of Referring Expression Generation (REG) Challenges (Gatt et al., 2009) through different corpora available for the community (Eugenio et al., 1998; van Deemter et al., 2006; Viethen and Dale, 2008), generating referring expressions has become one of the most studied areas of NLG. Researchers studying this area have, almost without exception, focused exclusively on how to generate distinctive referring expressions, that is, referring expressions that unambiguously idenMark Alan Finlayson Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA, 02139 USA markaf@mit .edu tify their intended referent. Referring expressions, however, may be more than distinctive. 
It is widely acknowledged that they can be used to achieve multiple goals, above and beyond distinction. Here we focus on descriptive referring expressions, that is, referring expressions that are not only distinctive, but provide additional information not required for identifying their intended referent. Consider the following text, in which some of the referring expressions have been underlined: Once upon a time there was a man, who had three daughters. They lived in a house and their dresses were made of fabric. While a bit strange, the text is perfectly wellformed. All the referring expressions are distinctive, in that we can properly identify the referents of each expression. But the real text, the opening lines to the folktale The Beauty and the Beast, is actually much more lyrical: Once upon a time there was a rich merchant, who had three daughters. They lived in a very fine house and their gowns were made of the richest fabric sewn with jewels. All the boldfaced portions namely, the choice of head nouns, the addition of adjectives, the use of appositive phrases serve to perform a descriptive function, and, importantly, are all unnecessary for distinction! In all of these cases, the author is using the referring expressions as a vehicle for communicating information about the referents. This descriptive information is sometimes – – new, sometimes necessary for understanding the text, and sometimes just for added flavor. But when the expression is descriptive, as opposed to distinctive, this additional information is not required for identifying the referent of the expression, and it is these sorts of referring expressions that we will be concerned with here. 
49 Uppsala,P Srwoce de dni,n 1g1s- 1of6 t Jhuely AC 20L1 20 .1 ?0c 2 C0o1n0fe Aresnsoceci Sathio rnt f Poarp Ceorsm,p paugteastio 4n9a–l5 L4i,nguistics Although these sorts of referring expression have been mostly ignored by researchers in this area1 , we show in this corpus study that descriptive expressions are in fact quite prevalent: nearly one-fifth of referring expressions in news and narrative are descriptive. In particular, our data, the trained judgments of native English speakers, show that 18% of all distinctive referring expressions in news and 17% of those in narrative folktales are descriptive. With this as motivation, we argue that descriptive referring expressions must be studied more carefully, especially as the field progresses from referring in a physical, immediate context (like that in the REG Challenges) to generating more literary forms of text. 2 Corpus Annotation This is a corpus study; our procedure was therefore to define our annotation guidelines (Section 2.1), select texts to annotate (2.2), create an annotation tool for our annotators (2.3), and, finally, train annotators, have them annotate referring expressions’ constituents and function, and then adjudicate the double-annotated texts into a gold standard (2.4). 2.1 Definitions We wrote an annotation guide explaining the difference between distinctive and descriptive referring expressions. We used the guide when training annotators, and it was available to them while annotating. With limited space here we can only give an outline of what is contained in the guide; for full details see (Finlayson and Herv a´s, 2010a). Referring Expressions We defined referring expressions as referential noun phrases and their coreferential expressions, e.g., “John kissed Mary. She blushed.”. This included referring expressions to generics (e.g., “Lions are fierce”), dates, times, and numbers, as well as events if they were referred to using a noun phrase. 
We included in each referring expression all the determiners, quantifiers, adjectives, appositives, and prepositional phrases that syntactically attached to that expression. When referring expressions were nested, all the nested referring expressions were also marked separately. Nuclei vs. Modifiers In the only previous corpus study of descriptive referring expressions, on 1With the exception of a small amount of work, discussed in Section 4. museum labels, Cheng et al. (2001) noted that descriptive information is often integrated into referring expressions using modifiers to the head noun. To study this, and to allow our results to be more closely compared with Cheng’s, we had our annotators split referring expressions into their constituents, portions called either nuclei or modifiers. The nuclei were the portions of the referring expression that performed the ‘core’ referring function; the modifiers were those portions that could be varied, syntactically speaking, independently of the nuclei. Annotators then assigned a distinctive or descriptive function to each constituent, rather than the referring expression as a whole. Normally, the nuclei corresponded to the head of the noun phrase. In (1), the nucleus is the token king, which we have here surrounded with square brackets. The modifiers, surrounded by parentheses, are The and old. (1) (The) (old) [king] was wise. Phrasal modifiers were marked as single modifiers, for example, in (2). (2) (The) [roof] (of the house) collapsed. It is significant that we had our annotators mark and tag the nuclei of referring expressions. Cheng and colleagues only mentioned the possibility that additional information could be introduced in the modifiers. However, O’Donnell et al. (1998) observed that often the choice of head noun can also influence the function of a referring expression. Consider (3), in which the word villain is used to refer to the King. (3) The King assumed the throne today. 
I ’t trust (that) [villain] one bit. don The speaker could have merely used him to refer to the King–the choice of that particular head noun villain gives us additional information about the disposition of the speaker. Thus villain is descriptive. Function: Distinctive vs. Descriptive As already noted, instead of tagging the whole referring expression, annotators tagged each constituent (nuclei and modifiers) as distinctive or descriptive. The two main tests for determining descriptiveness were (a) if presence of the constituent was unnecessary for identifying the referent, or (b) if 50 the constituent was expressed using unusual or ostentatious word choice. If either was true, the constituent was considered descriptive; otherwise, it was tagged as distinctive. In cases where the constituent was completely irrelevant to identifying the referent, it was tagged as descriptive. For example, in the folktale The Princess and the Pea, from which (1) was extracted, there is only one king in the entire story. Thus, in that story, the king is sufficient for identification, and therefore the modifier old is descriptive. This points out the importance of context in determining distinctiveness or descriptiveness; if there had been a roomful of kings, the tags on those modifiers would have been reversed. There is some question as to whether copular predicates, such as the plumber in (4), are actually referring expressions. (4) John is the plumber Our annotators marked and tagged these constructions as normal referring expressions, but they added an additional flag to identify them as copular predicates. We then excluded these constructions from our final analysis. Note that copular predicates were treated differently from appositives: in appositives the predicate was included in the referring expression, and in most cases (again, depending on context) was marked descriptive (e.g., John, the plumber, slept.). 
2.2 Text Selection Our corpus comprised 62 texts, all originally written in English, from two different genres, news and folktales. We began with 30 folktales of different sizes, totaling 12,050 words. These texts were used in a previous work on the influence of dialogues on anaphora resolution algorithms (Aggarwal et al., 2009); they were assembled with an eye toward including different styles, different authors, and different time periods. Following this, we matched, approximately, the number of words in the folktales by selecting 32 texts from Wall Street Journal section of the Penn Treebank (Marcus et al., 1993). These texts were selected at ran- dom from the first 200 texts in the corpus. 2.3 The Story Workbench We used the Story Workbench application (Finlayson, 2008) to actually perform the annotation. The Story Workbench is a semantic annotation program that, among other things, includes the ability to annotate referring expressions and coreferential relationships. We added the ability to annotate nuclei, modifiers, and their functions by writing a workbench “plugin” in Java that could be installed in the application. The Story Workbench is not yet available to the public at large, being in a limited distribution beta testing phase. The developers plan to release it as free software within the next year. At that time, we also plan to release our plugin as free, downloadable software. 2.4 Annotation & Adjudication The main task of the study was the annotation of the constituents of each referring expression, as well as the function (distinctive or descriptive) of each constituent. The system generated a first pass of constituent analysis, but did not mark functions. We hired two native English annotators, neither of whom had any linguistics background, who corrected these automatically-generated constituent analyses, and tagged each constituent as descriptive or distinctive. Every text was annotated by both annotators. 
Adjudication of the differences was conducted by discussion between the two annotators; the second author moderated these discussions and settled irreconcilable disagreements. We followed a “train-as-you-go” paradigm, where there was no distinct training period, but rather adjudication proceeded in step with annotation, and annotators received feedback during those sessions. We calculated two measures of inter-annotator agreement: a kappa statistic and an f-measure, shown in Table 1. All of our f-measures indicated that annotators agreed almost perfectly on the location of referring expressions and their breakdown into constituents. These agreement calculations were performed on the annotators’ original corrected texts. All the kappa statistics were calculated for two tags (nuclei vs. modifier for the constituents, and distinctive vs. descriptive for the functions) over both each token assigned to a nucleus or modifier and each referring expression pair. Our kappas indicate moderate to good agreement, especially for the folktales. These results are expected because of the inherent subjectivity of language. During the adjudication sessions it became clear that different people do not consider the same information 51 as obvious or descriptive for the same concepts, and even the contexts deduced by each annotators from the texts were sometimes substantially different. 3 Results Table 2 lists the primary results of the study. We considered a referring expression descriptive if any of its constituents were descriptive. Thus, 18% of the referring expressions in the corpus added additional information beyond what was required to unambiguously identify their referent. The results were similar in both genres. Tales Articles Total Texts303262 Words Sentences 12,050 904 12,372 571 24,422 1,475 Ref. Exp.3,6813,5267,207 Dist. Ref. Exp. 3,057 2,830 5,887 Desc. Ref. Exp. 609 672 1,281 % Dist. Ref.83%81%82% % Desc. Ref. 17% 19% Table 2: Primary results. 
Table 3 contains the percentages of descriptive and distinctive tags broken down by constituent.

                   Tales    Articles    Total
  Nuclei           3,666       3,502    7,168
  Max. Nuc/Ref         1           1        1
  Dist. Nuc.         95%         97%      96%
  Desc. Nuc.          5%          3%       4%
  Modifiers        2,277       3,627    5,904
  Avg. Mod/Ref       0.6         1.0      0.8
  Max. Mod/Ref         4           6        6
  Dist. Mod.         78%         81%      80%
  Desc. Mod.         22%         19%      20%

Table 3: Breakdown of Constituent Tags.

Like Cheng's results, our analysis shows that descriptive referring expressions make up a significant fraction of all referring expressions. Although Cheng did not examine nuclei, our results show that the use of descriptive nuclei is small but not negligible.

4 Relation to the Field

Researchers working on generating referring expressions typically acknowledge that referring expressions can perform functions other than distinction. Despite this widespread acknowledgment, researchers have, for the most part, explicitly ignored these functions. Exceptions to this trend are three. First is the general study of aggregation in the process of referring expression generation. Second and third are corpus studies by Cheng et al. (2001) and Jordan (2000a) that bear on the prevalence of descriptive referring expressions.

The NLG subtask of aggregation can be used to imbue referring expressions with a descriptive function (Reiter and Dale, 2000, §5.3). There is a specific kind of aggregation, called embedding, that moves information from one clause to another inside the structure of a separate noun phrase. This type of aggregation can be used to transform two sentences such as "The princess lived in a castle. She was pretty" into "The pretty princess lived in a castle". The adjective pretty, previously a copular predicate, becomes a descriptive modifier of the reference to the princess, making the second text more natural and fluent. This kind of aggregation is widely used by humans to make discourse more compact and efficient.
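The embedding transformation just described, folding a copular predicate into a modifier of the antecedent's noun phrase, can be illustrated with a deliberately naive string-level sketch. A real aggregation module would operate on syntactic or semantic representations rather than raw strings; this toy handles only the simplest "The NOUN ... . She/He/It was ADJ." pattern.

```python
import re

def embed_copular_predicate(text):
    """Naive embedding aggregation: merge a copular follow-up sentence
    ('She/He/It was ADJ.') into the preceding noun phrase as an
    adjectival modifier. Toy string pattern, not a general algorithm."""
    m = re.match(r"(The )(\w+)( .*?\.) (?:She|He|It) was (\w+)\.$", text)
    if not m:
        return text  # pattern not recognized; leave text unchanged
    det, noun, rest, adj = m.groups()
    # The copular predicate becomes a descriptive premodifier.
    return f"{det}{adj} {noun}{rest}"

print(embed_copular_predicate(
    "The princess lived in a castle. She was pretty."))
# -> "The pretty princess lived in a castle."
```

Even this toy makes the caveat below concrete: the rule blindly assumes the pronoun's antecedent is the subject noun phrase, which is exactly the kind of referential confusion a real system must guard against.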
In order to create NLG systems with this ability, we must take into account the caveat, noted by Cheng (1998), that any non-distinctive information in a referring expression must not lead to confusion about the distinctive function of the referring expression. This is by no means a trivial problem: this sort of aggregation interferes with referring and coherence planning at both a local and a global level (Cheng and Mellish, 2000; Cheng et al., 2001). It is clear from the current state of the art of NLG that we have not yet obtained a deep enough understanding of aggregation to handle these interactions; more research on the topic is needed.

Two previous corpus studies have looked at the use of descriptive referring expressions. The first showed explicitly that people craft descriptive referring expressions to accomplish different goals. Jordan and colleagues (Jordan, 2000b; Jordan, 2000a) examined the use of referring expressions using the COCONUT corpus (Eugenio et al., 1998). They tested how domain and discourse goals can influence the content of non-pronominal referring expressions in a dialogue context, checking whether or not a subject's goals led them to include non-referring information in a referring expression. Their results are intriguing because they point toward heretofore unexamined constraints, utilities, and expectations (possibly genre- or style-dependent) that may underlie the use of descriptive information to perform different functions, and are not yet captured by aggregation modules in particular or NLG systems in general.

In the other corpus study, which partially inspired this work, Cheng and colleagues analyzed a set of museum descriptions, the GNOME corpus (Poesio, 2004), for the pragmatic functions of referring expressions. They had three functions in their study, in contrast to our two. Their first function (marked by their uniq tag) was equivalent to our distinctive function.
The other two were specializations of our descriptive tag: they differentiated between additional information that helped the reader understand the text (int) and additional information not necessary for understanding (attr). Despite their annotators seeming to have trouble distinguishing between the latter two tags, they did achieve good overall inter-annotator agreement. They identified 1,863 modifiers to referring expressions in their corpus, of which 47.3% fulfilled a descriptive (attr or int) function. This supports our main assertion, namely, that descriptive referring expressions are not only crucial for efficient and fluent text, but are actually a significant phenomenon. It is interesting, though, that Cheng's fraction of descriptive referring expressions was so much higher than ours (47.3% versus our 18%). We attribute this substantial difference to genre: Cheng studied museum labels, in which the writer is space-constrained, having to pack a lot of information into a small label. The issue bears further study, and perhaps will lead to insights into differences in writing style that may be attributed to author or genre.

5 Contributions

We make two contributions in this paper. First, we assembled, double-annotated, and adjudicated into a gold standard a corpus of 24,422 words. We marked all referring expressions, coreferential relations, and referring expression constituents, and tagged each constituent as having a descriptive or distinctive function. We wrote an annotation guide and created software that allows the annotation of this information in free text. The corpus and the guide are available on-line in a permanent digital archive (Finlayson and Hervás, 2010a; Finlayson and Hervás, 2010b). The software will also be released in the same archive when the Story Workbench annotation application is released to the public.
This corpus will be useful for the automatic generation and analysis of both descriptive and distinctive referring expressions. Any kind of system intended to generate text as humans do must take into account that identification is not the only function of referring expressions. Many analysis applications would also benefit from the automatic recognition of descriptive referring expressions.

Second, we demonstrated that descriptive referring expressions comprise a substantial fraction (18%) of the referring expressions in news and narrative. Along with the museum descriptions studied by Cheng, it seems that news and narrative are genres where authors naturally use a large number of descriptive referring expressions. Given that so little work has been done on descriptive referring expressions, this indicates that the field would be well served by focusing more attention on this phenomenon.

Acknowledgments

This work was supported in part by the Air Force Office of Scientific Research under grant number A9550-05-1-0321, as well as by the Office of Naval Research under award number N00014091059. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the Office of Naval Research. This research is also partially funded by the Spanish Ministry of Education and Science (TIN2009-14659-C03-01) and Universidad Complutense de Madrid (GR58/08). We also thank Whitman Richards, Ozlem Uzuner, Peter Szolovits, Patrick Winston, Pablo Gervás, and Mark Seifter for their helpful comments and discussion, and thank our annotators Saam Batmanghelidj and Geneva Trotter.

References

Alaukik Aggarwal, Pablo Gervás, and Raquel Hervás. 2009. Measuring the influence of errors induced by the presence of dialogues in reference clustering of narrative text. In Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, India. Macmillan Publishers.

Douglas E. Appelt. 1985.
Planning English referring expressions. Artificial Intelligence, 26:1–33.

Hua Cheng and Chris Mellish. 2000. Capturing the interaction between aggregation and text planning in two generation systems. In INLG '00: First International Conference on Natural Language Generation, pages 186–193, Morristown, NJ, USA. Association for Computational Linguistics.

Hua Cheng, Massimo Poesio, Renate Henschel, and Chris Mellish. 2001. Corpus-based NP modifier generation. In NAACL '01: Second Meeting of the North American Chapter of the Association for Computational Linguistics, pages 1–8, Morristown, NJ, USA. Association for Computational Linguistics.

Hua Cheng. 1998. Embedding new information into referring expressions. In ACL-36: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 1478–1480, Morristown, NJ, USA. Association for Computational Linguistics.

Barbara Di Eugenio, Johanna D. Moore, Pamela W. Jordan, and Richmond H. Thomason. 1998. An empirical investigation of proposals in collaborative dialogues. In Proceedings of the 17th International Conference on Computational Linguistics, pages 325–329, Morristown, NJ, USA. Association for Computational Linguistics.

Mark A. Finlayson and Raquel Hervás. 2010a. Annotation guide for the UCM/MIT indications, referring expressions, and coreference corpus (UMIREC corpus). Technical Report MIT-CSAIL-TR-2010-025, MIT Computer Science and Artificial Intelligence Laboratory. http://hdl.handle.net/1721.1/54765.

Mark A. Finlayson and Raquel Hervás. 2010b. UCM/MIT indications, referring expressions, and coreference corpus (UMIREC corpus). Work product, MIT Computer Science and Artificial Intelligence Laboratory. http://hdl.handle.net/1721.1/54766.

Mark A. Finlayson. 2008. Collecting semantics in the wild: The Story Workbench. In Proceedings of the AAAI Fall Symposium on Naturally-Inspired Artificial Intelligence, pages 46–53, Menlo Park, CA, USA. AAAI Press.

Albert Gatt, Anja Belz, and Eric Kow. 2009. The TUNA-REG Challenge 2009: Overview and evaluation results. In ENLG '09: Proceedings of the 12th European Workshop on Natural Language Generation, pages 174–182, Morristown, NJ, USA. Association for Computational Linguistics.

Pamela W. Jordan. 2000a. Can nominal expressions achieve multiple goals?: An empirical study. In ACL '00: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 142–149, Morristown, NJ, USA. Association for Computational Linguistics.

Pamela W. Jordan. 2000b. Influences on attribute selection in redescriptions: A corpus study. In Proceedings of CogSci 2000, pages 250–255.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.

Michael O'Donnell, Hua Cheng, and Janet Hitzeman. 1998. Integrating referring and informing in NP planning. In Proceedings of the COLING-ACL '98 Workshop on the Computational Treatment of Nominals, pages 46–56.

Massimo Poesio. 2004. Discourse annotation and semantic annotation in the GNOME corpus. In DiscAnnotation '04: Proceedings of the 2004 ACL Workshop on Discourse Annotation, pages 72–79, Morristown, NJ, USA. Association for Computational Linguistics.

Ehud Reiter and Robert Dale. 1992. A fast algorithm for the generation of referring expressions. In Proceedings of the 14th Conference on Computational Linguistics, Nantes, France.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press.

Kees van Deemter, Ielka van der Sluis, and Albert Gatt. 2006. Building a semantically transparent corpus for the generation of referring expressions. In Proceedings of the 4th International Conference on Natural Language Generation (Special Session on Data Sharing and Evaluation), INLG-06.

Jette Viethen and Robert Dale. 2008. The use of spatial relations in referring expressions. In Proceedings of the 5th International Conference on Natural Language Generation.
same-paper 1 0.960392 199 acl-2010-Preferences versus Adaptation during Referring Expression Generation
Author: Martijn Goudbeek ; Emiel Krahmer
Abstract: Current Referring Expression Generation algorithms rely on domain dependent preferences for both content selection and linguistic realization. We present two experiments showing that human speakers may opt for dispreferred properties and dispreferred modifier orderings when these were salient in a preceding interaction (without speakers being consciously aware of this). We discuss the impact of these findings for current generation algorithms.
2 0.71549839 231 acl-2010-The Prevalence of Descriptive Referring Expressions in News and Narrative
Author: Raquel Hervas ; Mark Finlayson
Abstract: Generating referring expressions is a key step in Natural Language Generation. Researchers have focused almost exclusively on generating distinctive referring expressions, that is, referring expressions that uniquely identify their intended referent. While undoubtedly one of their most important functions, referring expressions can be more than distinctive. In particular, descriptive referring expressions those that provide additional information not required for distinction are critical to flu– – ent, efficient, well-written text. We present a corpus analysis in which approximately one-fifth of 7,207 referring expressions in 24,422 words ofnews and narrative are descriptive. These data show that if we are ever to fully master natural language generation, especially for the genres of news and narrative, researchers will need to devote more attention to understanding how to generate descriptive, and not just distinctive, referring expressions. 1 A Distinctive Focus Generating referring expressions is a key step in Natural Language Generation (NLG). From early treatments in seminal papers by Appelt (1985) and Reiter and Dale (1992) to the recent set of Referring Expression Generation (REG) Challenges (Gatt et al., 2009) through different corpora available for the community (Eugenio et al., 1998; van Deemter et al., 2006; Viethen and Dale, 2008), generating referring expressions has become one of the most studied areas of NLG. Researchers studying this area have, almost without exception, focused exclusively on how to generate distinctive referring expressions, that is, referring expressions that unambiguously idenMark Alan Finlayson Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Cambridge, MA, 02139 USA markaf@mit .edu tify their intended referent. Referring expressions, however, may be more than distinctive. 
It is widely acknowledged that they can be used to achieve multiple goals, above and beyond distinction. Here we focus on descriptive referring expressions, that is, referring expressions that are not only distinctive, but provide additional information not required for identifying their intended referent. Consider the following text, in which some of the referring expressions have been underlined: Once upon a time there was a man, who had three daughters. They lived in a house and their dresses were made of fabric. While a bit strange, the text is perfectly wellformed. All the referring expressions are distinctive, in that we can properly identify the referents of each expression. But the real text, the opening lines to the folktale The Beauty and the Beast, is actually much more lyrical: Once upon a time there was a rich merchant, who had three daughters. They lived in a very fine house and their gowns were made of the richest fabric sewn with jewels. All the boldfaced portions namely, the choice of head nouns, the addition of adjectives, the use of appositive phrases serve to perform a descriptive function, and, importantly, are all unnecessary for distinction! In all of these cases, the author is using the referring expressions as a vehicle for communicating information about the referents. This descriptive information is sometimes – – new, sometimes necessary for understanding the text, and sometimes just for added flavor. But when the expression is descriptive, as opposed to distinctive, this additional information is not required for identifying the referent of the expression, and it is these sorts of referring expressions that we will be concerned with here. 
49 Uppsala,P Srwoce de dni,n 1g1s- 1of6 t Jhuely AC 20L1 20 .1 ?0c 2 C0o1n0fe Aresnsoceci Sathio rnt f Poarp Ceorsm,p paugteastio 4n9a–l5 L4i,nguistics Although these sorts of referring expression have been mostly ignored by researchers in this area1 , we show in this corpus study that descriptive expressions are in fact quite prevalent: nearly one-fifth of referring expressions in news and narrative are descriptive. In particular, our data, the trained judgments of native English speakers, show that 18% of all distinctive referring expressions in news and 17% of those in narrative folktales are descriptive. With this as motivation, we argue that descriptive referring expressions must be studied more carefully, especially as the field progresses from referring in a physical, immediate context (like that in the REG Challenges) to generating more literary forms of text. 2 Corpus Annotation This is a corpus study; our procedure was therefore to define our annotation guidelines (Section 2.1), select texts to annotate (2.2), create an annotation tool for our annotators (2.3), and, finally, train annotators, have them annotate referring expressions’ constituents and function, and then adjudicate the double-annotated texts into a gold standard (2.4). 2.1 Definitions We wrote an annotation guide explaining the difference between distinctive and descriptive referring expressions. We used the guide when training annotators, and it was available to them while annotating. With limited space here we can only give an outline of what is contained in the guide; for full details see (Finlayson and Herv a´s, 2010a). Referring Expressions We defined referring expressions as referential noun phrases and their coreferential expressions, e.g., “John kissed Mary. She blushed.”. This included referring expressions to generics (e.g., “Lions are fierce”), dates, times, and numbers, as well as events if they were referred to using a noun phrase. 
We included in each referring expression all the determiners, quantifiers, adjectives, appositives, and prepositional phrases that syntactically attached to that expression. When referring expressions were nested, all the nested referring expressions were also marked separately. Nuclei vs. Modifiers In the only previous corpus study of descriptive referring expressions, on 1With the exception of a small amount of work, discussed in Section 4. museum labels, Cheng et al. (2001) noted that descriptive information is often integrated into referring expressions using modifiers to the head noun. To study this, and to allow our results to be more closely compared with Cheng’s, we had our annotators split referring expressions into their constituents, portions called either nuclei or modifiers. The nuclei were the portions of the referring expression that performed the ‘core’ referring function; the modifiers were those portions that could be varied, syntactically speaking, independently of the nuclei. Annotators then assigned a distinctive or descriptive function to each constituent, rather than the referring expression as a whole. Normally, the nuclei corresponded to the head of the noun phrase. In (1), the nucleus is the token king, which we have here surrounded with square brackets. The modifiers, surrounded by parentheses, are The and old. (1) (The) (old) [king] was wise. Phrasal modifiers were marked as single modifiers, for example, in (2). (2) (The) [roof] (of the house) collapsed. It is significant that we had our annotators mark and tag the nuclei of referring expressions. Cheng and colleagues only mentioned the possibility that additional information could be introduced in the modifiers. However, O’Donnell et al. (1998) observed that often the choice of head noun can also influence the function of a referring expression. Consider (3), in which the word villain is used to refer to the King. (3) The King assumed the throne today. 
I ’t trust (that) [villain] one bit. don The speaker could have merely used him to refer to the King–the choice of that particular head noun villain gives us additional information about the disposition of the speaker. Thus villain is descriptive. Function: Distinctive vs. Descriptive As already noted, instead of tagging the whole referring expression, annotators tagged each constituent (nuclei and modifiers) as distinctive or descriptive. The two main tests for determining descriptiveness were (a) if presence of the constituent was unnecessary for identifying the referent, or (b) if 50 the constituent was expressed using unusual or ostentatious word choice. If either was true, the constituent was considered descriptive; otherwise, it was tagged as distinctive. In cases where the constituent was completely irrelevant to identifying the referent, it was tagged as descriptive. For example, in the folktale The Princess and the Pea, from which (1) was extracted, there is only one king in the entire story. Thus, in that story, the king is sufficient for identification, and therefore the modifier old is descriptive. This points out the importance of context in determining distinctiveness or descriptiveness; if there had been a roomful of kings, the tags on those modifiers would have been reversed. There is some question as to whether copular predicates, such as the plumber in (4), are actually referring expressions. (4) John is the plumber Our annotators marked and tagged these constructions as normal referring expressions, but they added an additional flag to identify them as copular predicates. We then excluded these constructions from our final analysis. Note that copular predicates were treated differently from appositives: in appositives the predicate was included in the referring expression, and in most cases (again, depending on context) was marked descriptive (e.g., John, the plumber, slept.). 
2.2 Text Selection Our corpus comprised 62 texts, all originally written in English, from two different genres, news and folktales. We began with 30 folktales of different sizes, totaling 12,050 words. These texts were used in a previous work on the influence of dialogues on anaphora resolution algorithms (Aggarwal et al., 2009); they were assembled with an eye toward including different styles, different authors, and different time periods. Following this, we matched, approximately, the number of words in the folktales by selecting 32 texts from Wall Street Journal section of the Penn Treebank (Marcus et al., 1993). These texts were selected at ran- dom from the first 200 texts in the corpus. 2.3 The Story Workbench We used the Story Workbench application (Finlayson, 2008) to actually perform the annotation. The Story Workbench is a semantic annotation program that, among other things, includes the ability to annotate referring expressions and coreferential relationships. We added the ability to annotate nuclei, modifiers, and their functions by writing a workbench “plugin” in Java that could be installed in the application. The Story Workbench is not yet available to the public at large, being in a limited distribution beta testing phase. The developers plan to release it as free software within the next year. At that time, we also plan to release our plugin as free, downloadable software. 2.4 Annotation & Adjudication The main task of the study was the annotation of the constituents of each referring expression, as well as the function (distinctive or descriptive) of each constituent. The system generated a first pass of constituent analysis, but did not mark functions. We hired two native English annotators, neither of whom had any linguistics background, who corrected these automatically-generated constituent analyses, and tagged each constituent as descriptive or distinctive. Every text was annotated by both annotators. 
Adjudication of the differences was conducted by discussion between the two annotators; the second author moderated these discussions and settled irreconcilable disagreements. We followed a “train-as-you-go” paradigm, where there was no distinct training period, but rather adjudication proceeded in step with annotation, and annotators received feedback during those sessions. We calculated two measures of inter-annotator agreement: a kappa statistic and an f-measure, shown in Table 1. All of our f-measures indicated that annotators agreed almost perfectly on the location of referring expressions and their breakdown into constituents. These agreement calculations were performed on the annotators’ original corrected texts. All the kappa statistics were calculated for two tags (nuclei vs. modifier for the constituents, and distinctive vs. descriptive for the functions) over both each token assigned to a nucleus or modifier and each referring expression pair. Our kappas indicate moderate to good agreement, especially for the folktales. These results are expected because of the inherent subjectivity of language. During the adjudication sessions it became clear that different people do not consider the same information 51 as obvious or descriptive for the same concepts, and even the contexts deduced by each annotators from the texts were sometimes substantially different. 3 Results Table 2 lists the primary results of the study. We considered a referring expression descriptive if any of its constituents were descriptive. Thus, 18% of the referring expressions in the corpus added additional information beyond what was required to unambiguously identify their referent. The results were similar in both genres. Tales Articles Total Texts303262 Words Sentences 12,050 904 12,372 571 24,422 1,475 Ref. Exp.3,6813,5267,207 Dist. Ref. Exp. 3,057 2,830 5,887 Desc. Ref. Exp. 609 672 1,281 % Dist. Ref.83%81%82% % Desc. Ref. 17% 19% Table 2: Primary results. 
18% Table 3 contains the percentages of descriptive and distinctive tags broken down by constituent. Like Cheng’s results, our analysis shows that descriptive referring expressions make up a significant fraction of all referring expressions. Although Cheng did not examine nuclei, our results show that the use of descriptive nuclei is small but not negligible. 4 Relation to the Field Researchers working on generating referring expressions typically acknowledge that referring expressions can perform functions other than distinction. Despite this widespread acknowledgment, researchers have, for the most part, explicitly ignored these functions. Exceptions to this trend Tales Articles Total Nuclei3,6663,5027,168 Max. Nuc/Ref Dist. Nuc. 1 95% 1 97% 1 96% Desc. Nuc. 5% 3% 4% Modifiers2,2773,6275,904 Avg. Mod/Ref Max. Mod/Ref Dist. Mod. Desc. Mod. 0.6 4 78% 22% 1.0 6 81% 19% 0.8 6 80% 20% Table 3: Breakdown of Constituent Tags are three. First is the general study of aggregation in the process of referring expression generation. Second and third are corpus studies by Cheng et al. (2001) and Jordan (2000a) that bear on the prevalence of descriptive referring expressions. The NLG subtask of aggregation can be used to imbue referring expressions with a descriptive function (Reiter and Dale, 2000, §5.3). There is a specific nk (iRned otefr aggregation 0c0al0le,d § embedding t ihsa at moves information from one clause to another inside the structure of a separate noun phrase. This type of aggregation can be used to transform two sentences such as “The princess lived in a castle. She was pretty ” into “The pretty princess lived in a castle ”. The adjective pretty, previously a cop- ular predicate, becomes a descriptive modifier of the reference to the princess, making the second text more natural and fluent. This kind of aggregation is widely used by humans for making the discourse more compact and efficient. 
In order to create NLG systems with this ability, we must take into account the caveat, noted by Cheng (1998), that any non-distinctive information in a referring expression must not lead to confusion about the distinctive function of the referring expression. This is by no means a trivial problem: this sort of aggregation interferes with referring and coherence planning at both a local and global level (Cheng and Mellish, 2000; Cheng et al., 2001). It is clear, from the current state of the art of NLG, that we have not yet obtained a deep enough understanding of aggregation to enable us to handle these interactions. More research on the topic is needed. Two previous corpus studies have looked at the use of descriptive referring expressions. The first showed explicitly that people craft descriptive referring expressions to accomplish different goals. Jordan and colleagues (Jordan, 2000b; Jordan, 2000a) examined the use of referring expressions using the COCONUT corpus (Eugenio et al., 1998). They tested how domain and discourse goals can influence the content of non-pronominal referring expressions in a dialogue context, checking whether or not a subject's goals led them to include non-referring information in a referring expression. Their results are intriguing because they point toward heretofore unexamined constraints, utilities, and expectations (possibly genre- or style-dependent) that may underlie the use of descriptive information to perform different functions, and that are not yet captured by aggregation modules in particular or NLG systems in general. In the other corpus study, which partially inspired this work, Cheng and colleagues analyzed a set of museum descriptions, the GNOME corpus (Poesio, 2004), for the pragmatic functions of referring expressions. Their study had three functions, in contrast to our two. Their first function (marked by their uniq tag) was equivalent to our distinctive function.
The other two were specializations of our descriptive tag: they differentiated between additional information that helped the reader understand the text (int) and additional information not necessary for understanding (attr). Although their annotators seemed to have trouble distinguishing between the latter two tags, they did achieve good overall inter-annotator agreement. They identified 1,863 modifiers to referring expressions in their corpus, of which 47.3% fulfilled a descriptive (attr or int) function. This supports our main assertion, namely, that descriptive referring expressions are not only crucial for efficient and fluent text but are also a significant phenomenon. It is interesting, though, that Cheng's fraction of descriptive referring expressions was so much higher than ours (47.3% versus our 18%). We attribute this substantial difference to genre: Cheng studied museum labels, where the writer is space-constrained and must pack a lot of information into a small label. The issue bears further study, and perhaps will lead to insights into differences in writing style that may be attributed to author or genre.

5 Contributions

We make two contributions in this paper. First, we assembled, double-annotated, and adjudicated into a gold standard a corpus of 24,422 words. We marked all referring expressions, coreferential relations, and referring expression constituents, and tagged each constituent as having a descriptive or distinctive function. We wrote an annotation guide and created software that allows the annotation of this information in free text. The corpus and the guide are available on-line in a permanent digital archive (Finlayson and Hervás, 2010a; Finlayson and Hervás, 2010b). The software will also be released in the same archive when the Story Workbench annotation application is released to the public.
This corpus will be useful for the automatic generation and analysis of both descriptive and distinctive referring expressions. Any kind of system intended to generate text as humans do must take into account that identification is not the only function of referring expressions. Many analysis applications would benefit from the automatic recognition of descriptive referring expressions. Second, we demonstrated that descriptive referring expressions comprise a substantial fraction (18%) of the referring expressions in news and narrative. Along with museum descriptions, studied by Cheng, it seems that news and narrative are genres in which authors naturally use a large number of descriptive referring expressions. Given that so little work has been done on descriptive referring expressions, this indicates that the field would be well served by focusing more attention on this phenomenon.

Acknowledgments

This work was supported in part by the Air Force Office of Scientific Research under grant number A9550-05-1-0321, as well as by the Office of Naval Research under award number N00014091059. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the Office of Naval Research. This research is also partially funded by the Spanish Ministry of Education and Science (TIN2009-14659-C03-01) and the Universidad Complutense de Madrid (GR58/08). We also thank Whitman Richards, Ozlem Uzuner, Peter Szolovits, Patrick Winston, Pablo Gervás, and Mark Seifter for their helpful comments and discussion, and thank our annotators Saam Batmanghelidj and Geneva Trotter.

References

Alaukik Aggarwal, Pablo Gervás, and Raquel Hervás. 2009. Measuring the influence of errors induced by the presence of dialogues in reference clustering of narrative text. In Proceedings of ICON-2009: 7th International Conference on Natural Language Processing, India. Macmillan Publishers.

Douglas E. Appelt. 1985.
Planning English referring expressions. Artificial Intelligence, 26:1–33.

Hua Cheng and Chris Mellish. 2000. Capturing the interaction between aggregation and text planning in two generation systems. In INLG '00: First International Conference on Natural Language Generation, pages 186–193, Morristown, NJ, USA. Association for Computational Linguistics.

Hua Cheng, Massimo Poesio, Renate Henschel, and Chris Mellish. 2001. Corpus-based NP modifier generation. In NAACL '01: Second Meeting of the North American Chapter of the Association for Computational Linguistics on Language Technologies, pages 1–8, Morristown, NJ, USA. Association for Computational Linguistics.

Hua Cheng. 1998. Embedding new information into referring expressions. In ACL-36: Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, pages 1478–1480, Morristown, NJ, USA. Association for Computational Linguistics.

Barbara Di Eugenio, Johanna D. Moore, Pamela W. Jordan, and Richmond H. Thomason. 1998. An empirical investigation of proposals in collaborative dialogues. In Proceedings of the 17th International Conference on Computational Linguistics, pages 325–329, Morristown, NJ, USA. Association for Computational Linguistics.

Mark A. Finlayson and Raquel Hervás. 2010a. Annotation guide for the UCM/MIT indications, referring expressions, and coreference corpus (UMIREC corpus). Technical Report MIT-CSAIL-TR-2010-025, MIT Computer Science and Artificial Intelligence Laboratory. http://hdl.handle.net/1721.1/54765.

Mark A. Finlayson and Raquel Hervás. 2010b. UCM/MIT indications, referring expressions, and coreference corpus (UMIREC corpus). Work product, MIT Computer Science and Artificial Intelligence Laboratory. http://hdl.handle.net/1721.1/54766.

Mark A. Finlayson. 2008. Collecting semantics in the wild: The Story Workbench.
In Proceedings of the AAAI Fall Symposium on Naturally-Inspired Artificial Intelligence, pages 46–53, Menlo Park, CA, USA. AAAI Press.

Albert Gatt, Anja Belz, and Eric Kow. 2009. The TUNA-REG challenge 2009: Overview and evaluation results. In ENLG '09: Proceedings of the 12th European Workshop on Natural Language Generation, pages 174–182, Morristown, NJ, USA. Association for Computational Linguistics.

Pamela W. Jordan. 2000a. Can nominal expressions achieve multiple goals?: An empirical study. In ACL '00: Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pages 142–149, Morristown, NJ, USA. Association for Computational Linguistics.

Pamela W. Jordan. 2000b. Influences on attribute selection in redescriptions: A corpus study. In Proceedings of CogSci 2000, pages 250–255.

Mitchell P. Marcus, Mary Ann Marcinkiewicz, and Beatrice Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.

Michael O'Donnell, Hua Cheng, and Janet Hitzeman. 1998. Integrating referring and informing in NP planning. In Proceedings of the COLING-ACL '98 Workshop on the Computational Treatment of Nominals, pages 46–56.

Massimo Poesio. 2004. Discourse annotation and semantic annotation in the GNOME corpus. In DiscAnnotation '04: Proceedings of the 2004 ACL Workshop on Discourse Annotation, pages 72–79, Morristown, NJ, USA. Association for Computational Linguistics.

Ehud Reiter and Robert Dale. 1992. A fast algorithm for the generation of referring expressions. In Proceedings of the 14th Conference on Computational Linguistics, Nantes, France.

Ehud Reiter and Robert Dale. 2000. Building Natural Language Generation Systems. Cambridge University Press.

Kees van Deemter, Ielka van der Sluis, and Albert Gatt. 2006. Building a semantically transparent corpus for the generation of referring expressions.
In Proceedings of the 4th International Conference on Natural Language Generation (Special Session on Data Sharing and Evaluation), INLG-06.

Jette Viethen and Robert Dale. 2008. The use of spatial relations in referring expressions. In Proceedings of the 5th International Conference on Natural Language Generation.