acl acl2012 acl2012-193 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions.
Reference: text
sentIndex sentText sentNum sentScore
1 weifeng@cs.toronto.edu Abstract In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al. [sent-3, score-1.613]
2 We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions. [sent-6, score-1.648]
3 Research in discourse parsing aims to unmask such relations in text, which is helpful for many downstream applications such as summarization, information retrieval, and question answering. [sent-8, score-0.861]
4 However, most existing discourse parsers operate on individual sentences alone, whereas discourse parsing is more powerful for text-level analysis. [sent-9, score-1.526]
5 Therefore, in this work, we aim to develop a text-level discourse parser. [sent-10, score-0.776]
6 We follow the framework of Rhetorical Structure Theory (Mann and Thompson, 1988) and we take the HILDA discourse parser (Hernault et al. [sent-11, score-0.785]
7 , 2010b) as the basis of our work, because it is the first fully implemented text-level discourse parser with state-of-the-art performance. [sent-12, score-0.812]
8 We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing, by comparing discourse parsing performance under different discourse conditions. [sent-18, score-2.393]
9 1 The RST Discourse Treebank Rhetorical Structure Theory (Mann and Thompson, 1988) is one of the most widely accepted frameworks for discourse analysis. [sent-20, score-0.721]
10 In the framework of RST, a coherent text can be represented as a discourse tree whose leaves are non-overlapping text spans called elementary discourse units (EDUs); these are the minimal text units of discourse trees. [sent-21, score-2.539]
11 Adjacent nodes can be related through particular discourse relations to form a discourse subtree, which can then be related to other adjacent nodes in the tree structure. [sent-22, score-1.63]
12 According to RST, there are two types of discourse relations, hypotactic (“mononuclear”) and paratactic (“multi-nuclear”). [sent-23, score-0.721]
13 In mononuclear relations, one of the text spans, the nucleus, is more salient than the other, the satellite, while in multi-nuclear relations, all text spans are equally important for interpretation. [sent-24, score-0.26]
14 Its discourse tree representation is shown below in the figure, following the notational convention of RST. [sent-26, score-0.793]
15 The two EDUs e1 and e2 are related by a mononuclear relation ATTRIBUTION, where e1 is the more salient span; the span (e1-e2) and the EDU e3 are related by a multi-nuclear relation SAME-UNIT, where they are equally salient. [sent-27, score-0.349]
16 Figure 1: An example text fragment (wsj 0616) composed of four EDUs, and its RST discourse tree representation. [sent-31, score-0.829]
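To make the tree structure concrete, here is a minimal Python sketch (not code from the paper) of the discourse tree in Figure 1; the Node class is a hypothetical representation, and the nuclearity assigned to the top-level CONDITION relation is an assumption, since the excerpt does not state it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Node:
    """A node in an RST discourse tree: either a leaf EDU or a labeled subtree."""
    relation: Optional[str]                # discourse relation; None for leaf EDUs
    nuclearity: Optional[Tuple[str, str]]  # e.g. ("N", "S") for a mononuclear relation
    children: Tuple["Node", ...] = ()
    text: str = ""                         # EDU text for leaves

# The tree of Figure 1: e1 and e2 are related by mononuclear ATTRIBUTION
# (e1 is the nucleus), (e1-e2) and e3 by multi-nuclear SAME-UNIT, and
# (e1-e3) and e4 by CONDITION (nuclearity assumed here).
e1, e2, e3, e4 = (Node(None, None, text=f"e{i}") for i in range(1, 5))
attribution = Node("ATTRIBUTION", ("N", "S"), (e1, e2))
same_unit = Node("SAME-UNIT", ("N", "N"), (attribution, e3))
root = Node("CONDITION", ("N", "S"), (same_unit, e4))
```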
17 In RST-DT, the original 24 discourse relations defined by Mann and Thompson (1988) are further divided into a set of 18 relation classes with 78 finer-grained rhetorical relations in total, which provides a high level of expressivity. [sent-35, score-1.077]
18 Unlike RST-DT, PDTB does not follow the framework of RST; rather, it follows a lexically grounded, predicate-argument approach with a different set of predefined discourse relations, as proposed by Webber (2004). [sent-40, score-0.721]
19 The argument that the discourse connective structurally attaches to is called Arg2, and the other argument is called Arg1; unlike in RST, the two arguments are not distinguished by their saliency for interpretation. [sent-44, score-0.745]
20 , PDTB-styled discourse relations exist only in a very local contextual window. [sent-47, score-0.866]
21 , 2010b) have been proposed, which extracted different textual information and adopted various approaches for discourse tree building. [sent-52, score-0.793]
22 Here we briefly review two fully implemented text-level discourse parsers with state-of-the-art performance. [sent-53, score-0.721]
23 The HILDA discourse parser of Hernault and his colleagues (duVerle and Prendinger, 2009; Hernault et al. [sent-54, score-0.785]
24 , 2010b) is the first fully-implemented feature-based discourse parser that works at the full text level. [sent-55, score-0.856]
25 They showed that the production rules extracted from constituent parse trees are the most effective features, while contextual features are the weakest. [sent-69, score-0.226]
26 Subsequently, they fully implemented an end-to-end PDTB-style discourse parser (Lin et al. [sent-70, score-0.785]
27 However, because of infrequent relations for which we do not have sufficient instances for training, many unseen features occur in the test data, resulting in poor test performance. [sent-74, score-0.22]
28 4 Text-level discourse parsing Not until recently has discourse parsing for full texts been a research focus; previously, discourse parsing was performed only at the sentence level.1 [sent-77, score-2.387]
29 In this section, we explain why we believe text-level discourse parsing is crucial. [sent-78, score-0.784]
30 Unlike syntactic parsing, where we are almost never interested in parsing above the sentence level, sentence-level parsing is not sufficient for discourse parsing. [sent-79, score-0.847]
31 While a sequence of locally grammatical (sentence-level) units can be considered globally grammatical, a sequence of locally coherent discourse segments does not necessarily form a globally coherent text. [sent-80, score-0.776]
32 If we attempt to represent the text as an RST discourse tree like the one shown in Figure 1, we find that no discourse relation can be assigned to relate the spans (e1-e2) and (e3-e4), and the text cannot be represented by a valid discourse tree structure. [sent-83, score-2.609]
33 In order to rule out such unreasonable transitions between sentences, we have to expand the text units upon which discourse parsing is performed: from sentences to paragraphs, and finally from paragraphs to full texts.1 [sent-84, score-1.643]
34 1Strictly speaking, for PDTB-style discourse parsing (e.g., Lin et al. (2009; 2010)), there is no absolute distinction between sentence-level and text-level parsing, since in PDTB, discourse relations are annotated at a level no higher than that of adjacent sentences. [sent-87, score-0.837]
35 No discourse relation can be associated with the spans (e1-e2) and (e3-e4). [sent-94, score-0.951]
36 Text-level discourse parsing imposes more constraints on global coherence than sentence-level discourse parsing does. [sent-96, score-1.525]
37 However, if, technically speaking, text-level discourse parsing were no more difficult than sentence-level parsing, any sentence-level discourse parser could be easily upgraded to a text-level discourse parser just by applying it to full texts. [sent-97, score-2.444]
38 5 Method We use the HILDA discourse parser of Hernault et al. [sent-99, score-0.785]
39 We choose HILDA because it is a fully implemented text-level discourse parser with the best performance reported to date. [sent-104, score-0.811]
40 ’s strategy of performing feature selection prior to classification proves to be effective in reducing the total feature dimensions, which is favorable since we wish to incorporate rich linguistic features into our discourse parser. [sent-109, score-0.988]
41 Then, from the EDUs, a bottom-up approach is applied to build a discourse tree for the full text. [sent-112, score-0.828]
42 Initially, a binary Structure classifier evaluates whether a discourse relation is likely to hold between consecutive EDUs. [sent-113, score-0.911]
43 The two EDUs which are most probably connected by a discourse relation are merged into a discourse subtree of two EDUs. [sent-114, score-1.592]
44 A multi-class Relation classifier evaluates which discourse relation label should be assigned to this new subtree. [sent-115, score-0.869]
45 Next, the Structure classifier and the Relation classifier are employed in cascade to reevaluate which relations are most likely to hold between adjacent spans (discourse subtrees of any size, including atomic EDUs). [sent-116, score-0.363]
46 This procedure is repeated until all spans are merged, and a discourse tree covering the full text is therefore produced. [sent-117, score-1.016]
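The tree-building loop just described can be sketched as follows, reusing the Node class from the earlier sketch; structure_score and relation_label are hypothetical stand-ins for HILDA's Structure and Relation SVM classifiers, so this is an illustration of the greedy procedure rather than HILDA's actual code.

```python
def build_tree(edus, structure_score, relation_label):
    """Greedy bottom-up tree building: repeatedly merge the pair of
    adjacent spans most likely to be connected by a discourse relation."""
    spans = list(edus)                       # current sequence of subtrees
    while len(spans) > 1:
        # Structure classifier: how likely is a relation between each
        # pair of adjacent spans?
        scores = [structure_score(spans[i], spans[i + 1])
                  for i in range(len(spans) - 1)]
        i = max(range(len(scores)), key=scores.__getitem__)
        left, right = spans[i], spans[i + 1]
        # Relation classifier: label the newly merged subtree.
        merged = Node(relation_label(left, right), None, (left, right))
        spans[i:i + 2] = [merged]
    return spans[0]                          # tree covering the full text
```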
47 We also explore how these two classifiers perform differently under different discourse conditions. [sent-124, score-0.754]
48 2 Instance extraction Because HILDA adopts a bottom-up approach for discourse tree building, errors produced on lower levels will certainly propagate to upper levels, usually causing the final discourse tree to be very dissimilar to the gold standard. [sent-126, score-1.586]
49 While appropriate postprocessing may be employed to fix these errors and help global discourse tree recovery, we feel that it might be more effective to directly improve the raw instance performance of the Structure and Relation classifiers. [sent-127, score-0.819]
50 Each instance is of the form (SL, SR), which is a pair of adjacent text spans SL (left span) and SR (right span), extracted from the discourse tree representation in RST-DT. [sent-129, score-0.991]
51 From each discourse tree, we extract positive instances as those pairs of text spans that are siblings of the same parent node, and negative examples as those pairs of adjacent text spans that are not siblings in the tree structure. [sent-130, score-1.246]
52 In all instances, both SL and SR must correspond to a constituent in the discourse tree, which can be either an atomic EDU or a concatenation of multiple consecutive EDUs. [sent-131, score-0.763]
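A sketch of this extraction scheme, under the assumption that every constituent can be identified with a (first EDU, last EDU) index range; it reuses the Node class from the earlier sketch and is not the authors' implementation.

```python
def extract_instances(root):
    """Collect Structure-classification instances from a gold RST tree:
    sibling constituent pairs are positive; adjacent constituent pairs
    that are not siblings are negative."""
    nodes = []

    def index(node, start):
        """Assign (first_edu, last_edu) spans bottom-up and collect nodes."""
        if not node.children:
            node.span = (start, start)
            end = start
        else:
            end = start - 1
            for child in node.children:
                end = index(child, end + 1)
            node.span = (start, end)
        nodes.append(node)
        return end

    index(root, 0)
    positives = [(l, r) for n in nodes
                 for l, r in zip(n.children, n.children[1:])]
    sibling_ids = {(id(l), id(r)) for l, r in positives}
    negatives = [(l, r) for l in nodes for r in nodes
                 if l.span[1] + 1 == r.span[0]            # adjacent spans
                 and (id(l), id(r)) not in sibling_ids]   # but not siblings
    return positives, negatives
```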
53 Contextual features: For a globally coherent text, there exist particular sequential patterns in the local usage of different discourse relations. [sent-138, score-0.756]
54 Given (SL, SR), the pair of text spans of interest, contextual features attempt to encode the discourse relations assigned to the preceding and the following text span pairs. [sent-139, score-1.193]
55 However, their work was based on PDTB, which has a very different annotation framework from RST-DT (see Section 2): in PDTB, annotated discourse relations can form a chain-like structure such that contextual features can be more readily extracted. [sent-142, score-0.973]
56 However, in RST-DT, a full text is represented as a discourse tree structure, so the previous and the next discourse relations are not well-defined. [sent-143, score-1.662]
57 To find the previous discourse relation RELprev that immediately precedes (SL, SR), we look for the span Sprev such that it ends right before SL and all its leaves belong to a single subtree which neither SL nor SR is a part of. [sent-146, score-0.764]
58 RELprev is then the discourse relation which covers Sprev. [sent-148, score-0.828]
59 The next discourse relation RELnext that immediately follows (SL, SR) is found in an analogous way. [sent-149, score-0.828]
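One plausible reading of this definition in code, assuming the nodes carry the (first, last) EDU spans assigned in the instance-extraction sketch above; choosing the largest qualifying span as Sprev is an assumption where the excerpt is ambiguous.

```python
def rel_prev(nodes, sl):
    """RELprev for the pair (SL, SR): the relation covering the largest
    constituent Sprev that ends right before SL (such a constituent is a
    single subtree containing neither SL nor SR)."""
    candidates = [n for n in nodes if n.span[1] + 1 == sl.span[0]]
    if not candidates:
        return "NONE"                     # (SL, SR) starts the text
    sprev = max(candidates, key=lambda n: n.span[1] - n.span[0])
    return sprev.relation or "NO-REL"     # leaves carry no relation label
```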
60 However, when building a discourse tree using a greedy bottom-up approach, as adopted by the HILDA discourse parser, RELprev and RELnext are not always available; therefore these contextual features represent an idealized situation. [sent-150, score-1.68]
61 In our experiments we wish to explore whether incorporating perfect contextual features can help better recognize discourse relations, and, if so, to set an upper bound on performance in more realistic situations. [sent-151, score-0.923]
62 In addition to Lin et al. (2009)’s syntactic production rules as features, we develop another set of production rules, namely discourse production rules, derived directly from the tree structure representation in RST-DT. [sent-153, score-1.0]
63 For example, with respect to the RST discourse tree shown in Figure 1, we extract the following discourse production rules: ATTRIBUTION → NO-REL NO-REL, SAME-UNIT → ATTRIBUTION NO-REL, CONDITION → SAME-UNIT NO-REL, where NO-REL denotes a leaf node in the discourse subtree. [sent-154, score-2.289]
64 The intuition behind using discourse production rules is that the discourse tree structure is able to reflect the relatedness of different discourse relations: discourse relations on the lower level of the tree can, to some degree, determine the relation of their direct parent. [sent-155, score-3.388]
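A sketch of the rule extraction; applied to the Figure 1 tree built in the earlier sketch, it yields exactly the three rules listed above.

```python
def discourse_production_rules(root):
    """One discourse production rule per internal node: the node's
    relation rewrites to its children's relations, with NO-REL for leaves."""
    rules = []

    def label(node):
        return node.relation if node.children else "NO-REL"

    def walk(node):
        if node.children:
            rules.append(f"{label(node)} -> "
                         + " ".join(label(c) for c in node.children))
            for child in node.children:
                walk(child)

    walk(root)
    return rules

# discourse_production_rules(root) on the Figure 1 tree gives:
# ['CONDITION -> SAME-UNIT NO-REL',
#  'SAME-UNIT -> ATTRIBUTION NO-REL',
#  'ATTRIBUTION -> NO-REL NO-REL']
```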
65 , when applying feature selection for word pairs, we find all word pairs that appear in some pair of text spans with a discourse relation between them. [sent-178, score-0.977]
66 Then for each extracted feature, we compute its mutual information with all 18 discourse relation classes defined in RST-DT, and use the highest mutual information to evaluate the effectiveness of that feature. [sent-179, score-0.855]
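A sketch of this mutual-information criterion for one binary feature; the instance representation (a feature set paired with a relation class) is an assumption, not the paper's data format.

```python
import math

def mi_feature_score(instances, feature, classes):
    """Score a feature by its highest mutual information with any relation
    class, over (feature_set, relation_class) training instances."""
    n = len(instances)
    n_f = sum(1 for feats, _ in instances if feature in feats)
    best = 0.0
    for c in classes:
        n_c = sum(1 for _, cls in instances if cls == c)
        n_fc = sum(1 for feats, cls in instances
                   if feature in feats and cls == c)
        mi = 0.0
        # The four cells of the 2x2 table (feature present?, class == c?),
        # each with its marginal counts.
        for joint, m_f, m_c in [(n_fc, n_f, n_c),
                                (n_f - n_fc, n_f, n - n_c),
                                (n_c - n_fc, n - n_f, n_c),
                                (n - n_f - n_c + n_fc, n - n_f, n - n_c)]:
            if joint:
                mi += (joint / n) * math.log(joint * n / (m_f * m_c))
        best = max(best, mi)
    return best
```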
67 1, our research focus in this paper is the tree-building step of the HILDA discourse parser, which consists of two classifications: Structure and Relation classification. [sent-186, score-0.721]
68 Although HILDA’s bottom-up approach is aimed at building a discourse tree for the full text, it does not explicitly employ different strategies for within-sentence text spans and cross-sentence text spans. [sent-188, score-1.051]
69 However, we believe that discourse parsing is significantly more difficult for text spans at higher levels of the discourse tree structure. [sent-189, score-1.736]
70 Therefore, we conduct the following three sub-experiments to explore whether the two classifiers behave differently under different discourse conditions. [sent-190, score-0.754]
71 To rule out all confounding factors, all classifiers are trained and tested on the basis of individual text span pairs, by assuming that the discourse subtree structure (if any) covering each individual text span has already been correctly identified (no error propagation). [sent-196, score-1.152]
72 1 Structure classification The number of training and testing instances used in this experiment for different discourse conditions is listed in Table 1. [sent-198, score-0.882]
73 Structure classification performance for all three discourse conditions is shown in Table 2. [sent-204, score-0.824]
74 However, under this discourse condition, the distribution of positive and negative instances in both training and test sets is extremely skewed, which makes it more sensible to compare the recall and F1 scores for evaluation. [sent-214, score-0.818]
75 For example, looking at the training F1 score under the cross-sentence condition, we can see that classification using full features and classification without contextual features both perform significantly better on the training data than HILDA does. [sent-218, score-0.319]
76 Comparing the results obtained under the first two conditions, we see that the binary classification problem of whether a discourse relation is likely to hold between two adjacent text spans is much more difficult under the cross-sentence condition. [sent-221, score-1.094]
77 In addition, given the extremely imbalanced nature of the dataset under this discourse condition, we might need to employ special approaches to deal with this needle-in-a-haystack problem. [sent-225, score-0.77]
78 This suggests that sophisticated features or models in addition to our rich linguistic features must be incorporated in order to fit the problem sufficiently well. [sent-228, score-0.194]
79 However, with contextual features removed, our features perform quite similarly to those of Hernault et al. [sent-234, score-0.192]
80 Table 2: Structure classification performance (in percentage) on text spans of within-sentence, cross-sentence, and all levels. [sent-290, score-0.231]
81 2 Relation classification The Relation classifier has 18 possible output labels, which are the coarse-grained relation classes defined in RST-DT. [sent-294, score-0.221]
82 We do not consider nuclearity when classifying different discourse relations, i. [sent-295, score-0.721]
83 Relation classification performance under three discourse conditions is shown in Table 3. [sent-300, score-0.824]
84 Table 3: Relation classification performance on text spans of within-sentence, cross-sentence, and all levels. [sent-344, score-0.231]
85 Macro-averaged F-score is not influenced by the number of instances in each relation class, because it weights the performance of each relation class equally.3 [sent-347, score-0.32]
86 Therefore, traditional significance tests, which operate on individual instances rather than individual relation classes, are not applicable. [sent-351, score-0.232]
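For reference, a minimal implementation of the macro-averaged F-score (MAFS) as described: per-class F1 scores are averaged with equal weight, so infrequent relation classes count as much as frequent ones.

```python
def macro_f1(gold, pred, classes):
    """Macro-averaged F-score over the given relation classes."""
    f1s = []
    for c in classes:
        tp = sum(g == c and p == c for g, p in zip(gold, pred))
        fp = sum(g != c and p == c for g, p in zip(gold, pred))
        fn = sum(g == c and p != c for g, p in zip(gold, pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)            # each class weighted equally
```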
87 Similar to our observation in Structure classification, the performance of Relation classification for cross-sentence instances is also much poorer than that on within-sentence instances, which again reveals the difficulty of text-level discourse parsing. [sent-355, score-0.874]
88 7 Conclusions In this paper, we aimed to develop an RST-style text-level discourse parser. [sent-356, score-0.721]
89 We chose the HILDA discourse parser (Hernault et al. [sent-357, score-0.785]
90 , 2010b) as the basis of our work, and significantly improved its tree-building step by incorporating our own rich linguistic features, together with features suggested by Lin et al. [sent-358, score-0.224]
91 We showed that contextual features are highly effective for both Structure and Relation classification under all discourse conditions. [sent-362, score-0.897]
92 Although perfect contextual features are available only in idealized situations, when they are correct, together with other features, they can almost correctly predict the tree structure and better predict the relation labels. [sent-363, score-0.39]
93 Our future work will be to fully implement an end-to-end discourse parser using our rich linguistic features, and focus on improving performance on cross-sentence instances. [sent-365, score-0.881]
94 A novel discourse parser based on Support Vector Machine classification. [sent-387, score-0.785]
95 A semi-supervised approach to improve classification of infrequent discourse relations using feature vector extension. [sent-392, score-0.893]
96 HILDA: A discourse parser using support vector machine classification. [sent-398, score-0.785]
97 Recognizing implicit discourse relations in the Penn Discourse Treebank. [sent-414, score-0.798]
98 Analysis of discourse structure with syntactic dependencies and data-driven shift-reduce parsing. [sent-444, score-0.766]
99 Sentence level discourse parsing using syntactic and lexical information. [sent-448, score-0.784]
100 An effective discourse parser that uses rich linguistic information. [sent-452, score-0.855]
wordName wordTfidf (topN-words)
[('discourse', 0.721), ('hilda', 0.345), ('hernault', 0.29), ('sl', 0.168), ('spans', 0.123), ('pdtb', 0.11), ('rst', 0.108), ('relation', 0.107), ('mafs', 0.097), ('edus', 0.096), ('relations', 0.077), ('tree', 0.072), ('span', 0.07), ('sr', 0.069), ('rhetorical', 0.068), ('contextual', 0.068), ('parser', 0.064), ('parsing', 0.063), ('features', 0.062), ('lin', 0.057), ('instances', 0.056), ('tacc', 0.055), ('textlevel', 0.055), ('toronto', 0.055), ('wafs', 0.055), ('production', 0.054), ('attribution', 0.049), ('classification', 0.046), ('rich', 0.045), ('structure', 0.045), ('subtree', 0.043), ('duverle', 0.041), ('erasti', 0.041), ('mononuclear', 0.041), ('relprev', 0.041), ('treebuilding', 0.041), ('classifier', 0.041), ('adjacent', 0.039), ('condition', 0.039), ('acc', 0.039), ('text', 0.036), ('idealized', 0.036), ('soricut', 0.036), ('igng', 0.036), ('coherent', 0.035), ('full', 0.035), ('mann', 0.034), ('classifiers', 0.033), ('cue', 0.032), ('marcu', 0.032), ('conditions', 0.031), ('extending', 0.029), ('covering', 0.029), ('nc', 0.029), ('listed', 0.028), ('imbalanced', 0.028), ('lethanh', 0.028), ('oront', 0.028), ('prendinger', 0.028), ('relnext', 0.028), ('subba', 0.028), ('withinsentence', 0.028), ('hybrid', 0.027), ('classes', 0.027), ('basis', 0.027), ('thompson', 0.026), ('performance', 0.026), ('infrequent', 0.025), ('difficulty', 0.025), ('class', 0.025), ('linguistic', 0.025), ('incorporating', 0.024), ('edu', 0.024), ('ziheng', 0.024), ('connective', 0.024), ('mitsuru', 0.024), ('feature', 0.024), ('equally', 0.024), ('similarity', 0.022), ('baldridge', 0.022), ('knott', 0.022), ('constituent', 0.022), ('hold', 0.022), ('wish', 0.022), ('individual', 0.021), ('extremely', 0.021), ('hugo', 0.02), ('helmut', 0.02), ('siblings', 0.02), ('cascade', 0.02), ('paragraphs', 0.02), ('sensible', 0.02), ('parse', 0.02), ('belonging', 0.02), ('coherence', 0.02), ('consecutive', 0.02), ('selection', 0.019), ('units', 0.019), ('connectives', 0.019), ('treebank', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000015 193 acl-2012-Text-level Discourse Parsing with Rich Linguistic Features
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: In this paper, we develop an RST-style text-level discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourse-parsing performance under different discourse conditions.
2 0.43641958 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text
Author: Yuping Zhou ; Nianwen Xue
Abstract: We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken together with other PDTB-style schemes (e.g. for English, Turkish, Hindi, and Czech), affords a broader perspective on how the generalized lexically grounded approach can flesh itself out in the context of cross-linguistic annotation of discourse relations.
3 0.37793431 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
Author: Christian Chiarcos
Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theory-based assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduví 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as a preprocessing step for semi-manual annotation, and this is part of the motivation of the approach described here. Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subject-matter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). Triples can be easily acquired from automatically parsed corpora.
While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 (1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances.) The BKB can be constructed from a sufficiently large corpus as follows: • identify event types and relation words for every utterance • create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance • add the candidate triple to the BKB; if it is found in the BKB, increase its score by (or initialize it with) 1 • perform a pruning on all candidate triples and calculate significance and correlation scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We assume that the same event can reoccur in different contexts; we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliary, non-causative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a relation word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved for which the observed positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation.
Using statistical significance tests as a pruning criterion entails that all triples eventually confirmed are statistically significant.2 (2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as a criterion to evaluate methodological validity.) This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10^−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; only PukWaC-1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus; PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words?), reproducibility (can these correlations be confirmed on independent data sets?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation in both corpora; disagreement means positive correlation in one corpus and negative correlation in the other.
Table 1: Positive or negative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size.
Table 2: Agreement between but (B), however (H) and then (T) on PukWaC.
Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. 3.2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a temporal and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. 4 Discussion and Outlook This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: Large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriate discourse marker to connect two utterances with no overt relation word. This information can be used, for example, to facilitate the semi-automated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. The validity of the approach was evaluated with respect to three evaluation criteria: The extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing.
In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guiding the insertion of relation words, a task that has been found to be problematic for languages that differ in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike the approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations, whereas here the entire bandwidth of discourse relations that are explicitly realized in a language is covered. Prospectively, both approaches could be combined to compensate their respective weaknesses. Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. The goal of this paper was to evaluate the methodological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment of the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009) and Kasch and Oates (2010). Other directions of subsequent research may include addressing more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. Acknowledgments This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. I would also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy, whose work on discourse relations on the one hand and proposition stores on the other hand has been the main inspiration for this paper. References M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: a collection of very large linguistically processed webcrawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009. N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 602–610. Association for Computational Linguistics, 2009. A.
Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukwac, a very large web-derived corpus of english. In Proceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54, 2008. Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2):145–164, 2004. N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010. A. Knott and R. Dale. Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18(1):35–62, 1994. J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse and Dialogue. Kluwer, Dordrecht, 2003. William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. J. Nivre, J. Hall, and J. Nilsson. Maltparser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219. Citeseer, 2006. R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The Penn Discourse Treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008. M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010. M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine Translation, 15(1):125–147, 2000. Enric Vallduví. The Informational Component. Garland, New York, 1992. Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes, 2(6):107–135, 1991. Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 4(29):545–587, 2003. Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C.L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–1514, Beijing, China, August 2010.
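A minimal sketch of the triple-counting and pruning step described in the abstract above; it applies only the chi-square test (the paper also uses t-tests for rare events and relation words), with 3.841 as the critical value for p < .05 at one degree of freedom, and records the sign of the correlation.

```python
from collections import Counter

def prune_bkb(triples, chi2_crit=3.841):
    """Keep a (source_event, relation_word, target_event) triple only when
    the relation word is significantly over- or under-represented for that
    event pair relative to the whole corpus (2x2 chi-square, p < .05)."""
    triple_n = Counter(triples)
    pair_n = Counter((s, t) for s, _, t in triples)
    word_n = Counter(w for _, w, _ in triples)
    n = sum(triple_n.values())
    bkb = {}
    for (s, w, t), a in triple_n.items():
        b = pair_n[(s, t)] - a        # this pair with other relation words
        c = word_n[w] - a             # this relation word with other pairs
        d = n - a - b - c             # everything else
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0:
            continue
        chi2 = n * (a * d - b * c) ** 2 / denom
        if chi2 >= chi2_crit:
            # frequency score plus correlation sign (+ / -)
            bkb[(s, w, t)] = (a, "+" if a * d > b * c else "-")
    return bkb
```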
4 0.37119433 47 acl-2012-Chinese Comma Disambiguation for Discourse Analysis
Author: Yaqin Yang ; Nianwen Xue
Abstract: The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structureoriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experimented with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a “post-processing” approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.
5 0.20106934 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
Author: Ziheng Lin ; Chang Liu ; Hwee Tou Ng ; Min-Yen Kan
Abstract: An ideal summarization system should produce summaries that have high content coverage and linguistic quality. Many state-ofthe-art summarization systems focus on content coverage by extracting content-dense sentences from source articles. A current research focus is to process these sentences so that they read fluently as a whole. The current AESOP task encourages research on evaluating summaries on content, readability, and overall responsiveness. In this work, we adapt a machine translation metric to measure content coverage, apply an enhanced discourse coherence model to evaluate summary readability, and combine both in a trained regression model to evaluate overall responsiveness. The results show significantly improved performance over AESOP 2011 submitted metrics.
6 0.077066936 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
7 0.074953318 50 acl-2012-Collective Classification for Fine-grained Information Status
8 0.067380823 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model
9 0.066972293 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
10 0.065682776 191 acl-2012-Temporally Anchored Relation Extraction
11 0.0625607 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
12 0.059264526 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation
13 0.058293119 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
14 0.056508038 176 acl-2012-Sentence Compression with Semantic Role Constraints
15 0.055924561 73 acl-2012-Discriminative Learning for Joint Template Filling
16 0.055319935 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
17 0.053812873 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
18 0.053568531 51 acl-2012-Collective Generation of Natural Image Descriptions
19 0.051909566 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
20 0.051051043 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
topicId topicWeight
[(0, -0.18), (1, 0.087), (2, -0.165), (3, 0.098), (4, 0.034), (5, -0.133), (6, -0.257), (7, -0.132), (8, -0.302), (9, 0.527), (10, 0.015), (11, -0.072), (12, 0.016), (13, -0.015), (14, 0.156), (15, -0.043), (16, 0.021), (17, 0.043), (18, 0.033), (19, -0.027), (20, -0.072), (21, -0.019), (22, -0.103), (23, 0.002), (24, -0.029), (25, -0.061), (26, 0.044), (27, 0.016), (28, -0.093), (29, 0.066), (30, -0.064), (31, -0.038), (32, -0.034), (33, -0.003), (34, 0.026), (35, 0.007), (36, 0.03), (37, 0.032), (38, -0.034), (39, -0.0), (40, 0.029), (41, 0.027), (42, -0.008), (43, 0.009), (44, -0.04), (45, -0.08), (46, -0.015), (47, -0.022), (48, -0.018), (49, -0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.97400349 193 acl-2012-Text-level Discourse Parsing with Rich Linguistic Features
Author: Vanessa Wei Feng ; Graeme Hirst
Abstract: In this paper, we develop an RST-style textlevel discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourseparsing performance under different discourse conditions.
2 0.94574463 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text
Author: Yuping Zhou ; Nianwen Xue
Abstract: We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken together with other PDTB-style schemes (e.g. for English, Turkish, Hindi, and Czech), affords a broader perspective on how the generalized lexically grounded approach can flesh itself out in the context of cross-linguistic annotation of discourse relations.
3 0.94419909 47 acl-2012-Chinese Comma Disambiguation for Discourse Analysis
Author: Yaqin Yang ; Nianwen Xue
Abstract: The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structureoriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experimented with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a “post-processing” approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.
4 0.78291631 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations
Author: Christian Chiarcos
Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduv ı´ 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. 213 Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). ProceedJienjgus, R ofep thueb 5lic0t hof A Knonrueaa,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. 
While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. add the candidate triple to the BKB, if it found in the BKB, increase its score by (or initialize it with) 1, – – • perform a pruning on all candidate triples, calcpuerlaftoer significance aonnd a lclo crarneldaitdioante scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a rela- tion word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved pfo ≥r w .0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation. 
Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant.2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ?), reproducibility (can these correlations confirmed on independent data sets ?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. We254Mn1a c:CeAs(gurb42)et760cr8m,iop3e61r4l28np0st6uwicho21rm9W,e2673mas048p7c3okenytpdoagi21p8r,o35eE0s29Nit36nvgreipol8796r50s9%.n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8.0 Htr5iple1v s, 45.12T35av9sg7.reH7em nv6 ts62(. 
Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence semantic, characteristics of the relation between events. The data also indicate that reproducibility increases with the size of the corpora from which a BKB is built.

3.2 Interpretability

Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then, and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a temporal and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then.

4 Discussion and Outlook

This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriate discourse marker to connect two utterances with no overt relation word. This information can be used, for example, to facilitate the semi-automated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events.

Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge and may be combined with each other to achieve a more precise prediction of implicit discourse connectives.

The validity of the approach was evaluated with respect to three evaluation criteria: the extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing.
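As an illustration of the marker-prediction use case described above, the ‘default’ relation word for a given event pair can be read off a BKB that maps triples to association scores, as in the construction sketch earlier. The lookup below is an illustrative assumption about the BKB's shape, not the authors' interface; the event lemmata in the usage comment are hypothetical.

```python
def default_relation_word(bkb, source_event, target_event, top_k=1):
    """Suggest the relation word(s) most strongly associated with a
    pair of event types, e.g., to propose a connective to a human
    annotator for two utterances with no overt relation word."""
    candidates = [(rel, score)
                  for (src, rel, tgt), score in bkb.items()
                  if src == source_event and tgt == target_event]
    candidates.sort(key=lambda c: c[1], reverse=True)
    return [rel for rel, _ in candidates[:top_k]]

# Hypothetical usage: suggest a connective for an "announce" utterance
# following a "win" utterance.
# default_relation_word(bkb, "announce", "win")  # e.g., ["then"]
```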
In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guiding the insertion of relation words, a task that has been found to be problematic for languages that differ in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains, given a sufficient amount of textual data.

Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike the approach described here, they are not restricted to adjacent utterances and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations, whereas here the entire bandwidth of discourse relations that are explicitly realized in a language is covered. Prospectively, both approaches could be combined to compensate for their respective weaknesses.

Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration) and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information, and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations.

The goal of this paper was to evaluate the methodological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment of the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009), and Kasch and Oates (2010). Other directions of subsequent research may include more elaborate models of events and the investigation of the relationship between relation words and taxonomies of discourse relations.

Acknowledgments

This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. I would also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy, whose work on discourse relations on the one hand and proposition stores on the other hand has been the main inspiration for this paper.

References

M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The WaCky wide web: a collection of very large linguistically processed web-crawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009.

N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2, pages 602–610. Association for Computational Linguistics, 2009.
A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukWaC, a very large web-derived corpus of English. In Proceedings of the 4th Web as Corpus Workshop (WAC-4): Can we beat Google?, pages 47–54, 2008.

Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2):145–164, 2004.

N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010.

A. Knott and R. Dale. Using linguistic phenomena to motivate a set of coherence relations. Discourse Processes, 18(1):35–62, 1994.

J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse and Dialogue. Kluwer, Dordrecht, 2003.

William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988.

J. Nivre, J. Hall, and J. Nilsson. MaltParser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219, 2006.

R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The Penn Discourse Treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008.

M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010.

M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine Translation, 15(1):125–147, 2000.

Enric Vallduví. The Informational Component. Garland, New York, 1992.

Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Language and Cognitive Processes, 6(2):107–135, 1991.

Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 29(4):545–587, 2003.

Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C. L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–1514, Beijing, China, August 2010.