acl acl2012 acl2012-201 knowledge-graph by maker-knowledge-mining

201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations


Source: pdf

Author: Christian Chiarcos

Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduv ı´ 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. 213 Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). ProceedJienjgus, R ofep thueb 5lic0t hof A Knonrueaa,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. add the candidate triple to the BKB, if it found in the BKB, increase its score by (or initialize it with) 1, – – • perform a pruning on all candidate triples, calcpuerlaftoer significance aonnd a lclo crarneldaitdioante scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a rela- tion word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved pfo ≥r w .0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation. Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant.2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ?), reproducibility (can these correlations confirmed on independent data sets ?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. We254Mn1a c:CeAs(gurb42)et760cr8m,iop3e61r4l28np0st6uwicho21rm9W,e2673mas048p7c3okenytpdoagi21p8r,o35eE0s29Nit36nvgreipol8796r50s9%.n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8.0 Htr5iple1v s, 45.12T35av9sg7.reH7em nv6 ts62(. %.9T2) Table 2: Agreement between but (B), however (H) and then (T) on PukWaC in both corpora, disagreement means positive correlation in one corpus and negative correlation in the other. Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. 3.2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a tem- poral and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. 4 Discussion and Outlook This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: Large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriatediscoursemarkertoconnecttwoutterances with no overt relation word. This information can be used, for example, to facilitate the semiautomated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. The validity of the approach was evaluated with respect to three evaluation criteria: The extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing. In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guid- ing the insertion of relation words, a task that has been found to be problematic for languages that dif216 fer in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. – Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations whereas here the entire band- width of discourse relations that are explicitly realized in a language are covered. Prospectively, both approaches could be combined to compensate their respective weaknesses. Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. The goal of this paper was to evaluate the methdological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009) and Kasch and Oates (2010). Other directions of subsequent research may include address more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. Acknowledgments This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. Iwould also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy whose work on discourse relations on the one hand and proposition stores on the other hand have been the main inspiration for this paper. References M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: a collection of very large linguistically processed webcrawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009. N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 602–610. Association for Computational Linguistics, 2009. A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukwac, a very large web-derived corpus of english. In Proceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54, 2008. Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2): 145–164, 2004. N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010. A. Knott and R. Dale. Using linguistic phenomena to motivate a set ofcoherence relations. Discourse processes, 18(1):35–62, 1994. 217 J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse andDialogue. Kluwer, Dordrecht, 2003. William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. J. Nivre, J. Hall, and J. Nilsson. Maltparser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219. Citeseer, 2006. R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The penn discourse treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008. M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010. M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine translation, 15(1): 125–147, 2000. Enric Vallduv ı´. The Informational Component. Garland, New York, 1992. Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes, 2(6): 107–135, 1991. Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 4(29):545– 587, 2003. Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C.L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–15 14, Beijing, China, August 2010.

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 de Abstract This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. [sent-2, score-0.65]

2 Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). [sent-3, score-1.371]

3 Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. [sent-4, score-1.084]

4 To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. [sent-8, score-0.027]

5 This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e. [sent-10, score-0.499]

6 , if one utterance represents the cause for another, Mann and Thompson 1988, p. [sent-12, score-0.083]

7 , one utterance motivates the reader to accept a claim formulated in another utterance, ibid. [sent-15, score-0.083]

8 257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. [sent-17, score-0.6]

9 Discourse relations can be indicated explicitly by optional cues, e. [sent-19, score-0.14]

10 Here, these cues are referred to as relation words. [sent-28, score-0.281]

11 Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. [sent-29, score-0.877]

12 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. [sent-30, score-1.125]

13 For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. [sent-31, score-0.739]

14 The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). [sent-32, score-0.881]

15 Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). [sent-33, score-0.91]

16 c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. [sent-36, score-0.032]

17 While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. [sent-37, score-0.678]

18 , the target is identified with the main event of the preceding utterance. [sent-40, score-0.357]

19 1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. [sent-41, score-1.528]

20 , where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. [sent-45, score-0.44]

21 The goal of this paper is to evaluate the validity of this approach. [sent-46, score-0.072]

22 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. [sent-47, score-1.115]

23 These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. [sent-48, score-0.826]

24 Formally, an event is the semantic representation of the meaning conveyed in the utterance. [sent-49, score-0.234]

25 We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. [sent-50, score-0.724]

26 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. [sent-51, score-0.405]

27 For the experiment described here, events are heuristically identified with the main predicates of a sentence, i. [sent-52, score-0.328]

28 , non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. [sent-54, score-0.033]

29 The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. [sent-55, score-0.415]

30 These triples are harvested from large syntactically annotated corpora. [sent-56, score-0.324]

31 For intersentential relations, the target is identified with the event of the immediately preceding main clause. [sent-57, score-0.357]

32 These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. [sent-58, score-0.154]

33 0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. [sent-64, score-0.666]

34 Assuming an even distribution of incorrect target events, this should rule these out. [sent-65, score-0.033]

35 Additionally, it also serves as a means of evaluation. [sent-66, score-0.028]

36 Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant. [sent-67, score-0.651]

37 2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). [sent-68, score-0.248]

38 The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. [sent-69, score-0.635]

39 All things being equal, we thus need millions of sentences to create the BKB. [sent-70, score-0.031]

40 Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. [sent-71, score-0.035]

41 2Subsequent studies may employ less rigid pruning criteria. [sent-74, score-0.078]

42 For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. [sent-75, score-0.5]

43 It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82. [sent-78, score-0.032]

44 5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. [sent-80, score-0.122]

45 It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. [sent-83, score-0.154]

46 The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. [sent-87, score-0.517]

47 The target (antecedent) event was identified with the last main event of the preceding sentence. [sent-88, score-0.591]

48 As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. [sent-89, score-0.584]

49 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ? [sent-90, score-0.831]

50 ), reproducibility (can these correlations confirmed on independent data sets ? [sent-91, score-0.253]

51 ), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ? [sent-92, score-0.711]

52 1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. [sent-95, score-0.119]

53 Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. [sent-96, score-1.047]

54 1 shows the number of triples obtained from PukWaC subcorpora of different size. [sent-98, score-0.396]

55 For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. [sent-99, score-0.757]

56 For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. [sent-100, score-1.213]

57 n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8. [sent-102, score-0.735]

58 9T2) Table 2: Agreement between but (B), however (H) and then (T) on PukWaC in both corpora, disagreement means positive correlation in one corpus and negative correlation in the other. [sent-107, score-0.158]

59 Table 1 confirms that results obtained on one resource can be reproduced on another. [sent-108, score-0.07]

60 This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. [sent-109, score-0.605]

61 The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. [sent-110, score-0.156]

62 2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. [sent-112, score-1.184]

63 These expectations are tested here for three of the most frequent relation words found in the corpora, i. [sent-113, score-0.308]

64 2008); then, on the other hand, indicates a tem- poral and/or causal relation. [sent-117, score-0.045]

65 Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. [sent-118, score-0.399]

66 This information can be used, for example, to facilitate the semiautomated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. [sent-120, score-0.71]

67 (2010) used a language model to predict discourse markers for implicitly realized discourse relations. [sent-122, score-0.919]

68 As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. [sent-123, score-0.962]

69 The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. [sent-124, score-0.856]

70 Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. [sent-125, score-0.429]

71 Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing. [sent-127, score-0.872]

72 It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. [sent-132, score-0.37]

73 – Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. [sent-133, score-0.072]

74 Riaz and Girju (2010) exploit distributional information about pairs of utterances. [sent-134, score-0.095]

75 Unlike approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. [sent-135, score-0.324]

76 Their approach can thus be applied to comparably small data sets. [sent-136, score-0.057]

77 However, they are restricted to a specific type of relations whereas here the entire band- width of discourse relations that are explicitly realized in a language are covered. [sent-137, score-0.736]

78 Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. [sent-139, score-0.741]

79 However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. [sent-140, score-0.953]

80 The goal of this paper was to evaluate the methdological validity of the approach. [sent-141, score-0.072]

81 It thus represents the basis for further experiments, e. [sent-142, score-0.031]

82 Other directions of subsequent research may include address more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. [sent-145, score-0.773]

83 Iwould also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy whose work on discourse relations on the one hand and proposition stores on the other hand have been the main inspiration for this paper. [sent-148, score-0.569]

84 Structure and ostension in the interpretation of discourse deixis. [sent-232, score-0.455]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('discourse', 0.429), ('triples', 0.324), ('bkb', 0.304), ('relation', 0.281), ('events', 0.248), ('event', 0.234), ('pukwac', 0.212), ('relations', 0.14), ('wackypedia', 0.132), ('reproducibility', 0.121), ('triple', 0.101), ('kasch', 0.091), ('riaz', 0.091), ('significance', 0.087), ('utterance', 0.083), ('utterances', 0.081), ('stede', 0.079), ('ferraresi', 0.079), ('correlation', 0.079), ('pruning', 0.078), ('preferences', 0.076), ('correlations', 0.074), ('knott', 0.072), ('subcorpora', 0.072), ('validity', 0.072), ('baroni', 0.068), ('webber', 0.068), ('antecedent', 0.068), ('interpretability', 0.068), ('forthcoming', 0.061), ('gernsbacher', 0.061), ('untouched', 0.061), ('vallduv', 0.061), ('prasad', 0.058), ('confirmed', 0.058), ('reproducible', 0.053), ('oates', 0.053), ('distributional', 0.053), ('preceding', 0.053), ('chambers', 0.052), ('causal', 0.045), ('confirms', 0.044), ('en', 0.044), ('adjacent', 0.043), ('heuristically', 0.043), ('girju', 0.043), ('pairs', 0.042), ('tests', 0.041), ('maltparser', 0.041), ('narrative', 0.041), ('profiles', 0.041), ('factual', 0.039), ('identified', 0.037), ('mann', 0.037), ('dale', 0.036), ('taxonomies', 0.036), ('frequency', 0.035), ('conjunctions', 0.035), ('corpora', 0.035), ('markers', 0.034), ('modifiers', 0.034), ('eventually', 0.034), ('target', 0.033), ('serve', 0.033), ('distributed', 0.032), ('parsed', 0.032), ('purpose', 0.032), ('chance', 0.032), ('thus', 0.031), ('connected', 0.031), ('processes', 0.031), ('joshi', 0.031), ('bonnie', 0.031), ('relative', 0.031), ('background', 0.031), ('assuming', 0.029), ('criterion', 0.029), ('serves', 0.028), ('unsupervised', 0.027), ('realized', 0.027), ('words', 0.027), ('ceas', 0.026), ('citeseer', 0.026), ('sfb', 0.026), ('arrangement', 0.026), ('comparably', 0.026), ('manifested', 0.026), ('chiarcos', 0.026), ('functionally', 0.026), ('tbh', 0.026), ('admiralty', 0.026), ('ukwac', 0.026), ('icsc', 0.026), ('ahr', 0.026), ('dordrecht', 0.026), ('adverbials', 0.026), ('prospective', 0.026), ('morton', 0.026), ('ostension', 0.026), ('reproduced', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

Author: Christian Chiarcos

Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduv ı´ 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. 213 Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). ProceedJienjgus, R ofep thueb 5lic0t hof A Knonrueaa,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. add the candidate triple to the BKB, if it found in the BKB, increase its score by (or initialize it with) 1, – – • perform a pruning on all candidate triples, calcpuerlaftoer significance aonnd a lclo crarneldaitdioante scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a rela- tion word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved pfo ≥r w .0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation. Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant.2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ?), reproducibility (can these correlations confirmed on independent data sets ?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. We254Mn1a c:CeAs(gurb42)et760cr8m,iop3e61r4l28np0st6uwicho21rm9W,e2673mas048p7c3okenytpdoagi21p8r,o35eE0s29Nit36nvgreipol8796r50s9%.n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8.0 Htr5iple1v s, 45.12T35av9sg7.reH7em nv6 ts62(. %.9T2) Table 2: Agreement between but (B), however (H) and then (T) on PukWaC in both corpora, disagreement means positive correlation in one corpus and negative correlation in the other. Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. 3.2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a tem- poral and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. 4 Discussion and Outlook This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: Large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriatediscoursemarkertoconnecttwoutterances with no overt relation word. This information can be used, for example, to facilitate the semiautomated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. The validity of the approach was evaluated with respect to three evaluation criteria: The extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing. In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guid- ing the insertion of relation words, a task that has been found to be problematic for languages that dif216 fer in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. – Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations whereas here the entire band- width of discourse relations that are explicitly realized in a language are covered. Prospectively, both approaches could be combined to compensate their respective weaknesses. Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. The goal of this paper was to evaluate the methdological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009) and Kasch and Oates (2010). Other directions of subsequent research may include address more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. Acknowledgments This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. Iwould also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy whose work on discourse relations on the one hand and proposition stores on the other hand have been the main inspiration for this paper. References M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: a collection of very large linguistically processed webcrawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009. N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 602–610. Association for Computational Linguistics, 2009. A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukwac, a very large web-derived corpus of english. In Proceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54, 2008. Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2): 145–164, 2004. N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010. A. Knott and R. Dale. Using linguistic phenomena to motivate a set ofcoherence relations. Discourse processes, 18(1):35–62, 1994. 217 J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse andDialogue. Kluwer, Dordrecht, 2003. William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. J. Nivre, J. Hall, and J. Nilsson. Maltparser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219. Citeseer, 2006. R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The penn discourse treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008. M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010. M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine translation, 15(1): 125–147, 2000. Enric Vallduv ı´. The Informational Component. Garland, New York, 1992. Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes, 2(6): 107–135, 1991. Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 4(29):545– 587, 2003. Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C.L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–15 14, Beijing, China, August 2010.

2 0.37793431 193 acl-2012-Text-level Discourse Parsing with Rich Linguistic Features

Author: Vanessa Wei Feng ; Graeme Hirst

Abstract: In this paper, we develop an RST-style textlevel discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourseparsing performance under different discourse conditions.

3 0.2950469 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text

Author: Yuping Zhou ; Nianwen Xue

Abstract: We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken together with other PDTB-style schemes (e.g. for English, Turkish, Hindi, and Czech), affords a broader perspective on how the generalized lexically grounded approach can flesh itself out in the context of cross-linguistic annotation of discourse relations.

4 0.23753364 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive

Author: Joel Nothman ; Matthew Honnibal ; Ben Hachey ; James R. Curran

Abstract: Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.

5 0.22559591 47 acl-2012-Chinese Comma Disambiguation for Discourse Analysis

Author: Yaqin Yang ; Nianwen Xue

Abstract: The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structureoriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experimented with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a “post-processing” approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.

6 0.17414621 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

7 0.16741998 191 acl-2012-Temporally Anchored Relation Extraction

8 0.14314452 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

9 0.14056963 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

10 0.1360528 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

11 0.11546271 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

12 0.10774799 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

13 0.098256953 208 acl-2012-Unsupervised Relation Discovery with Sense Disambiguation

14 0.097914457 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

15 0.087204874 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

16 0.075672857 91 acl-2012-Extracting and modeling durations for habits and events from Twitter

17 0.075429633 73 acl-2012-Discriminative Learning for Joint Template Filling

18 0.074990883 169 acl-2012-Reducing Wrong Labels in Distant Supervision for Relation Extraction

19 0.069250189 177 acl-2012-Sentence Dependency Tagging in Online Question Answering Forums

20 0.067729898 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.203), (1, 0.175), (2, -0.143), (3, 0.246), (4, 0.075), (5, -0.198), (6, -0.217), (7, -0.144), (8, -0.228), (9, 0.3), (10, -0.05), (11, -0.044), (12, 0.012), (13, -0.013), (14, 0.098), (15, -0.038), (16, -0.04), (17, 0.06), (18, 0.01), (19, 0.033), (20, -0.07), (21, 0.022), (22, -0.08), (23, 0.002), (24, -0.036), (25, -0.093), (26, -0.0), (27, 0.052), (28, -0.092), (29, 0.041), (30, 0.053), (31, -0.085), (32, 0.04), (33, -0.01), (34, -0.019), (35, -0.008), (36, 0.0), (37, -0.082), (38, -0.003), (39, -0.042), (40, -0.084), (41, -0.036), (42, 0.022), (43, 0.005), (44, -0.006), (45, 0.046), (46, 0.09), (47, 0.038), (48, 0.094), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97368139 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

Author: Christian Chiarcos

Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduv ı´ 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. 213 Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). ProceedJienjgus, R ofep thueb 5lic0t hof A Knonrueaa,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. add the candidate triple to the BKB, if it found in the BKB, increase its score by (or initialize it with) 1, – – • perform a pruning on all candidate triples, calcpuerlaftoer significance aonnd a lclo crarneldaitdioante scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a rela- tion word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved pfo ≥r w .0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation. Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant.2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ?), reproducibility (can these correlations confirmed on independent data sets ?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. We254Mn1a c:CeAs(gurb42)et760cr8m,iop3e61r4l28np0st6uwicho21rm9W,e2673mas048p7c3okenytpdoagi21p8r,o35eE0s29Nit36nvgreipol8796r50s9%.n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8.0 Htr5iple1v s, 45.12T35av9sg7.reH7em nv6 ts62(. %.9T2) Table 2: Agreement between but (B), however (H) and then (T) on PukWaC in both corpora, disagreement means positive correlation in one corpus and negative correlation in the other. Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. 3.2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a tem- poral and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. 4 Discussion and Outlook This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: Large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriatediscoursemarkertoconnecttwoutterances with no overt relation word. This information can be used, for example, to facilitate the semiautomated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. The validity of the approach was evaluated with respect to three evaluation criteria: The extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing. In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guid- ing the insertion of relation words, a task that has been found to be problematic for languages that dif216 fer in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. – Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations whereas here the entire band- width of discourse relations that are explicitly realized in a language are covered. Prospectively, both approaches could be combined to compensate their respective weaknesses. Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. The goal of this paper was to evaluate the methdological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009) and Kasch and Oates (2010). Other directions of subsequent research may include address more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. Acknowledgments This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. Iwould also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy whose work on discourse relations on the one hand and proposition stores on the other hand have been the main inspiration for this paper. References M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: a collection of very large linguistically processed webcrawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009. N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 602–610. Association for Computational Linguistics, 2009. A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukwac, a very large web-derived corpus of english. In Proceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54, 2008. Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2): 145–164, 2004. N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010. A. Knott and R. Dale. Using linguistic phenomena to motivate a set ofcoherence relations. Discourse processes, 18(1):35–62, 1994. 217 J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse andDialogue. Kluwer, Dordrecht, 2003. William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. J. Nivre, J. Hall, and J. Nilsson. Maltparser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219. Citeseer, 2006. R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The penn discourse treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008. M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010. M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine translation, 15(1): 125–147, 2000. Enric Vallduv ı´. The Informational Component. Garland, New York, 1992. Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes, 2(6): 107–135, 1991. Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 4(29):545– 587, 2003. Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C.L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–15 14, Beijing, China, August 2010.

2 0.85845506 157 acl-2012-PDTB-style Discourse Annotation of Chinese Text

Author: Yuping Zhou ; Nianwen Xue

Abstract: We describe a discourse annotation scheme for Chinese and report on the preliminary results. Our scheme, inspired by the Penn Discourse TreeBank (PDTB), adopts the lexically grounded approach; at the same time, it makes adaptations based on the linguistic and statistical characteristics of Chinese text. Annotation results show that these adaptations work well in practice. Our scheme, taken together with other PDTB-style schemes (e.g. for English, Turkish, Hindi, and Czech), affords a broader perspective on how the generalized lexically grounded approach can flesh itself out in the context of cross-linguistic annotation of discourse relations.

3 0.81630391 193 acl-2012-Text-level Discourse Parsing with Rich Linguistic Features

Author: Vanessa Wei Feng ; Graeme Hirst

Abstract: In this paper, we develop an RST-style textlevel discourse parser, based on the HILDA discourse parser (Hernault et al., 2010b). We significantly improve its tree-building step by incorporating our own rich linguistic features. We also analyze the difficulty of extending traditional sentence-level discourse parsing to text-level parsing by comparing discourseparsing performance under different discourse conditions.

4 0.75790095 47 acl-2012-Chinese Comma Disambiguation for Discourse Analysis

Author: Yaqin Yang ; Nianwen Xue

Abstract: The Chinese comma signals the boundary of discourse units and also anchors discourse relations between adjacent text spans. In this work, we propose a discourse structureoriented classification of the comma that can be automatically extracted from the Chinese Treebank based on syntactic patterns. We then experimented with two supervised learning methods that automatically disambiguate the Chinese comma based on this classification. The first method integrates comma classification into parsing, and the second method adopts a “post-processing” approach that extracts features from automatic parses to train a classifier. The experimental results show that the second approach compares favorably against the first approach.

5 0.54817581 85 acl-2012-Event Linking: Grounding Event Reference in a News Archive

Author: Joel Nothman ; Matthew Honnibal ; Ben Hachey ; James R. Curran

Abstract: Interpreting news requires identifying its constituent events. Events are complex linguistically and ontologically, so disambiguating their reference is challenging. We introduce event linking, which canonically labels an event reference with the article where it was first reported. This implicitly relaxes coreference to co-reporting, and will practically enable augmenting news archives with semantic hyperlinks. We annotate and analyse a corpus of 150 documents, extracting 501 links to a news archive with reasonable inter-annotator agreement.

6 0.41745484 129 acl-2012-Learning High-Level Planning from Text

7 0.41134024 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures

8 0.40918773 33 acl-2012-Automatic Event Extraction with Structured Preference Modeling

9 0.40694222 135 acl-2012-Learning to Temporally Order Medical Events in Clinical Text

10 0.38680574 191 acl-2012-Temporally Anchored Relation Extraction

11 0.37722266 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

12 0.37632838 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

13 0.37560886 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

14 0.34244844 50 acl-2012-Collective Classification for Fine-grained Information Status

15 0.33434719 17 acl-2012-A Novel Burst-based Text Representation Model for Scalable Event Detection

16 0.32858431 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

17 0.30330548 73 acl-2012-Discriminative Learning for Joint Template Filling

18 0.29734635 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study

19 0.29719201 195 acl-2012-The Creation of a Corpus of English Metalanguage

20 0.29283977 91 acl-2012-Extracting and modeling durations for habits and events from Twitter


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.026), (26, 0.048), (28, 0.04), (30, 0.022), (37, 0.023), (39, 0.047), (49, 0.302), (59, 0.013), (74, 0.014), (82, 0.047), (84, 0.028), (85, 0.027), (90, 0.12), (92, 0.041), (94, 0.029), (99, 0.098)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.79212862 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction

Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto

Abstract: This paper presents a novel top-down headdriven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.

same-paper 2 0.76826847 201 acl-2012-Towards the Unsupervised Acquisition of Discourse Relations

Author: Christian Chiarcos

Abstract: This paper describes a novel approach towards the empirical approximation of discourse relations between different utterances in texts. Following the idea that every pair of events comes with preferences regarding the range and frequency of discourse relations connecting both parts, the paper investigates whether these preferences are manifested in the distribution of relation words (that serve to signal these relations). Experiments on two large-scale English web corpora show that significant correlations between pairs of adjacent events and relation words exist, that they are reproducible on different data sets, and for three relation words, that their distribution corresponds to theorybased assumptions. 1 Motivation Texts are not merely accumulations of isolated utterances, but the arrangement of utterances conveys meaning; human text understanding can thus be described as a process to recover the global structure of texts and the relations linking its different parts (Vallduv ı´ 1992; Gernsbacher et al. 2004). To capture these aspects of meaning in NLP, it is necessary to develop operationalizable theories, and, within a supervised approach, large amounts of annotated training data. To facilitate manual annotation, weakly supervised or unsupervised techniques can be applied as preprocessing step for semimanual annotation, and this is part of the motivation of the approach described here. 213 Discourse relations involve different aspects of meaning. This may include factual knowledge about the connected discourse segments (a ‘subjectmatter’ relation, e.g., if one utterance represents the cause for another, Mann and Thompson 1988, p.257), argumentative purposes (a ‘presentational’ relation, e.g., one utterance motivates the reader to accept a claim formulated in another utterance, ibid., p.257), or relations between entities mentioned in the connected discourse segments (anaphoric relations, Webber et al. 2003). Discourse relations can be indicated explicitly by optional cues, e.g., adverbials (e.g., however), conjunctions (e.g., but), or complex phrases (e.g., in contrast to what Peter said a minute ago). Here, these cues are referred to as relation words. Assuming that relation words are associated with specific discourse relations (Knott and Dale 1994; Prasad et al. 2008), the distribution of relation words found between two (types of) events can yield insights into the range of discourse relations possible at this occasion and their respective likeliness. For this purpose, this paper proposes a background knowledge base (BKB) that hosts pairs of events (here heuristically represented by verbs) along with distributional profiles for relation words. The primary data structure of the BKB is a triple where one event (type) is connected with a particular relation word to another event (type). Triples are further augmented with a frequency score (expressing the likelihood of the triple to be observed), a significance score (see below), and a correlation score (indicating whether a pair of events has a positive or negative correlation with a particular relation word). ProceedJienjgus, R ofep thueb 5lic0t hof A Knonrueaa,l M 8-e1e4ti Jnugly o f2 t0h1e2 A.s ?c so2c0ia1t2io Ans fsoorc Ciatoiomnp fuotart Cioonmaplu Ltiantgiounisatlic Lsi,n pgaugiestsi2c 1s3–217, Triples can be easily acquired from automatically parsed corpora. While the relation word is usually part of the utterance that represents the source of the relation, determining the appropriate target (antecedent) of the relation may be difficult to achieve. As a heuristic, an adjacency preference is adopted, i.e., the target is identified with the main event of the preceding utterance.1 The BKB can be constructed from a sufficiently large corpus as follows: • • identify event types and relation words for every utterance create a candidate triple consisting of the event type of the utterance, the relation word, and the event type of the preceding utterance. add the candidate triple to the BKB, if it found in the BKB, increase its score by (or initialize it with) 1, – – • perform a pruning on all candidate triples, calcpuerlaftoer significance aonnd a lclo crarneldaitdioante scores Pruning uses statistical significance tests to evaluate whether the relative frequency of a relation word for a pair of events is significantly higher or lower than the relative frequency of the relation word in the entire corpus. Assuming that incorrect candidate triples (i.e., where the factual target of the relation was non-adjacent) are equally distributed, they should be filtered out by the significance tests. The goal of this paper is to evaluate the validity of this approach. 2 Experimental Setup By generalizing over multiple occurrences of the same events (or, more precisely, event types), one can identify preferences of event pairs for one or several relation words. These preferences capture context-invariant characteristics of pairs of events and are thus to considered to reflect a semantic predisposition for a particular discourse relation. Formally, an event is the semantic representation of the meaning conveyed in the utterance. We 1Relations between non-adjacent utterances are constrained by the structure of discourse (Webber 1991), and thus less likely than relations between adjacent utterances. 214 assume that the same event can reoccur in different contexts, we are thus studying relations between types of events. For the experiment described here, events are heuristically identified with the main predicates of a sentence, i.e., non-auxiliar, noncausative, non-modal verbal lexemes that serve as heads of main clauses. The primary data structure of the approach described here is a triple consisting of a source event, a relation word and a target (antecedent) event. These triples are harvested from large syntactically annotated corpora. For intersentential relations, the target is identified with the event of the immediately preceding main clause. These extraction preferences are heuristic approximations, and thus, an additional pruning step is necessary. For this purpose, statistical significance tests are adopted (χ2 for triples of frequent events and relation words, t-test for rare events and/or relation words) that compare the relative frequency of a rela- tion word given a pair of events with the relative frequency of the relation word in the entire corpus. All results with p ≥ .05 are excluded, i.e., only triples are preserved pfo ≥r w .0h5ic ahr teh eex xocblsuedrevde,d i positive or negative correlation between a pair of events and a relation word is not due to chance with at least 95% probability. Assuming an even distribution of incorrect target events, this should rule these out. Additionally, it also serves as a means of evaluation. Using statistical significance tests as pruning criterion entails that all triples eventually confirmed are statistically significant.2 This setup requires immense amounts of data: We are dealing with several thousand events (theoretically, the total number of verbs of a language). The chance probability for two events to occur in adjacent position is thus far below 10−6, and it decreases further if the likelihood of a relation word is taken into consideration. All things being equal, we thus need millions of sentences to create the BKB. Here, two large-scale corpora of English are employed, PukWaC and Wackypedia EN (Baroni et al. 2009). PukWaC is a 2G-token web corpus of British English crawled from the uk domain (Ferraresi et al. 2Subsequent studies may employ less rigid pruning criteria. For the purpose of the current paper, however, the statistical significance of all extracted triples serves as an criterion to evaluate methodological validity. 2008), and parsed with MaltParser (Nivre et al. 2006). It is distributed in 5 parts; Only PukWaC1 to PukWaC-4 were considered here, constituting 82.2% (72.5M sentences) of the entire corpus, PukWaC-5 is left untouched for forthcoming evaluation experiments. Wackypedia EN is a 0.8G-token dump of the English Wikipedia, annotated with the same tools. It is distributed in 4 different files; the last portion was left untouched for forthcoming evaluation experiments. The portion analyzed here comprises 33.2M sentences, 75.9% of the corpus. The extraction of events in these corpora uses simple patterns that combine dependency information and part-of-speech tags to retrieve the main verbs and store their lemmata as event types. The target (antecedent) event was identified with the last main event of the preceding sentence. As relation words, only sentence-initial children of the source event that were annotated as adverbial modifiers, verb modifiers or conjunctions were considered. 3 Evaluation To evaluate the validity of the approach, three fundamental questions need to be addressed: significance (are there significant correlations between pairs of events and relation words ?), reproducibility (can these correlations confirmed on independent data sets ?), and interpretability (can these correlations be interpreted in terms of theoretically-defined discourse relations ?). 3.1 Significance and Reproducibility Significance tests are part of the pruning stage of the algorithm. Therefore, the number of triples eventually retrieved confirms the existence of statistically significant correlations between pairs of events and relation words. The left column of Tab. 1 shows the number of triples obtained from PukWaC subcorpora of different size. For reproducibility, compare the triples identified with Wackypedia EN and PukWaC subcorpora of different size: Table 1 shows the number of triples found in both Wackypedia EN and PukWaC, and the agreement between both resources. For two triples involving the same events (event types) and the same relation word, agreement means that the relation word shows either positive or negative correlation 215 TasPbe13u7l4n2k98t. We254Mn1a c:CeAs(gurb42)et760cr8m,iop3e61r4l28np0st6uwicho21rm9W,e2673mas048p7c3okenytpdoagi21p8r,o35eE0s29Nit36nvgreipol8796r50s9%.n3509egative correlation of event pairs and relation words between Wackypedia EN and PukWaC subcorpora of different size TBH: thb ouetwnev r17 t1,o27,t0a95P41 ul2kWv6aCs,8.0 Htr5iple1v s, 45.12T35av9sg7.reH7em nv6 ts62(. %.9T2) Table 2: Agreement between but (B), however (H) and then (T) on PukWaC in both corpora, disagreement means positive correlation in one corpus and negative correlation in the other. Table 1 confirms that results obtained on one resource can be reproduced on another. This indicates that triples indeed capture context-invariant, and hence, semantic, characteristics of the relation between events. The data also indicates that reproducibility increases with the size of corpora from which a BKB is built. 3.2 Interpretability Any theory of discourse relations would predict that relation words with similar function should have similar distributions, whereas one would expect different distributions for functionally unrelated relation words. These expectations are tested here for three of the most frequent relation words found in the corpora, i.e., but, then and however. But and however can be grouped together under a generalized notion of contrast (Knott and Dale 1994; Prasad et al. 2008); then, on the other hand, indicates a tem- poral and/or causal relation. Table 2 confirms the expectation that event pairs that are correlated with but tend to show the same correlation with however, but not with then. 4 Discussion and Outlook This paper described a novel approach towards the unsupervised acquisition of discourse relations, with encouraging preliminary results: Large collections of parsed text are used to assess distributional profiles of relation words that indicate discourse relations that are possible between specific types of events; on this basis, a background knowledge base (BKB) was created that can be used to predict an appropriatediscoursemarkertoconnecttwoutterances with no overt relation word. This information can be used, for example, to facilitate the semiautomated annotation of discourse relations, by pointing out the ‘default’ relation word for a given pair of events. Similarly, Zhou et al. (2010) used a language model to predict discourse markers for implicitly realized discourse relations. As opposed to this shallow, n-gram-based approach, here, the internal structure of utterances is exploited: based on semantic considerations, syntactic patterns have been devised that extract triples of event pairs and relation words. The resulting BKB provides a distributional approximation of the discourse relations that can hold between two specific event types. Both approaches exploit complementary sources of knowledge, and may be combined with each other to achieve a more precise prediction of implicit discourse connectives. The validity of the approach was evaluated with respect to three evaluation criteria: The extracted associations between relation words and event pairs could be shown to be statistically significant, and to be reproducible on other corpora; for three highly frequent relation words, theoretical predictions about their relative distribution could be confirmed, indicating their interpretability in terms of presupposed taxonomies of discourse relations. Another prospective field of application can be seen in NLP applications, where selection preferences for relation words may serve as a cheap replacement for full-fledged discourse parsing. In the Natural Language Understanding domain, the BKB may help to disambiguate or to identify discourse relations between different events; in the context of Machine Translation, it may represent a factor guid- ing the insertion of relation words, a task that has been found to be problematic for languages that dif216 fer in their inventory and usage of discourse markers, e.g., German and English (Stede and Schmitz 2000). The approach is language-independent (except for the syntactic extraction patterns), and it does not require manually annotated data. It would thus be easy to create background knowledge bases with relation words for other languages or specific domains given a sufficient amount of textual data. – Related research includes, for example, the unsupervised recognition of causal and temporal relationships, as required, for example, for the recognition of textual entailment. Riaz and Girju (2010) exploit distributional information about pairs of utterances. Unlike approach described here, they are not restricted to adjacent utterances, and do not rely on explicit and recurrent relation words. Their approach can thus be applied to comparably small data sets. However, they are restricted to a specific type of relations whereas here the entire band- width of discourse relations that are explicitly realized in a language are covered. Prospectively, both approaches could be combined to compensate their respective weaknesses. Similar observations can be made with respect to Chambers and Jurafsky (2009) and Kasch and Oates (2010), who also study a single discourse relation (narration), and are thus more limited in scope than the approach described here. However, as their approach extends beyond pairs of events to complex event chains, it seems that both approaches provide complementary types of information and their results could also be combined in a fruitful way to achieve a more detailed assessment of discourse relations. The goal of this paper was to evaluate the methdological validity of the approach. It thus represents the basis for further experiments, e.g., with respect to the enrichment the BKB with information provided by Riaz and Girju (2010), Chambers and Jurafsky (2009) and Kasch and Oates (2010). Other directions of subsequent research may include address more elaborate models of events, and the investigation of the relationship between relation words and taxonomies of discourse relations. Acknowledgments This work was supported by a fellowship within the Postdoc program of the German Academic Exchange Service (DAAD). Initial experiments were conducted at the Collaborative Research Center (SFB) 632 “Information Structure” at the University of Potsdam, Germany. Iwould also like to thank three anonymous reviewers for valuable comments and feedback, as well as Manfred Stede and Ed Hovy whose work on discourse relations on the one hand and proposition stores on the other hand have been the main inspiration for this paper. References M. Baroni, S. Bernardini, A. Ferraresi, and E. Zanchetta. The wacky wide web: a collection of very large linguistically processed webcrawled corpora. Language Resources and Evaluation, 43(3):209–226, 2009. N. Chambers and D. Jurafsky. Unsupervised learning of narrative schemas and their participants. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2-Volume 2, pages 602–610. Association for Computational Linguistics, 2009. A. Ferraresi, E. Zanchetta, M. Baroni, and S. Bernardini. Introducing and evaluating ukwac, a very large web-derived corpus of english. In Proceedings of the 4th Web as Corpus Workshop (WAC-4) Can we beat Google, pages 47–54, 2008. Morton Ann Gernsbacher, Rachel R. W. Robertson, Paola Palladino, and Necia K. Werner. Managing mental representations during narrative comprehension. Discourse Processes, 37(2): 145–164, 2004. N. Kasch and T. Oates. Mining script-like structures from the web. In Proceedings of the NAACL HLT 2010 First International Workshop on Formalisms and Methodology for Learning by Reading, pages 34–42. Association for Computational Linguistics, 2010. A. Knott and R. Dale. Using linguistic phenomena to motivate a set ofcoherence relations. Discourse processes, 18(1):35–62, 1994. 217 J. van Kuppevelt and R. Smith, editors. Current Directions in Discourse andDialogue. Kluwer, Dordrecht, 2003. William C. Mann and Sandra A. Thompson. Rhetorical Structure Theory: Toward a functional theory of text organization. Text, 8(3):243–281, 1988. J. Nivre, J. Hall, and J. Nilsson. Maltparser: A data-driven parser-generator for dependency parsing. In Proc. of LREC, pages 2216–2219. Citeseer, 2006. R. Prasad, N. Dinesh, A. Lee, E. Miltsakaki, L. Robaldo, A. Joshi, and B. Webber. The penn discourse treebank 2.0. In Proc. 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, 2008. M. Riaz and R. Girju. Another look at causality: Discovering scenario-specific contingency relationships with no supervision. In Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on, pages 361–368. IEEE, 2010. M. Stede and B. Schmitz. Discourse particles and discourse functions. Machine translation, 15(1): 125–147, 2000. Enric Vallduv ı´. The Informational Component. Garland, New York, 1992. Bonnie L. Webber. Structure and ostension in the interpretation of discourse deixis. Natural Language and Cognitive Processes, 2(6): 107–135, 1991. Bonnie L. Webber, Matthew Stone, Aravind K. Joshi, and Alistair Knott. Anaphora and discourse structure. Computational Linguistics, 4(29):545– 587, 2003. Z.-M. Zhou, Y. Xu, Z.-Y. Niu, M. Lan, J. Su, and C.L. Tan. Predicting discourse connectives for implicit discourse relation recognition. In COLING 2010, pages 1507–15 14, Beijing, China, August 2010.

3 0.72829795 36 acl-2012-BIUTEE: A Modular Open-Source System for Recognizing Textual Entailment

Author: Asher Stern ; Ido Dagan

Abstract: This paper introduces BIUTEE1 , an opensource system for recognizing textual entailment. Its main advantages are its ability to utilize various types of knowledge resources, and its extensibility by which new knowledge resources and inference components can be easily integrated. These abilities make BIUTEE an appealing RTE system for two research communities: (1) researchers of end applications, that can benefit from generic textual inference, and (2) RTE researchers, who can integrate their novel algorithms and knowledge resources into our system, saving the time and effort of developing a complete RTE system from scratch. Notable assistance for these re- searchers is provided by a visual tracing tool, by which researchers can refine and “debug” their knowledge resources and inference components.

4 0.53200358 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

Author: Ce Zhang ; Feng Niu ; Christopher Re ; Jude Shavlik

Abstract: Classically, training relation extractors relies on high-quality, manually annotated training data, which can be expensive to obtain. To mitigate this cost, NLU researchers have considered two newly available sources of less expensive (but potentially lower quality) labeled data from distant supervision and crowd sourcing. There is, however, no study comparing the relative impact of these two sources on the precision and recall of post-learning answers. To fill this gap, we empirically study how state-of-the-art techniques are affected by scaling these two sources. We use corpus sizes of up to 100 million documents and tens of thousands of crowd-source labeled examples. Our experiments show that increasing the corpus size for distant supervision has a statistically significant, positive impact on quality (F1 score). In contrast, human feedback has a positive and statistically significant, but lower, impact on precision and recall.

5 0.51744598 191 acl-2012-Temporally Anchored Relation Extraction

Author: Guillermo Garrido ; Anselmo Penas ; Bernardo Cabaleiro ; Alvaro Rodrigo

Abstract: Although much work on relation extraction has aimed at obtaining static facts, many of the target relations are actually fluents, as their validity is naturally anchored to a certain time period. This paper proposes a methodological approach to temporally anchored relation extraction. Our proposal performs distant supervised learning to extract a set of relations from a natural language corpus, and anchors each of them to an interval of temporal validity, aggregating evidence from documents supporting the relation. We use a rich graphbased document-level representation to generate novel features for this task. Results show that our implementation for temporal anchoring is able to achieve a 69% of the upper bound performance imposed by the relation extraction step. Compared to the state of the art, the overall system achieves the highest precision reported.

6 0.50548011 187 acl-2012-Subgroup Detection in Ideological Discussions

7 0.50512201 159 acl-2012-Pattern Learning for Relation Extraction with a Hierarchical Topic Model

8 0.50177866 62 acl-2012-Cross-Lingual Mixture Model for Sentiment Classification

9 0.49924716 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

10 0.49877271 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese

11 0.49813151 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

12 0.49791893 102 acl-2012-Genre Independent Subgroup Detection in Online Discussion Threads: A Study of Implicit Attitude using Textual Latent Semantics

13 0.49637046 12 acl-2012-A Graph-based Cross-lingual Projection Approach for Weakly Supervised Relation Extraction

14 0.49112383 99 acl-2012-Finding Salient Dates for Building Thematic Timelines

15 0.48983297 156 acl-2012-Online Plagiarized Detection Through Exploiting Lexical, Syntax, and Semantic Information

16 0.48930928 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation

17 0.48792982 21 acl-2012-A System for Real-time Twitter Sentiment Analysis of 2012 U.S. Presidential Election Cycle

18 0.4869754 73 acl-2012-Discriminative Learning for Joint Template Filling

19 0.48510399 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling

20 0.48411429 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence