emnlp emnlp2013 emnlp2013-152 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gary Patterson ; Andrew Kehler
Abstract: We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. The model is trained on a set of high frequency relations extracted from the Penn Discourse Treebank and achieves an accuracy of 86.6%. Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. We also present results of an experiment that provides insight into the nature and difficulty of the task.
Reference: text
sentIndex sentText sentNum sentScore
1 We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. [sent-2, score-1.418]
2 Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. [sent-5, score-0.449]
3 Consider, for example, the EXPLANATION coherence relation that holds in (1), in which the second clause provides a cause or reason for the eventuality described in the first: (1) a. [sent-13, score-0.524]
4 As example (1) shows, coherence relations can be marked explicitly—by using lexical connectives such as coordinating or subordinating conjunctions (e. [sent-18, score-0.432]
5 Either way, establishing the relation itself requires the reader to go through a complex inferential process necessitating that a variety of assumptions be made, typically supported by context and/or world knowledge, that are not explicitly asserted by the actual linguistic material. [sent-21, score-0.421]
6 Importantly, the role of the connective in (1a) is therefore not to establish that an EXPLANATION relation holds. [sent-23, score-0.931]
7 The fact that both (1a) and (1b) are felicitous may lead us to believe that the choice to insert a connective between clauses is simply optional. [sent-25, score-0.841]
8 Sometimes the use of a connective is required, since omitting it would likely result in incorrect inferences being drawn by the addressee. [sent-29, score-0.642]
9 For example, the use of when in (2a) implies a backward temporal ordering of events, which is reversed if the connective is left out, as in (2b). [sent-30, score-0.642]
10 On the other hand, a connective can seem unnecessary if the relation between the two clauses is sufficiently implied by other cues in the text. [sent-36, score-1.113]
11 The foregoing examples suggest that the appropriateness of including an explicit connective is inherently gradient, and is in fact correlated with ease of inference: the more difficult recovering the correct relation would be without a connective, the more necessary it is to include one. [sent-44, score-1.024]
12 However, it is also possible that the decision to include a connective depends in part on stylistic and other types of factors as well, such that there might be predictive information in the kinds of shallow linguistic and textual features that systems do have access to. [sent-46, score-0.689]
13 This capability would be useful to generation systems as a post-cursor to discourse-level message planning and sentence realization processes, as well as to summarization systems that take existing sentences and have to reconsider connective placement upon reassembling them. [sent-48, score-0.642]
14 , 2010) that focuses on building supervised models for classifying implicit relations using a variety of contextual features, such as the polarity of clauses, the semantic class and tense/aspect of verbs, and information from syntactic parses. [sent-53, score-0.344]
15 With respect to explicit relations, Elhadad and McKeown (1990) sketch a procedure to select an appropriate connective to link two propositions as part of a larger text generation system, using linguistic features derived from the sentences. [sent-54, score-0.735]
16 The procedure selects the best connective from a given set of candidates, but does not allow for the option of leaving the relation implicit. [sent-55, score-0.931]
17 More recently, Asr and Demberg (2012a) look at both explicit and implicit relations, and make the observation that certain relation types are more likely to be realized explicitly than others. [sent-56, score-0.71]
18 Relatedly, Asr and Demberg (2012b) discuss which connectives are the strongest predictors of which relation types. [sent-57, score-0.467]
19 (2008)), a large-scale corpus of annotated discourse coherence relations covering the one-million-word Wall Street Journal corpus of the Penn Treebank (Marcus et al. [sent-60, score-0.449]
20 Crucially for our purposes, for the implicit relations the corpus indicates the most suitable connective if the relation were instead signaled explicitly. [sent-64, score-1.43]
21 For example, the annotators decided that the best connective to signal the REASON relation in (4) would be because, rather than other plausible candidates, such as as or since. [sent-65, score-0.969]
22 (WSJ0037) In total, there are 18,459 explicit and 16,053 implicit relations annotated in the PDTB. [sent-69, score-0.437]
23 First, whereas explicit relations in the PDTB can hold between spans of text that either are or are not adjacent, we excluded the nonadjacent cases. [sent-71, score-0.369]
24 This was done to ensure consistency in discourse structure between the relations considered in the model, since only the implicit relations between adjacent clauses were annotated. [sent-72, score-0.859]
25 Two exceptions were made for REASON and RESULT relations, which do appear at the lowest [Footnote 1: The next most common relation type, CONDITION, was excluded because it is always marked explicitly in the corpus.] [sent-80, score-0.391]
26 Table 1: Distribution of Training Set. As Table 1 shows, the preference for an overt connective varies significantly according to the type of relation. [sent-89, score-0.721]
27 The ASYNCHRONOUS, CONJUNCTION, CONTRAST and SYNCHRONOUS relations are realized explicitly the majority of the time, whereas INSTANTIATION, REASON, RESTATEMENT and RESULT relations are more often left implicit. [sent-90, score-0.577]
28 We can also see that some relation types (such as RESTATEMENT, INSTANTIATION and SYNCHRONOUS) exhibit a strong preference to be realized in a particular form, whereas other types show more variability in whether they are realized explicitly or implicitly. [sent-91, score-0.638]
29 A naive model that uses the semantic type of the coherence relation as the sole predictive feature makes a binary classification based simply on the majority category for that relation type. [sent-93, score-0.869]
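The majority-category baseline described in sentence 29 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names and the toy counts are hypothetical, and the only behavior taken from the text is that each relation type is always assigned whichever form (explicit or implicit) is more frequent for that type in the training data.

```python
from collections import Counter, defaultdict


def fit_majority_baseline(examples):
    """Map each relation type to its most frequent form.

    `examples` is an iterable of (relation_type, form) pairs, where
    form is "explicit" or "implicit" (hypothetical encoding).
    """
    counts = defaultdict(Counter)
    for relation_type, form in examples:
        counts[relation_type][form] += 1
    return {rel: c.most_common(1)[0][0] for rel, c in counts.items()}


def baseline_accuracy(majority, examples):
    """Fraction of examples whose form matches the majority form for their type."""
    examples = list(examples)
    correct = sum(1 for rel, form in examples if majority.get(rel) == form)
    return correct / len(examples)


# Toy data, invented for illustration only (not counts from the PDTB).
toy = [
    ("CONTRAST", "explicit"),
    ("CONTRAST", "explicit"),
    ("CONTRAST", "implicit"),
    ("RESTATEMENT", "implicit"),
    ("RESTATEMENT", "implicit"),
]
majority = fit_majority_baseline(toy)
accuracy = baseline_accuracy(majority, toy)
```

Because the baseline sees only the relation type, it necessarily misclassifies every minority-form token of each type; any gain reported over it has to come from the additional features.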
30 Relation-level features: In addition to the semantic type of the relation, we include as a feature the connective used to signal the relation in the text (or, for the implicit relations, the connective indicated by the annotators as the most appropriate). [sent-100, score-1.867]
31 This feature (Connect) is included based upon the observation that connectives vary as to their rates of being realized explicitly—even for connectives that signal relations with the same semantic sense. [sent-101, score-0.566]
32 Consequently, given a relation of a particular semantic type, an indication of the best fitting connective may be a consistent predictor of whether or not this relation is realized explicitly. [sent-102, score-1.344]
33 As mentioned above, the PDTB is annotated to describe the attribution of the propositions expressed within a relation to individuals or entities in the text. [sent-104, score-0.344]
34 For example, in the relation shown in (5), the first clause contains a direct quotation, clearly attributing the proposition expressed to the individual Rep. [sent-105, score-0.399]
35 Stark’ may serve as a distraction, with the result that the intended coherence relation is harder to infer without a connective. [sent-113, score-0.46]
36 Accordingly, we may suspect that there is a greater prevalence of implicit relations in these cases, since the reader is assumed to be habituated to the way in which the information in this type of article is presented. [sent-117, score-0.392]
37 Consequently, for the domain at hand we include a binary feature (Financial) indicating whether the relation pertains to financial information. [sent-118, score-0.405]
38 This feature takes the value 1 if the textual spans of both arguments in the relation contain percentage amounts or dollar figures. [sent-119, score-0.421]
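A rough sketch of the Financial feature from sentences 37 and 38 is given below. The source states only that the feature is 1 when both argument spans contain percentage amounts or dollar figures; the regular expression used to detect such amounts, and the function names, are assumptions made for illustration.

```python
import re

# Assumed pattern: "$" followed by a digit, a digit followed by "%",
# or a digit followed by the words "percent", "dollar(s)" or "cent(s)".
_MONEY_OR_PERCENT = re.compile(
    r"\$\s?\d|\d\s?%|\d\s+percent\b|\d\s+(?:dollars?|cents?)\b",
    re.IGNORECASE,
)


def has_financial_amount(span: str) -> bool:
    """True if the text span mentions a percentage or dollar amount."""
    return bool(_MONEY_OR_PERCENT.search(span))


def financial_feature(arg1: str, arg2: str) -> int:
    """1 only if BOTH arguments contain a percentage or dollar figure."""
    return int(has_financial_amount(arg1) and has_financial_amount(arg2))


# Invented example spans, not taken from the corpus.
both = financial_feature("Sales rose 12% in the quarter",
                         "profit fell to $3.2 million")
one_only = financial_feature("Sales rose 7.5 percent",
                             "the chairman resigned")
```

The conjunction matters: a single figure in one argument does not set the feature, which follows the wording "both arguments" in the source.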
39 The arguments were identified by the annotators according to a principle of minimality, whereby the annotations indicate the shortest text spans necessary for the appropriate coherence relation to be interpreted. [sent-121, score-0.548]
40 Our observation of the data indicates that relations which have supplementary material annotated alongside one or both of their arguments are more often than not realized explicitly with connectives. [sent-124, score-0.379]
41 As a result, we include binary features (Supp1, Supp2) indicating whether the first and second arguments of the relation include such supplementary information. [sent-125, score-0.493]
42 It might be thought that the greater the syntactic complexity of an argument, the more likely it is that the relation containing it is marked explicitly, so as to give the reader more help in drawing the correct intended inference between the arguments. [sent-128, score-0.333]
43 Given this intuition, we may expect that arguments with higher density of information are correlated with the increased use of connectives as a means of facilitating the inference of the relation type and thereby easing the overall processing burden. [sent-132, score-0.571]
44 Finally, the accessibility of the subject of the second argument in a relation may play a role in determining whether the relation is explicitly marked. [sent-136, score-0.83]
45 Specifically, informal observation of the data suggests that there is a tendency for the second argument of an implicit relation to begin with a longer, contentful noun phrase, rather than a pronoun. [sent-137, score-0.46]
46 Discourse-level features: The final class of features takes account of the way in which a relation fits into the broader discourse structure in the text. [sent-139, score-0.438]
47 In their work on implicit relation classification, Pitler et al. [sent-140, score-0.46]
48 These results suggest that the semantic type and the presence of a connective in one relation may be predictive of whether or not the following relation in the text is marked with a connective. [sent-142, score-1.379]
49 Consequently, we include features indicating the semantic type of the relation occurring immediately prior in the text (PrevSemType), and whether this relation was marked implicitly or explicitly (PrevForm). [sent-143, score-0.756]
50 The other discourse-level features take account of the dependencies between the relation in question and its neighboring relations in the text. [sent-144, score-0.462]
51 As part of a supervised learning model developed to classify the semantic class of implicit relations in the PDTB, Lin et al. [sent-145, score-0.344]
52 The first type of dependency between adjacent relations is one where the second argument of one relation is also the first argument of the following relation, as in Figure 1. [sent-148, score-0.862]
53 Accordingly, we include two binary features indicating whether an argument is shared with the preceding relation (Arg1isPrevArg2) or the following relation (Arg2isNextArg1) in the corpus. [sent-149, score-0.795]
54 Figure 1: Shared argument. The other main type of discourse dependency, a ‘fully embedded’ dependency, is one where an entire relation (including both of its arguments) is completely embedded within one argument of an adjacent relation in the text, as in Figure 2. [sent-150, score-1.203]
55 To capture this type of dependency structure, we include two binary features (EmbedNext, EmbedPrev) indicating whether the current relation is embedded within either one of its adjacent relations. [sent-151, score-0.523]
56 We also include two binary features (Arg1Embed, Arg2Embed) to indicate whether either argument of the current relation completely contains an embedded relation. [sent-152, score-0.551]
57 Figure 2: Fully embedded argument. The two relations in (6) exemplify a typical instantiation of this embedded dependency structure. [sent-153, score-0.512]
58 (WSJ0261) In this example, there is an implicit REASON relation holding between the two complete sentences, and an explicit SYNCHRONOUS relation signaled by the connective when holding between the two clauses of the second sentence. [sent-156, score-1.878]
59 Since the REASON relation fully embeds another relation within its second argument, the feature Arg2Embed for this relation takes the value 1. [sent-157, score-0.898]
60 For the SYNCHRONOUS relation, the feature EmbedPrev takes the value 1 since the entire relation is fully contained within the second argument of the preceding relation in the text. [sent-158, score-0.762]
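The six discourse-dependency features named in sentences 53 to 60 can be illustrated with a small sketch. It rests on several assumptions not stated in the source: arguments are represented as character-offset spans, a relation's extent is the smallest span covering both arguments, and Arg1Embed/Arg2Embed are checked only against the immediately preceding and following relations. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


@dataclass(frozen=True)
class Relation:
    arg1: Span
    arg2: Span

    @property
    def extent(self) -> Span:
        """Smallest span covering both arguments."""
        return (min(self.arg1[0], self.arg2[0]),
                max(self.arg1[1], self.arg2[1]))


def contains(outer: Span, inner: Span) -> bool:
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def embedded_in(inner: Optional[Relation], outer: Optional[Relation]) -> bool:
    """True if `inner` lies wholly inside one argument of `outer`."""
    if inner is None or outer is None:
        return False
    return (contains(outer.arg1, inner.extent)
            or contains(outer.arg2, inner.extent))


def dependency_features(prev: Optional[Relation],
                        cur: Relation,
                        nxt: Optional[Relation]) -> Dict[str, int]:
    neighbors = [r for r in (prev, nxt) if r is not None]
    return {
        "Arg1isPrevArg2": int(prev is not None and cur.arg1 == prev.arg2),
        "Arg2isNextArg1": int(nxt is not None and cur.arg2 == nxt.arg1),
        "EmbedPrev": int(embedded_in(cur, prev)),
        "EmbedNext": int(embedded_in(cur, nxt)),
        "Arg1Embed": int(any(contains(cur.arg1, r.extent) for r in neighbors)),
        "Arg2Embed": int(any(contains(cur.arg2, r.extent) for r in neighbors)),
    }


# Offsets invented to mirror the shape of example (6): an implicit REASON
# relation between two sentences, with a SYNCHRONOUS relation between the
# two clauses of the second sentence.
reason = Relation(arg1=(0, 40), arg2=(41, 100))
synchronous = Relation(arg1=(41, 70), arg2=(71, 100))

reason_feats = dependency_features(None, reason, synchronous)
sync_feats = dependency_features(reason, synchronous, None)
```

With these offsets, the sketch sets Arg2Embed for the REASON relation and EmbedPrev for the SYNCHRONOUS relation, matching the values described for example (6); all other features are 0.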
61 3 Table 2 shows the model accuracy for each relation type, together with the baseline performance based on the majority category for that type. [sent-164, score-0.36]
62 Table 2: Classification Accuracy by Relation Type. The model achieved an improvement in accuracy across all relation types but one: RESTATEMENT relations, for which the baseline accuracy was already close to 100%. [sent-168, score-0.357]
63 Across all relation types, we found that the features relating to the discourse dependencies between a relation and its neighbors were the strongest and most consistent predictors of whether that relation is explicit or implicit. [sent-181, score-1.188]
64 A relation that is fully embedded within a single argument of an adjacent relation in the text (indicated by the features EmbedPrev and EmbedNext) has a much higher likelihood of being signaled explicitly. [sent-182, score-1.039]
65 Conversely, a relation that fully contains another relation within one of its arguments (indicated by Arg1Embed and Arg2Embed) has a significantly higher likelihood of being implicit. [sent-183, score-0.711]
66 The result is consistent with the embedded discourse dependency shown in (6), in which the implicit REASON relation fully contains an explicit SYNCHRONOUS relation within its second argument. [sent-184, score-1.098]
67 The model also found that the features which indicate whether a relation has shared arguments with either the preceding or following relations in the text (Arg1isPrevArg2, Arg2isNextArg1) are both predictors of an implicit outcome. [sent-185, score-0.814]
68 In other words, if a clause in the text serves as the argument for two adjacent relations, then both of these relations are more likely to be realized implicitly. [sent-186, score-0.534]
69 The next most predictive feature was the connective used to signal the relation (Connect). [sent-187, score-1.016]
70 The features indexing syntactic complexity (NPSbj1 and NPSbj2) were found to be marginally predictive of an explicit outcome for most relation types, but the overall effect in the model was relatively small—resulting in only a 0. [sent-193, score-0.473]
71 Somewhat unexpectedly, the factors indicating the semantic type of the previous relation in the text (PrevSemType) and whether or not this relation was explicitly signaled by a connective (PrevForm) were found not to be significant predictors. [sent-195, score-1.553]
72 (2008) in that certain bigrams of coherence relation types are significantly more prevalent than others. [sent-197, score-0.416]
73 However, the differences in the frequencies were evidently not sufficiently correlated with the explicit/implicit distinction as to make the type or form of the previous relation a significant feature in the model. [sent-198, score-0.337]
74 The majority of these errors were cases where the model predicted that the relation would be explicit—the most likely outcome for a CONTRAST relation—whereas in the corpus the intended relation was signaled by linguistic cues other than an overt connective. [sent-202, score-0.931]
75 (WSJ2372) Other ways that the contrast relation is signaled implicitly include contrasting temporal modifiers (It wasn’t so long ago X. [sent-206, score-0.444]
76 , 2009) has sought to make use of such cues to identify and classify implicit relations in the text. [sent-216, score-0.379]
77 The results of this brief error analysis suggest that such indirect cues could also be useful factors in determining whether to choose to use a connective for a given relation type when generating text. [sent-217, score-1.047]
78 4 Judgment Study. The system described in the last section outperformed a baseline majority-category classifier on the task of deciding whether a relation should be made explicit or left implicit. [sent-218, score-0.415]
79 First, the system was able to make this improvement using relatively shallow features extracted from the text, without access to the richer types of contextual information and world knowledge required for establishing coherence relations during actual discourse comprehension. [sent-220, score-0.481]
80 Second, the data suggest that the appropriateness of including a connective is not as cut-and-dried as a binary classification task may suggest, but is instead gradient, with many cases for which the inclusion of a connective appears to be optional. [sent-221, score-1.354]
81 In order to shed light on this question, we carried out an experiment to see how consistently humans choose to use lexical connectives to signal intended coherence relations between clauses. [sent-226, score-0.514]
82 1 Methodology. We selected a balanced sample of 100 clause-pair tokens from the test set, reflecting the distribution of the different major relation types (six relations were represented in the sample). [sent-228, score-0.497]
83 The experimental stimulus for each item consisted of two versions of the same clause pair, one including a connective between the clauses, and the other without. [sent-230, score-0.751]
84 For relations that were realized explicitly in the corpus, as in (8a), the alternative implicit stimulus omitted the connective and showed the second argument as a separate sentence, as in (8b). [sent-231, score-1.334]
85 For the implicit relations, the alternative explicit stimulus for the experiment used the connective annotated in the PDTB as the one being most appropriate. [sent-241, score-0.944]
86 The relative ordering of the presentation of the explicit and implicit forms was randomized, without regard to the actual corpus outcome for that stimulus. [sent-243, score-0.34]
87 5 The distribution of correctly-judged items across relation types is shown in Table 4. [sent-250, score-0.32]
88 The lowest scoring relation type was CONTRAST, for which 9 of the 10 explicit tokens were judged correctly but only 4 out of the 11 implicit tokens were correctly identified. [sent-257, score-0.671]
89 In two-thirds (21) of these cases, the judges indicated a preference for a connective when the relation in the corpus was implicit. [sent-260, score-1.145]
90 Without such pressures to edit the copy down to a minimal form, the judges may have preferred to see the relations signaled explicitly in cases in which either decision would result in a felicitous passage. [sent-264, score-0.63]
91 In the remaining 11 cases, for which the relations in the corpus were explicitly signaled with a connective, the judges on average indicated a preference to leave the relation implicit. [sent-265, score-0.897]
92 We found that all 7 of the CONTRAST mismatches were instances where the second argument of the relation in the corpus was a sentence beginning with the coordinating conjunction but, as in (9). [sent-268, score-0.473]
93 (WSJ2300) relations had a sentential second argument beginning with the conjunction and. [sent-271, score-0.357]
94 As we have suggested, however, this could be the result of our experimental judges having different preferences than the writers and editors at the Wall Street Journal for cases in which connective placement is truly optional. [sent-275, score-0.826]
95 If inaccurate predictions are associated with optionality of connective use, we might expect that both human judges and the classification model would be less certain about their categorizations of these examples than for the cases that were correctly classified. [sent-277, score-0.927]
96 Taken together, these results are consistent with the idea that, at least for a significant portion of the data, the incorrect judgments made by both the judges and the model may have occurred on passages for which either including or omitting the connective would have been acceptable. [sent-289, score-0.863]
97 5 Conclusion. We have presented a model that predicts whether the coherence relation holding between two clauses is marked explicitly with a lexical connective or left implicit. [sent-290, score-1.35]
98 The variability in the judgments of native speakers when presented with these data suggests that the use of a connective is in many cases simply optional; in such cases the decision may reflect lower-level stylistic choices on the part of the author. [sent-292, score-0.762]
99 Recognizing implicit discourse relations in the Penn Discourse Treebank. [sent-321, score-0.493]
100 Automatic sense prediction for implicit discourse relations in text. [sent-337, score-0.493]
wordName wordTfidf (topN-words)
[('connective', 0.642), ('relation', 0.289), ('relations', 0.173), ('implicit', 0.171), ('signaled', 0.155), ('argument', 0.153), ('discourse', 0.149), ('clauses', 0.147), ('judges', 0.146), ('connectives', 0.132), ('coherence', 0.127), ('pdtb', 0.114), ('arguments', 0.102), ('explicit', 0.093), ('realized', 0.091), ('embedded', 0.076), ('clause', 0.071), ('optionality', 0.069), ('explicitly', 0.066), ('vase', 0.06), ('pitler', 0.057), ('attribution', 0.055), ('father', 0.055), ('financial', 0.052), ('embedprev', 0.052), ('felicitous', 0.052), ('type', 0.048), ('predictive', 0.047), ('adjacent', 0.046), ('holding', 0.046), ('predictors', 0.046), ('asr', 0.046), ('restatement', 0.045), ('outcome', 0.044), ('intended', 0.044), ('judgments', 0.044), ('responses', 0.042), ('wall', 0.042), ('synchronous', 0.041), ('penn', 0.041), ('consequently', 0.041), ('proposition', 0.039), ('senses', 0.038), ('sporleder', 0.038), ('lascarides', 0.038), ('stimulus', 0.038), ('cases', 0.038), ('signal', 0.038), ('supplementary', 0.038), ('majority', 0.037), ('whereas', 0.037), ('reason', 0.037), ('incorrectly', 0.037), ('indicated', 0.037), ('excluded', 0.036), ('cues', 0.035), ('tokens', 0.035), ('acy', 0.034), ('asserted', 0.034), ('attmismatch', 0.034), ('bolar', 0.034), ('complied', 0.034), ('delight', 0.034), ('demberg', 0.034), ('detriment', 0.034), ('embednext', 0.034), ('fatemeh', 0.034), ('fda', 0.034), ('fragile', 0.034), ('hasn', 0.034), ('maggie', 0.034), ('nesbit', 0.034), ('omission', 0.034), ('pharmaceutical', 0.034), ('prevform', 0.034), ('prevsemtype', 0.034), ('producer', 0.034), ('retail', 0.034), ('torabi', 0.034), ('tract', 0.034), ('urinary', 0.034), ('instantiation', 0.034), ('accuracy', 0.034), ('genre', 0.033), ('prasad', 0.033), ('whether', 0.033), ('classification', 0.032), ('actual', 0.032), ('conjunction', 0.031), ('outcomes', 0.031), ('passages', 0.031), ('street', 0.031), ('fully', 0.031), ('presence', 0.031), ('preference', 0.031), ('indicating', 0.031), ('items', 0.031), ('spans', 0.03), ('comprised', 0.03), ('response', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 152 emnlp-2013-Predicting the Presence of Discourse Connectives
2 0.17078359 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan
Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.
3 0.1437086 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
Author: Qi Zhang ; Jin Qian ; Huan Chen ; Jihua Kang ; Xuanjing Huang
Abstract: Explanatory sentences are employed to clarify reasons, details, facts, and so on. High quality online product reviews usually include not only positive or negative opinions, but also a variety of explanations of why these opinions were given. These explanations can help readers get easily comprehensible information of the discussed products and aspects. Moreover, explanatory relations can also benefit sentiment analysis applications. In this work, we focus on the task of identifying subjective text segments and extracting their corresponding explanations from product reviews in discourse level. We propose a novel joint extraction method using first-order logic to model rich linguistic features and long distance constraints. Experimental results demonstrate the effectiveness of the proposed method.
4 0.12718371 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
Author: Filipe Mesquita ; Jordan Schmidek ; Denilson Barbosa
Abstract: A large number of Open Relation Extraction approaches have been proposed recently, covering a wide range of NLP machinery, from “shallow” (e.g., part-of-speech tagging) to “deep” (e.g., semantic role labeling–SRL). A natural question then is what is the tradeoff between NLP depth (and associated computational cost) versus effectiveness. This paper presents a fair and objective experimental comparison of 8 state-of-the-art approaches over 5 different datasets, and sheds some light on the issue. The paper also describes a novel method, EXEMPLAR, which adapts ideas from SRL to less costly NLP machinery, resulting in substantial gains both in efficiency and effectiveness, over binary and n-ary relation extraction tasks.
5 0.11610303 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
Author: Oier Lopez de Lacalle ; Mirella Lapata
Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.
6 0.10435805 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
7 0.094204247 160 emnlp-2013-Relational Inference for Wikification
8 0.091550797 41 emnlp-2013-Building Event Threads out of Multiple News Articles
9 0.090125494 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
10 0.083554558 192 emnlp-2013-Unsupervised Induction of Contingent Event Pairs from Film Scenes
11 0.081658944 118 emnlp-2013-Learning Biological Processes with Global Constraints
12 0.070122227 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision
13 0.069756046 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?
14 0.060504138 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations
15 0.059570875 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
16 0.059038673 121 emnlp-2013-Learning Topics and Positions from Debatepedia
17 0.058156494 90 emnlp-2013-Generating Coherent Event Schemas at Scale
18 0.055251356 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
19 0.055216551 174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem
20 0.051700979 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations
topicId topicWeight
[(0, -0.19), (1, 0.115), (2, -0.021), (3, 0.112), (4, -0.017), (5, -0.002), (6, -0.113), (7, 0.003), (8, -0.003), (9, 0.046), (10, 0.066), (11, -0.029), (12, -0.041), (13, 0.1), (14, -0.014), (15, 0.172), (16, -0.078), (17, 0.032), (18, 0.08), (19, 0.214), (20, 0.034), (21, 0.182), (22, -0.15), (23, -0.091), (24, -0.068), (25, 0.017), (26, -0.014), (27, -0.015), (28, -0.02), (29, 0.014), (30, -0.121), (31, -0.051), (32, 0.064), (33, 0.047), (34, -0.024), (35, -0.097), (36, -0.175), (37, -0.061), (38, 0.057), (39, 0.048), (40, 0.037), (41, 0.013), (42, 0.026), (43, -0.002), (44, 0.019), (45, 0.082), (46, 0.061), (47, 0.033), (48, -0.015), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.96063089 152 emnlp-2013-Predicting the Presence of Discourse Connectives
2 0.68785053 68 emnlp-2013-Effectiveness and Efficiency of Open Relation Extraction
3 0.67993408 63 emnlp-2013-Discourse Level Explanatory Relation Extraction from Product Reviews Using First-Order Logic
Author: Qi Zhang ; Jin Qian ; Huan Chen ; Jihua Kang ; Xuanjing Huang
Abstract: Explanatory sentences are employed to clarify reasons, details, facts, and so on. High quality online product reviews usually include not only positive or negative opinions, but also a variety of explanations of why these opinions were given. These explanations can help readers get easily comprehensible information of the discussed products and aspects. Moreover, explanatory relations can also benefit sentiment analysis applications. In this work, we focus on the task of identifying subjective text segments and extracting their corresponding explanations from product reviews in discourse level. We propose a novel joint extraction method using first-order logic to model rich linguistic features and long distance constraints. Experimental results demonstrate the effectiveness of the proposed method.
4 0.66623688 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan
Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.
5 0.59159094 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
Author: Oier Lopez de Lacalle ; Mirella Lapata
Abstract: In this paper we present an unsupervised approach to relational information extraction. Our model partitions tuples representing an observed syntactic relationship between two named entities (e.g., “X was born in Y” and “X is from Y”) into clusters corresponding to underlying semantic relation types (e.g., BornIn, Located). Our approach incorporates general domain knowledge which we encode as First Order Logic rules and automatically combine with a topic model developed specifically for the relation extraction task. Evaluation results on the ACE 2007 English Relation Detection and Categorization (RDC) task show that our model outperforms competitive unsupervised approaches by a wide margin and is able to produce clusters shaped by both the data and the rules.
6 0.54549736 106 emnlp-2013-Inducing Document Plans for Concept-to-Text Generation
7 0.48937893 41 emnlp-2013-Building Event Threads out of Multiple News Articles
8 0.48556602 93 emnlp-2013-Harvesting Parallel News Streams to Generate Paraphrases of Event Relations
9 0.47000542 35 emnlp-2013-Automatically Detecting and Attributing Indirect Quotations
10 0.46087429 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision
11 0.43404704 160 emnlp-2013-Relational Inference for Wikification
12 0.40717229 174 emnlp-2013-Single-Document Summarization as a Tree Knapsack Problem
13 0.38592371 108 emnlp-2013-Interpreting Anaphoric Shell Nouns using Antecedents of Cataphoric Shell Nouns as Training Data
14 0.37574282 12 emnlp-2013-A Semantically Enhanced Approach to Determine Textual Similarity
15 0.35592481 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM
16 0.33509588 118 emnlp-2013-Learning Biological Processes with Global Constraints
17 0.33336222 137 emnlp-2013-Multi-Relational Latent Semantic Analysis
18 0.32798934 203 emnlp-2013-With Blinkers on: Robust Prediction of Eye Movements across Readers
19 0.32575789 62 emnlp-2013-Detection of Product Comparisons - How Far Does an Out-of-the-Box Semantic Role Labeling System Take You?
20 0.31331176 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
topicId topicWeight
[(3, 0.05), (18, 0.046), (22, 0.073), (23, 0.221), (30, 0.072), (45, 0.01), (47, 0.011), (50, 0.014), (51, 0.205), (66, 0.047), (71, 0.052), (75, 0.047), (77, 0.025), (90, 0.02), (96, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.83627301 152 emnlp-2013-Predicting the Presence of Discourse Connectives
Author: Gary Patterson ; Andrew Kehler
Abstract: We present a classification model that predicts the presence or omission of a lexical connective between two clauses, based upon linguistic features of the clauses and the type of discourse relation holding between them. The model is trained on a set of high frequency relations extracted from the Penn Discourse Treebank and achieves an accuracy of 86.6%. Analysis of the results reveals that the most informative features relate to the discourse dependencies between sequences of coherence relations in the text. We also present results of an experiment that provides insight into the nature and difficulty of the task.
Author: Andrew J. Anderson ; Elia Bruni ; Ulisse Bordignon ; Massimo Poesio ; Marco Baroni
Abstract: Traditional distributional semantic models extract word meaning representations from cooccurrence patterns of words in text corpora. Recently, the distributional approach has been extended to models that record the cooccurrence of words with visual features in image collections. These image-based models should be complementary to text-based ones, providing a more cognitively plausible view of meaning grounded in visual perception. In this study, we test whether image-based models capture the semantic patterns that emerge from fMRI recordings of the neural signal. Our results indicate that, indeed, there is a significant correlation between image-based and brain-based semantic similarities, and that image-based models complement text-based ones, so that the best correlations are achieved when the two modalities are combined. Despite some unsatisfactory, but explained outcomes (in particular, failure to detect differential association of models with brain areas), the results show, on the one hand, that image-based distributional semantic models can be a precious new tool to explore semantic representation in the brain, and, on the other, that neural data can be used as the ultimate test set to validate artificial semantic models in terms of their cognitive plausibility.
Author: Shize Xu ; Shanshan Wang ; Yan Zhang
Abstract: The rapid development of Web2.0 leads to significant information redundancy. Especially for a complex news event, it is difficult to understand its general idea within a single coherent picture. A complex event often contains branches, intertwining narratives and side news which are all called storylines. In this paper, we propose a novel solution to tackle the challenging problem of storylines extraction and reconstruction. Specifically, we first investigate two requisite properties of an ideal storyline. Then a unified algorithm is devised to extract all effective storylines by optimizing these properties at the same time. Finally, we reconstruct all extracted lines and generate the high-quality story map. Experiments on real-world datasets show that our method is quite efficient and highly competitive, which can bring about quicker, clearer and deeper comprehension to readers.
4 0.73022306 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
Author: Zhiyuan Chen ; Arjun Mukherjee ; Bing Liu ; Meichun Hsu ; Malu Castellanos ; Riddhiman Ghosh
Abstract: Aspect extraction is one of the key tasks in sentiment analysis. In recent years, statistical models have been used for the task. However, such models without any domain knowledge often produce aspects that are not interpretable in applications. To tackle the issue, some knowledge-based topic models have been proposed, which allow the user to input some prior domain knowledge to generate coherent aspects. However, existing knowledge-based topic models have several major shortcomings, e.g., little work has been done to incorporate the cannot-link type of knowledge or to automatically adjust the number of topics based on domain knowledge. This paper proposes a more advanced topic model, called MC-LDA (LDA with m-set and c-set), to address these problems, which is based on an Extended generalized Pólya urn (E-GPU) model (which is also proposed in this paper). Experiments on real-life product reviews from a variety of domains show that MC-LDA outperforms the existing state-of-the-art models markedly.
5 0.72878218 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
Author: Zhongqing Wang ; Shoushan LI ; Fang Kong ; Guodong Zhou
Abstract: Personal profile information on social media like LinkedIn.com and Facebook.com is at the core of many interesting applications, such as talent recommendation and contextual advertising. However, personal profiles usually lack organization confronted with the large amount of available information. Therefore, it is always a challenge for people to find desired information from them. In this paper, we address the task of personal profile summarization by leveraging both personal profile textual information and social networks. Here, using social networks is motivated by the intuition that, people with similar academic, business or social connections (e.g. co-major, co-university, and co-corporation) tend to have similar experience and summaries. To achieve the learning process, we propose a collective factor graph (CoFG) model to incorporate all these resources of knowledge to summarize personal profiles with local textual attribute functions and social connection factors. Extensive evaluation on a large-scale dataset from LinkedIn.com demonstrates the effectiveness of the proposed approach.
6 0.72843701 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
7 0.72761887 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
8 0.72701383 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction
9 0.72635049 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
10 0.7248466 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation
11 0.72460824 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
12 0.72330219 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology
13 0.72205269 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity
14 0.72065604 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors
15 0.72033465 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training
16 0.71939433 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing
17 0.71903098 79 emnlp-2013-Exploiting Multiple Sources for Open-Domain Hypernym Discovery
18 0.71800828 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
19 0.71770221 69 emnlp-2013-Efficient Collective Entity Linking with Stacking
20 0.71696115 143 emnlp-2013-Open Domain Targeted Sentiment