acl acl2011 acl2011-96 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types of relations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Temporal–contrastive discourse connectives (although, while, since, etc. [sent-3, score-0.853]
2 ) signal various types of relations between clauses such as temporal, contrast, concession and cause. [sent-4, score-0.197]
3 They are often ambiguous and therefore difficult to translate from one language to another. [sent-5, score-0.045]
4 We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems. [sent-6, score-1.113]
5 1 Introduction The probabilistic phrase-based models used in statistical machine translation (SMT) have been improved by integrating linguistic information during training stages. [sent-7, score-0.083]
6 On the other hand, integrating discourse information, such as discourse relations holding between two spans of text or between sentences, has not yet been applied to SMT. [sent-10, score-0.562]
7 This paper describes several disambiguation and translation experiments for a specific subset of discourse connectives. [sent-11, score-0.427]
8 Based on examinations in multilingual corpora, we identified the connectives although, but, however, meanwhile, since, though, when and while as being particularly problematic for machine translation. [sent-12, score-0.597]
9 These discourse connectives signal various types of relations between clauses, such as temporal, contrast, concession, expansion, cause and condition, which are, as we also show, hard to annotate even by humans. [sent-13, score-0.876]
10 Disambiguating these senses and tagging them in large corpora is hypothesized to help improve SMT systems by avoiding translation errors. [sent-14, score-0.316]
11 Resources and the state of the art for discourse connective disambiguation and parsing are described in Section 3. [sent-17, score-0.788]
12 Section 4 summarizes our experiments for disambiguating the senses of temporal–contrastive connectives. [sent-18, score-0.301]
13 The impact of connective disambiguation on SMT is briefly presented in Section 5. [sent-19, score-0.532]
14 2 Translating Connectives Discourse connectives can signal multiple senses (Miltsakaki et al. [sent-21, score-0.871]
15 For instance, the connective since can have a temporal and causal meaning. [sent-23, score-0.585]
16 The disambiguation of these senses is crucial to the correct translation of texts from one language to another. [sent-24, score-0.446]
17 Translation can be difficult because there may be no direct lexical correspondence for the explicit source language connective in the target language, as shown by the reference translation of the first example in Table 1, taken from the Europarl corpus (Koehn, 2005). [sent-25, score-0.571]
18 More often, the incorrect rendering of the sense of a connective can lead to wrong translations, as in the second, third and fourth example in Table 1, which were translated by the Moses SMT decoder (Koehn et al., 2007). [sent-26, score-0.567]
19 The discourse connectives, their translations, and their senses are indicated in bold. [sent-29, score-0.507]
20 The first example is a reference translation from EN into FR, while the second, third and fourth examples are wrong translations generated by MT (EN–FR and EN–DE), hence marked with an asterisk. [sent-30, score-0.208]
21 The reference translation for the second example uses the French connective car with a correct causal sense, instead of the wrong depuis que generated by SMT, which expresses a temporal relation. [sent-33, score-0.788]
22 In the third example, the SMT system failed to translate the English connective while to French. [sent-34, score-0.474]
23 The French translation is therefore not coherent: the contrastive discourse information cannot be established without an explicit connective. [sent-35, score-0.486]
24 In its German translation, it would be correct to use the connective auch wenn (for contrast) instead of obwohl (for concession). [sent-38, score-0.45]
25 These examples illustrate the difficulties in trans- lating discourse connectives, even when they are lexically explicit. [sent-39, score-0.279]
26 An examination of the frequency and sense distribution of these connectives and their translations in the Europarl corpus confirms that at least such a fine-grained disambiguation as the one between contrast and concession is necessary for a correct translation. [sent-43, score-1.035]
27 Table 2 shows cases where the different senses of the connectives while and although lead to different translations. [sent-44, score-0.848]
28 Disambiguation of the senses here can help in finding the correct lexical correspondence of the connective. [sent-45, score-0.275]
29 To confirm that the automatic translation of discourse connectives is not straightforward, we annotated 80 sentences from the Europarl corpus containing the connective while with the corresponding sense (T, CO or CT) and another 60 sentences containing the French connective alors que (T or CT). [sent-46, score-1.97]
30 We then translated these sentences with the already mentioned EN–FR and FR–EN Moses SMT system and compared the output manually to the reference translations from the corpus. [sent-47, score-0.073]
31 The overall system performance was 61% of correct translations for sentences with while and 55% of correct translations with alors que. [sent-48, score-0.189]
32 As mistakes, we counted either missing target connective words (only when the output sentence became incoherent) or wrong connective words resulting from a failure to render the sense correctly. [sent-49, score-0.975]
33 Also, the manual sense annotation task is not trivial. [sent-50, score-0.118]
34 In a manual annotation experiment, the senses of the connective while (T, CO and CT) were indicated in 30 sentences by 4 annotators. [sent-51, score-0.724]
35 The overall agreement on the senses was not higher than a kappa value of 0. [sent-52, score-0.28]
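For readers who want to reproduce such an agreement figure, the following is a minimal sketch of Fleiss' kappa for a multi-annotator setting like the one above (4 annotators, 30 sentences, senses T/CO/CT for while). The count matrix in the example is invented for illustration; only the annotation setup comes from the text, and the paper does not specify which kappa variant was used.

def fleiss_kappa(counts):
    # counts[i][j] = number of annotators who assigned item i to category j
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # per-item observed agreement
    p_items = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
               for row in counts]
    p_bar = sum(p_items) / n_items
    # chance agreement from the marginal category proportions
    total = n_items * n_raters
    p_cat = [sum(row[j] for row in counts) / total for j in range(n_cats)]
    p_e = sum(p * p for p in p_cat)
    return (p_bar - p_e) / (1 - p_e)

# hypothetical counts for 5 of the 30 sentences; columns are the senses T, CO, CT
example = [[4, 0, 0], [2, 1, 1], [1, 2, 1], [0, 3, 1], [2, 2, 0]]
print(round(fleiss_kappa(example), 2))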
36 3 Data and Related Work One of the few available discourse annotated corpora in English is the Penn Discourse Treebank (PDTB) (Prasad et al. [sent-54, score-0.256]
37 For this resource, one hundred types of explicit connectives were manually annotated, as well as implicit relations not signaled by a connective. [sent-56, score-0.683]
38 For French, the ANNODIS project for annotation of discourse (Pery-Woodley et al. [sent-57, score-0.285]
39 For German, a lexicon of discourse connectives has existed since the 1990s, namely DiMLex, a lexicon of discourse markers (Stede and Umbach, 1998). [sent-61, score-1.184]
40 An equivalent, more recent database for French is LexConn, a lexicon of connectives (Roze et al. [sent-62, score-0.624]
41 For the first classification experiments in Section 4, we concentrated on English and the explicit connectives in the PDTB data. [sent-65, score-0.655]
42 The sense hierarchy used in the PDTB consists of three levels, ranging from four top-level senses (Temporal, Contingency, Comparison and Expansion) via 16 subsenses on the second level to 23 further subsenses on the third level. [sent-66, score-0.579]
43 As the annotators were allowed to assign one or two senses for each connective, there are 129 possible simple or complex senses for more than 18,000 explicit connectives. [sent-67, score-0.986]
44 The PDTB further sees connectives as discourse-level predicates that have two propositional arguments. [sent-68, score-0.597]
45 Argument 2 is the one containing the explicit connective. [sent-69, score-0.058]
46 [argument 2]), which is very helpful to examine the context of a connective (see Section 4. [sent-76, score-0.426]
47 The release of the PDTB had quite an impact on disambiguation experiments. [sent-78, score-0.106]
48 The state of the art for recognizing explicit connectives in English is therefore already high, at a level of 94% for disambiguating the four main senses on the first level of the PDTB sense hierarchy (Pitler and Nenkova, 2009). [sent-79, score-1.119]
49 However, when using all 100 types of connectives and the whole PDTB training set, it is not so difficult to achieve such a high score, because of the large amount of instances and the rather broad distinction of the four main classes only. [sent-80, score-0.597]
50 As we show in the next section, when building separate classifiers for specific connectives with senses from the more detailed second hierarchy level of the PDTB, it is more difficult to reach high accuracies. [sent-81, score-0.915]
51 Lin et al. (2010) built the first end-to-end PDTB discourse parser, which is able to parse unrestricted text with an F1 score of 38.18% on PDTB test data and for senses on the second hierarchy level. [sent-83, score-0.277] [sent-84, score-0.293]
53 C4.5 decision tree and NaiveBayes algorithms often used in recent research on discourse connective classification. [sent-88, score-0.682]
54 Our first experiment was aimed at sense disambiguation down to the third level of the PDTB hierarchy. [sent-89, score-0.225]
55 The training set here consisted of all 100 types of explicit connectives annotated in the PDTB training set (15,366 instances). [sent-90, score-0.655]
56 The only two features were the (capitalized) connective word tokens from the PDTB and their Part of Speech (POS) tags. [sent-92, score-0.426]
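As a rough analogue of this two-feature setup, the following sketch trains a Naive Bayes classifier on the connective token and its POS tag. The paper used WEKA's C4.5 and NaiveBayes implementations; scikit-learn is only a stand-in here, and the toy training instances and sense labels are invented for illustration.

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy instances: (features, sense label); a real setup would read them from the PDTB
train = [
    ({"conn": "Since", "pos": "IN"}, "TEMPORAL"),
    ({"conn": "since", "pos": "IN"}, "CAUSE"),
    ({"conn": "while", "pos": "IN"}, "CONTRAST"),
    ({"conn": "although", "pos": "IN"}, "CONCESSION"),
]
X, y = zip(*train)

clf = make_pipeline(DictVectorizer(), MultinomialNB())
clf.fit(X, y)
print(clf.predict([{"conn": "while", "pos": "IN"}]))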
57 For all 129 possible sense combinations, including complex senses, results reach 66. [sent-93, score-0.071]
58 86% for correctly classified connectives (with the 4 main senses), when using the connective token as the only feature. [sent-98, score-1.023]
59 condition (COND), contrast (CT), concession (CO) and expansion (E). [sent-104, score-0.183]
60 All subsenses from the third PDTB hierarchy level were merged under second level ones (C, COND, CT, CO). [sent-105, score-0.174]
61 Exceptions were the top level senses T and E, which, so far, need no further disambiguation for translation. [sent-106, score-0.382]
62 In addition, we extracted separate training sets for each of the 8 temporal–contrastive connectives in question and one training set for all of them. [sent-107, score-0.597]
63 The number of occurrences and senses in the sets for the single connectives is listed in Table 3. [sent-108, score-0.251]
64 The total number of instances in the training set for all 8 connectives is 5,299 occurrences, with a sense distribution of 56. [sent-109, score-0.668]
65 4.1 Features The following basic surface features were considered when disambiguating the senses signaled by connectives. [sent-117, score-0.329]
66 Future automated disambiguation will be applied to unrestricted text, identifying the discourse arguments and syntactical elements in automatically parsed and POS–tagged sentences. [sent-119, score-0.422]
67 1. the (capitalized) connective word form; 2. its POS tag; 3. first word of argument 1; 4. last word of argument 1; 5. first word of argument 2; 6. last word of argument 2; 7. POS tag of the first word of argument 2; 8. [sent-127, score-0.766]
68 punctuation pattern. The cased word forms (feature 1) were left as is, therefore also indicating whether the connective is located at the beginning of a sentence or not. [sent-130, score-0.426]
69 (2010) and duVerle and Prendinger (2009), the context of a connective is very important. [sent-136, score-0.426]
70 The arguments may include other (reinforcing or opposite) connectives, numbers and antonyms (to express contrastive relations). [sent-137, score-0.107]
71 We extracted the words at the beginning and at the end of argument 1 (features 3, 4) and argument 2 (features 5, 6) which are, as observed, other connectives, gerunds, adverbs or determiners (further generalized by features 7 and 8). [sent-138, score-0.136]
72 The paths to syntactical ancestors (feature 9) in which the connective word form appears are quite numerous and were therefore truncated to a maximum of four ancestors (e. [sent-139, score-0.519]
73 where C is the explicit connective and A a placeholder for all the other words. [sent-145, score-0.484]
74 Punctuation is important for locating connectives as many of them are subordinating and coordinating conjunctions, separated by commas (Haddow, 2005, p. [sent-146, score-0.597]
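Putting these surface features together, a feature-extraction routine along the lines described above might look as follows. The helper signature and the example instance are hypothetical; only the feature inventory itself (cased connective form, POS tag, argument-edge words and their POS generalizations, a syntactic ancestor path truncated to four ancestors, and the punctuation pattern) is taken from the text.

import string

def surface_features(conn, conn_pos, arg1_tokens, arg2_tokens, arg2_pos, ancestors, sentence):
    # all inputs are assumed to come from PDTB-style annotation or from automatic parses
    return {
        "conn": conn,                         # cased connective word form
        "conn_pos": conn_pos,                 # its POS tag
        "arg1_first": arg1_tokens[0],
        "arg1_last": arg1_tokens[-1],
        "arg2_first": arg2_tokens[0],
        "arg2_last": arg2_tokens[-1],
        "arg2_first_pos": arg2_pos[0],
        "path": "/".join(ancestors[:4]),      # ancestor path truncated to four ancestors
        "punct": "".join(ch for ch in sentence if ch in string.punctuation),
    }

# invented example instance
print(surface_features(
    conn="While", conn_pos="IN",
    arg1_tokens=["the", "vote", "was", "close"],
    arg2_tokens=["the", "outcome", "was", "clear"],
    arg2_pos=["DT", "NN", "VBD", "JJ"],
    ancestors=["IN", "SBAR", "S", "S", "TOP"],
    sentence="While the vote was close, the outcome was clear.",
))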
75 4.2 Results In the disambiguation experiments described here, results were generated separately for every temporal–contrastive connective (supposing one may try to improve the translation of only certain connectives), in addition to one result for the whole subset. [sent-149, score-0.597]
76 They were measured using accuracy (percentage of correctly classified instances) and the kappa value. [sent-151, score-0.047]
77 the prediction for the most frequent sense annotated for the corresponding connective. [sent-154, score-0.071]
78 The last result for all 8 temporal–contrastive connectives reports a six-way classification of senses very close to one another: the accuracy and kappa values are well above random agreement and prediction of the majority class. [sent-157, score-0.895]
79 Note that experiments for specific subsets of connectives have very rarely been tried in research. [sent-158, score-0.597]
80 The results for the single connectives are comparable with ours in the case of since and while, where similar senses were used. [sent-164, score-0.848]
81 5 SMT Experiments We have started to explore how to constrain an SMT system to use labeled connectives resulting from the experiments above. [sent-166, score-0.597]
82 There are at least two methods to integrate labeled discourse connectives in the SMT process. [sent-167, score-0.853]
83 , 2007) in order to encourage it to translate a specific sense of a connective with an acceptable equivalent. [sent-169, score-0.54]
84 A second, more natural method for an SMT system would be to apply the discourse information obtained from the disambiguation module, adding the sense tags to the discourse connectives in a large parallel corpus. [sent-170, score-1.286]
85 Information about the possible senses of the connective while, labeled as temporal (1), contrast (2) or concession (3), was directly introduced to the English source language phrases when there was an appropriate translation of the connective in the French equivalent phrase. (Paired t-tests were performed at the 95% confidence level.) [sent-173, score-0.677] [sent-175, score-0.491]
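A minimal sketch of this second method: each occurrence of the connective on the English source side is rewritten with its predicted sense index (temporal becomes while-1, contrast while-2, concession while-3, as in the text) before the corpus is fed to SMT training. The predict_sense callable is a hypothetical stand-in for the disambiguation module.

import re

SENSE_INDEX = {"TEMPORAL": 1, "CONTRAST": 2, "CONCESSION": 3}

def tag_connective(sentence, connective, predict_sense):
    # rewrite e.g. "while" as "while-2"; predict_sense stands in for the classifier
    pattern = re.compile(r"\b%s\b" % re.escape(connective), re.IGNORECASE)
    def _tag(match):
        sense = predict_sense(sentence, match.start())
        return "%s-%d" % (match.group(0), SENSE_INDEX[sense])
    return pattern.sub(_tag, sentence)

# toy predictor that always answers CONTRAST; a real one would use the features above
print(tag_connective("He read while she wrote .", "while", lambda s, i: "CONTRAST"))
# -> He read while-2 she wrote .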
87 The following example gives an idea of the changes in the phrase table of the above-mentioned EN–FR Moses SMT system: [sent-177, score-0.217]
88 original: and the commission , while preserving ||| et la commission tout en défendant ... / and while many ||| et bien que de nombreuses ... [sent-180, score-0.064]
89 modified: and the commission , while-1 preserving ||| et la commission tout en défendant ... [sent-185, score-0.217]
90 modified: and while-3 many ||| et bien que de nombreuses ... [sent-186, score-0.084]
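The first method, directly editing the phrase table, can be sketched as follows: entries whose source side contains the connective are relabeled with a sense index, in the spirit of the original/modified pair above. The field layout (source ||| target ||| scores) is the standard Moses phrase-table format, but choose_sense_index is a hypothetical placeholder rule that inspects the French side of the entry, and the scores in the example line are invented.

def choose_sense_index(french_phrase):
    # placeholder rule: "bien que" signals concession (3), "tout en" a temporal reading (1)
    if "bien que" in french_phrase:
        return 3
    if "tout en" in french_phrase:
        return 1
    return None

def relabel_entry(line, connective="while"):
    fields = line.rstrip("\n").split(" ||| ")   # source ||| target ||| scores ...
    source, target = fields[0], fields[1]
    if connective in source.split():
        idx = choose_sense_index(target)
        if idx is not None:
            fields[0] = source.replace(connective, "%s-%d" % (connective, idx))
    return " ||| ".join(fields)

print(relabel_entry("and while many ||| et bien que de nombreuses ||| 0.2 0.1 0.3 0.05"))
# -> and while-3 many ||| et bien que de nombreuses ||| 0.2 0.1 0.3 0.05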
91 labeled connectives are correctly translated. [sent-192, score-0.623]
92 This tends to confirm the hypothesis of this paper, that information regarding discourse connectives can indeed lead to better translations. [sent-193, score-0.853]
93 6 Conclusion and Future Work The paper described new translation-oriented approaches to the disambiguation of a subset of explicit temporal–contrastive discourse connectives with highly ambiguous senses. [sent-194, score-1.018]
94 Although lexically explicit, their translation by current SMT systems is often wrong. [sent-195, score-0.088]
95 In addition, the paper showed a first method to force an existing and trained SMT system to translate discourse connectives correctly. [sent-198, score-0.878]
96 This led to noticeable improvements on the translations of the tested sentences. [sent-199, score-0.051]
97 We will continue to train SMT systems on automatically labeled discourse connectives in large corpora. [sent-200, score-0.853]
98 Rhetorical structure theory: towards a functional theory of text organization. [sent-250, score-0.032]
99 ANNODIS: une approche outillée de l’annotation de structures discursives. [sent-258, score-0.057]
100 DiMLex: a lexicon of discourse markers for text generation and understanding. [sent-276, score-0.304]
wordName wordTfidf (topN-words)
[('connectives', 0.597), ('connective', 0.426), ('pdtb', 0.311), ('discourse', 0.256), ('senses', 0.251), ('concession', 0.137), ('temporal', 0.129), ('smt', 0.12), ('contrastive', 0.107), ('disambiguation', 0.106), ('sense', 0.071), ('ct', 0.068), ('argument', 0.068), ('europarl', 0.067), ('translation', 0.065), ('que', 0.064), ('miltsakaki', 0.06), ('cond', 0.059), ('lexconn', 0.059), ('subsenses', 0.059), ('explicit', 0.058), ('en', 0.057), ('commission', 0.052), ('idiap', 0.052), ('translations', 0.051), ('prasad', 0.051), ('disambiguating', 0.05), ('french', 0.048), ('fr', 0.047), ('hierarchy', 0.042), ('alors', 0.039), ('annodis', 0.039), ('dimlex', 0.039), ('nagard', 0.039), ('plicit', 0.039), ('roze', 0.039), ('syntactical', 0.039), ('moses', 0.039), ('pitler', 0.037), ('co', 0.036), ('dinesh', 0.035), ('rashmi', 0.035), ('duverle', 0.035), ('stede', 0.035), ('tout', 0.035), ('nov', 0.032), ('oftext', 0.032), ('koehn', 0.031), ('exemplifies', 0.03), ('eleni', 0.03), ('nikhil', 0.03), ('philippe', 0.03), ('causal', 0.03), ('annotation', 0.029), ('kappa', 0.029), ('signaled', 0.028), ('wrong', 0.028), ('ancestors', 0.027), ('lexicon', 0.027), ('confirm', 0.026), ('translate', 0.025), ('level', 0.025), ('capitalized', 0.024), ('reaching', 0.024), ('correct', 0.024), ('third', 0.023), ('switzerland', 0.023), ('expansion', 0.023), ('contrast', 0.023), ('signal', 0.023), ('rhetorical', 0.023), ('lexically', 0.023), ('weka', 0.023), ('reference', 0.022), ('preserving', 0.021), ('unrestricted', 0.021), ('aravind', 0.021), ('mann', 0.021), ('joshi', 0.021), ('nenkova', 0.021), ('markers', 0.021), ('de', 0.02), ('ambiguous', 0.02), ('clauses', 0.02), ('fourth', 0.019), ('pos', 0.019), ('acceptable', 0.018), ('accuracy', 0.018), ('philipp', 0.018), ('manual', 0.018), ('integrating', 0.018), ('school', 0.017), ('asher', 0.017), ('martigny', 0.017), ('ofrelations', 0.017), ('livio', 0.017), ('robaldo', 0.017), ('ziheng', 0.017), ('matr', 0.017), ('approche', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
2 0.26907539 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
3 0.1668023 158 acl-2011-Identification of Domain-Specific Senses in a Machine-Readable Dictionary
Author: Fumiyo Fukumoto ; Yoshimi Suzuki
Abstract: This paper focuses on domain-specific senses and presents a method for assigning category/domain label to each sense of words in a dictionary. The method first identifies each sense of a word in the dictionary to its corresponding category. We used a text classification technique to select appropriate senses for each domain. Then, senses were scored by computing the rank scores. We used Markov Random Walk (MRW) model. The method was tested on English and Japanese resources, WordNet 3.0 and EDR Japanese dictionary. For evaluation of the method, we compared English results with the Subject Field Codes (SFC) resources. We also compared each English and Japanese results to the first sense heuristics in the WSD task. These results suggest that identification of domain-specific senses (IDSS) may actually be of benefit.
4 0.14906538 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics
Author: Christian Rohrdantz ; Annette Hautli ; Thomas Mayer ; Miriam Butt ; Daniel A. Keim ; Frans Plank
Abstract: This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing diachronic development in word contexts. Previous studies have shown that computational models are capable of clustering and disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, allowing for an interactive visual exploration of semantic change.
5 0.14595191 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
Author: Tim Van de Cruys ; Marianna Apidianaki
Abstract: In this paper, we present a unified model for the automatic induction of word senses from text, and the subsequent disambiguation of particular word instances using the automatically extracted sense inventory. The induction step and the disambiguation step are based on the same principle: words and contexts are mapped to a limited number of topical dimensions in a latent semantic word space. The intuition is that a particular sense is associated with a particular topic, so that different senses can be discriminated through their association with particular topical dimensions; in a similar vein, a particular instance of a word can be disambiguated by determining its most important topical dimensions. The model is evaluated on the SEMEVAL-20 10 word sense induction and disambiguation task, on which it reaches stateof-the-art results.
6 0.10298906 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
7 0.092676617 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
8 0.085948862 294 acl-2011-Temporal Evaluation
9 0.085584357 334 acl-2011-Which Noun Phrases Denote Which Concepts?
10 0.068994857 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
11 0.064085446 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
12 0.061362065 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents
13 0.060274344 167 acl-2011-Improving Dependency Parsing with Semantic Classes
14 0.057982497 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation
15 0.057900071 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals
16 0.057144567 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
17 0.056836586 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
18 0.05605603 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
19 0.053080756 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
20 0.052309044 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
topicId topicWeight
[(0, 0.139), (1, -0.029), (2, -0.021), (3, 0.023), (4, 0.0), (5, 0.03), (6, 0.097), (7, 0.028), (8, -0.021), (9, -0.008), (10, 0.003), (11, -0.195), (12, 0.089), (13, -0.057), (14, -0.038), (15, -0.073), (16, 0.077), (17, 0.168), (18, -0.044), (19, 0.129), (20, 0.042), (21, 0.019), (22, -0.035), (23, 0.005), (24, 0.006), (25, 0.054), (26, -0.019), (27, -0.016), (28, -0.004), (29, 0.047), (30, -0.065), (31, 0.094), (32, 0.025), (33, 0.033), (34, -0.052), (35, -0.067), (36, -0.165), (37, 0.076), (38, -0.07), (39, -0.127), (40, -0.109), (41, 0.029), (42, 0.051), (43, -0.108), (44, -0.057), (45, 0.035), (46, -0.016), (47, -0.048), (48, -0.064), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.93749148 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
2 0.68816924 158 acl-2011-Identification of Domain-Specific Senses in a Machine-Readable Dictionary
Author: Fumiyo Fukumoto ; Yoshimi Suzuki
Abstract: This paper focuses on domain-specific senses and presents a method for assigning category/domain label to each sense of words in a dictionary. The method first identifies each sense of a word in the dictionary to its corresponding category. We used a text classification technique to select appropriate senses for each domain. Then, senses were scored by computing the rank scores. We used Markov Random Walk (MRW) model. The method was tested on English and Japanese resources, WordNet 3.0 and EDR Japanese dictionary. For evaluation of the method, we compared English results with the Subject Field Codes (SFC) resources. We also compared each English and Japanese results to the first sense heuristics in the WSD task. These results suggest that identification of domain-specific senses (IDSS) may actually be of benefit.
3 0.66110778 307 acl-2011-Towards Tracking Semantic Change by Visual Analytics
Author: Christian Rohrdantz ; Annette Hautli ; Thomas Mayer ; Miriam Butt ; Daniel A. Keim ; Frans Plank
Abstract: This paper presents a new approach to detecting and tracking changes in word meaning by visually modeling and representing diachronic development in word contexts. Previous studies have shown that computational models are capable of clustering and disambiguating senses, a more recent trend investigates whether changes in word meaning can be tracked by automatic methods. The aim of our study is to offer a new instrument for investigating the diachronic development of word senses in a way that allows for a better understanding of the nature of semantic change in general. For this purpose we combine techniques from the field of Visual Analytics with unsupervised methods from Natural Language Processing, allowing for an interactive visual exploration of semantic change.
4 0.62282526 53 acl-2011-Automatically Evaluating Text Coherence Using Discourse Relations
Author: Ziheng Lin ; Hwee Tou Ng ; Min-Yen Kan
Abstract: We present a novel model to represent and assess the discourse coherence of text. Our model assumes that coherent text implicitly favors certain types of discourse relation transitions. We implement this model and apply it towards the text ordering ranking task, which aims to discern an original text from a permuted ordering of its sentences. The experimental results demonstrate that our model is able to significantly outperform the state-ofthe-art coherence model by Barzilay and Lapata (2005), reducing the error rate of the previous approach by an average of 29% over three data sets against human upperbounds. We further show that our model is synergistic with the previous approach, demonstrating an error reduction of 73% when the features from both models are combined for the task.
5 0.59439069 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
Author: Tim Van de Cruys ; Marianna Apidianaki
Abstract: In this paper, we present a unified model for the automatic induction of word senses from text, and the subsequent disambiguation of particular word instances using the automatically extracted sense inventory. The induction step and the disambiguation step are based on the same principle: words and contexts are mapped to a limited number of topical dimensions in a latent semantic word space. The intuition is that a particular sense is associated with a particular topic, so that different senses can be discriminated through their association with particular topical dimensions; in a similar vein, a particular instance of a word can be disambiguated by determining its most important topical dimensions. The model is evaluated on the SEMEVAL-20 10 word sense induction and disambiguation task, on which it reaches stateof-the-art results.
6 0.54014796 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
7 0.53949767 334 acl-2011-Which Noun Phrases Denote Which Concepts?
8 0.53794551 294 acl-2011-Temporal Evaluation
9 0.53149271 138 acl-2011-French TimeBank: An ISO-TimeML Annotated Reference Corpus
10 0.50578451 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
11 0.40834633 311 acl-2011-Translationese and Its Dialects
12 0.39102468 157 acl-2011-I Thou Thee, Thou Traitor: Predicting Formal vs. Informal Address in English Literature
13 0.36689404 322 acl-2011-Unsupervised Learning of Semantic Relation Composition
14 0.34031898 101 acl-2011-Disentangling Chat with Local Coherence Models
15 0.33892885 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
16 0.33488941 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis
17 0.33121705 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications
18 0.3275528 167 acl-2011-Improving Dependency Parsing with Semantic Classes
19 0.3262583 280 acl-2011-Sentence Ordering Driven by Local and Global Coherence for Summary Generation
20 0.31837496 260 acl-2011-Recognizing Authority in Dialogue with an Integer Linear Programming Constrained Model
topicId topicWeight
[(5, 0.04), (17, 0.041), (26, 0.041), (27, 0.277), (37, 0.09), (39, 0.035), (41, 0.046), (53, 0.02), (55, 0.022), (59, 0.063), (72, 0.032), (91, 0.023), (96, 0.151), (97, 0.019), (98, 0.012)]
simIndex simValue paperId paperTitle
1 0.87619072 337 acl-2011-Wikipedia Revision Toolkit: Efficiently Accessing Wikipedias Edit History
Author: Oliver Ferschke ; Torsten Zesch ; Iryna Gurevych
Abstract: We present an open-source toolkit which allows (i) to reconstruct past states of Wikipedia, and (ii) to efficiently access the edit history of Wikipedia articles. Reconstructing past states of Wikipedia is a prerequisite for reproducing previous experimental work based on Wikipedia. Beyond that, the edit history of Wikipedia articles has been shown to be a valuable knowledge source for NLP, but access is severely impeded by the lack of efficient tools for managing the huge amount of provided data. By using a dedicated storage format, our toolkit massively decreases the data volume to less than 2% of the original size, and at the same time provides an easy-to-use interface to access the revision data. The language-independent design allows to process any language represented in Wikipedia. We expect this work to consolidate NLP research using Wikipedia in general, and to foster research making use of the knowledge encoded in Wikipedia’s edit history.
same-paper 2 0.76576161 96 acl-2011-Disambiguating temporal-contrastive connectives for machine translation
Author: Thomas Meyer
Abstract: Temporal–contrastive discourse connectives (although, while, since, etc.) signal various types ofrelations between clauses such as temporal, contrast, concession and cause. They are often ambiguous and therefore difficult to translate from one language to another. We discuss several new and translation-oriented experiments for the disambiguation of a specific subset of discourse connectives in order to correct some of the translation errors made by current statistical machine translation systems.
3 0.70480126 101 acl-2011-Disentangling Chat with Local Coherence Models
Author: Micha Elsner ; Eugene Charniak
Abstract: We evaluate several popular models of local discourse coherence for domain and task generality by applying them to chat disentanglement. Using experiments on synthetic multiparty conversations, we show that most models transfer well from text to dialogue. Coherence models improve results overall when good parses and topic models are available, and on a constrained task for real chat data.
4 0.64653075 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words
Author: Hal Daume III ; Jagadeesh Jagarlamudi
Abstract: We show that unseen words account for a large part of the translation error when moving to new domains. Using an extension of a recent approach to mining translations from comparable corpora (Haghighi et al., 2008), we are able to find translations for otherwise OOV terms. We show several approaches to integrating such translations into a phrasebased translation system, yielding consistent improvements in translations quality (between 0.5 and 1.5 Bleu points) on four domains and two language pairs.
5 0.59265578 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico
Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.
6 0.58988893 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
7 0.5887807 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization
8 0.58687216 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing
9 0.58609027 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
10 0.58563542 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations
11 0.58506989 240 acl-2011-ParaSense or How to Use Parallel Corpora for Word Sense Disambiguation
12 0.58493042 133 acl-2011-Extracting Social Power Relationships from Natural Language
13 0.58482951 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation
14 0.58429545 131 acl-2011-Extracting Opinion Expressions and Their Polarities - Exploration of Pipelines and Joint Models
15 0.58331287 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
16 0.58318675 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates
17 0.58280277 311 acl-2011-Translationese and Its Dialects
18 0.58138055 44 acl-2011-An exponential translation model for target language morphology
19 0.58133483 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
20 0.58088005 269 acl-2011-Scaling up Automatic Cross-Lingual Semantic Role Annotation