emnlp emnlp2011 emnlp2011-6 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Prodromos Malakasiotis ; Ion Androutsopoulos
Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
Reference: text
sentIndex sentText sentNum sentScore
1 The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. [sent-2, score-0.598]
2 The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. [sent-3, score-0.852]
3 Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. [sent-5, score-1.634]
4 We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. [sent-6, score-0.638]
5 The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind. [sent-7, score-0.524]
6 1 Introduction In recent years, significant effort has been devoted to research on paraphrasing (Androutsopoulos and Malakasiotis, 2010; Madnani and Dorr, 2010). [sent-8, score-0.392]
7 , methods that detect whether or not two input sentences or other texts are paraphrases; (ii) generation methods, where the aim is to produce paraphrases of a given input sentence; and (iii) extraction methods, which aim to extract paraphrasing rules (e. [sent-11, score-0.864]
8 Significant progress has also been made in paraphrase extraction, where most recent methods produce large numbers of paraphrasing rules from multilingual parallel corpora (Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Zhao et al. [sent-18, score-0.927]
9 In this paper, we are concerned with paraphrase generation, which has received less attention than the other two categories. [sent-22, score-0.377]
10 There are currently two main approaches to paraphrase generation. [sent-23, score-0.377]
11 The first one treats paraphrase generation as a machine translation problem, with the peculiarity that the target language is the same as the source one. [sent-24, score-0.489]
12 , 2009a); in both cases, paraphrases can then be generated by invoking an SMT system’s decoder (Koehn, 2009). [sent-30, score-0.371]
13 A second paraphrase generation approach is to treat existing machine translation engines as black boxes, and translate each input sentence to a pivot language and then back to the original language (Duboue and Chu-Carroll, 2006). [sent-31, score-0.672]
14 An extension of this approach uses multiple translation engines and pivot languages (Zhao et al. [sent-32, score-0.239]
15 In this paper, we investigate a different paraphrase generation approach, which does not produce paraphrases by invoking machine translation system(s). [sent-34, score-0.832]
16 We use an existing collection of monolingual paraphrasing rules extracted from multilingual parallel corpora (Zhao et al. [sent-35, score-0.599]
17 1 Given an input sentence, we use the paraphrasing rules to generate a large number of candidate paraphrases. [sent-38, score-0.614]
18 The intuition is that a good paraphrase is grammatical, preserves the meaning of the original sentence, while also being as different as possible. [sent-41, score-0.473]
19 Experimental results show that including in the ranking (or classification) component features from an existing paraphrase recognizer leads to improved results. [sent-42, score-0.654]
20 We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. (Footnote 1: See, for example, Collins and Koo (2005).) [sent-43, score-0.566]
21 The paper is accompanied by a new publicly available paraphrasing dataset we constructed for evaluations of this kind. [sent-45, score-0.524]
22 Further experiments indicate that when paraphrasing rules apply to the input sentences, our paraphrasing method is competitive with a state of the art paraphrase generator that uses multiple translation engines and pivot languages (Zhao et al. [sent-46, score-1.605]
23 We note that paraphrase generation is useful in several language processing tasks. [sent-48, score-0.407]
24 In question answering, for example, paraphrase generators can be used to paraphrase the user’s queries (Duboue and Chu-Carroll, 2006; Riezler and Liu, 2010); and in machine translation, paraphrase generation can help improve the translations (Callison-Burch et al. [sent-49, score-1.237]
25 2 Generating candidate paraphrases We use the approximately one million English paraphrasing rules of Zhao et al. [sent-57, score-0.894]
26 Roughly speaking, the rules were extracted from a parallel English-Chinese corpus, based on the assumption that two English phrases e1 and e2 that are often aligned to the same Chinese phrase c are likely to be paraphrases and, hence, they can be treated as a paraphrasing rule e1 ↔ e2. [sent-59, score-0.911]
27 2This pivot-based paraphrase extraction approach was first proposed by Bannard and Callison-Burch (2005). [sent-63, score-0.377]
28 It underlies several other paraphrase extraction methods (Riezler et al. [sent-64, score-0.377]
29 (2009b) use a log-linear ranker to assign scores to candidate English paraphrase pairs ⟨e1, e2⟩; the ranker uses the alignment probabilities P(c|e1) and P(e2|c) as features, along with features that assess the quality of the corresponding alignments. [sent-67, score-0.915]
30 consider two English phrases e1 and e2 as paraphrases, if they are often aligned to two Chinese phrases c1 and c2, which are themselves paraphrases according to Model 1 (with English used as the pivot language). [sent-69, score-0.4]
31 The resulting paraphrasing rules e1 ↔ e2 typically contain short phrases (up to four or five words excluding slots) on each side; hence, they can be used to rewrite only parts of longer sentences. [sent-73, score-0.392]
32 Given an input (source) sentence S, we generate candidate paraphrases by applying rules whose left or right hand side matches any part of S. [sent-74, score-0.528]
33 For example, rule (1) matches the source sentence (4); hence, (4) can be rewritten as the candidate paraphrase (5). [sent-75, score-0.546]
34 We allow all possible combinations of applicable rules to apply to S, excluding combinations that include rules rewriting overlapping parts of S. [sent-79, score-0.294]
35 3 A dataset of candidate paraphrases Our generate and rank method relies on existing large collections of paraphrasing rules to generate candidate paraphrases. [sent-99, score-1.081]
36 We selected randomly 75 source (S) sentences from the AQUAINT corpus, such that at least one of the paraphrasing rules applied to each S. [sent-102, score-0.53]
37 The judges were asked to provide grammaticality, meaning preservation, and overall paraphrase quality scores for each ⟨S, C⟩ pair, each score on a 1–4 scale (1 for totally unacceptable, 4 for perfect); guidelines and examples were also provided. [sent-114, score-0.699]
38 Figure 1 shows the distribution of the overall quality scores in the 1,935 ⟨S, C⟩ pairs of the evaluation dataset; the distributions of the grammaticality and meaning preservation scores are similar. [sent-115, score-0.794]
39 Notice that although we used only the 20 applicable paraphrasing rules with the highest scores to generate the ⟨S, C⟩ pairs, less than half of the candidate paraphrases (C) were considered good, and approximately only 20% perfect. [sent-116, score-0.944]
40 Several judges commented that they had trouble deciding to what extent the overall quality score should reflect grammaticality or meaning preservation. [sent-143, score-0.537]
41 They also wondered if it was fair to consider as perfect candidate paraphrases that differed in only one or two words from the source sentences, i. [sent-144, score-0.453]
42 4 Ranking candidate paraphrases We now discuss the ranking component of our method, which assesses the candidate paraphrases. [sent-152, score-0.638]
43 To allow the ranking component to assess the degree to which a candidate C is grammatical, or at least as grammatical as the source S, we include in the feature vectors the language model scores of S, C, and the difference between the two scores. [sent-155, score-0.381]
44 9 To allow the ranker to consider the (context-insensitive) quality scores of the rules that generated C from S, we also include as features the highest, lowest, and average r1, r2, r3, and r4 scores (Section 2) of these rules, 12 features in total. [sent-159, score-0.449]
45 (2009a) in the only comparable paraphrase generation method we are aware of that uses paraphrasing rules. [sent-161, score-0.799]
46 By contrast, we first generate a large number of candidates using the paraphrasing rules, and we then rank them. [sent-164, score-0.44]
47 (2010), hereafter called ZHAO-ENG, which uses multiple machine translation engines and pivot languages, instead of paraphrasing rules, and which Zhao et al. [sent-166, score-0.667]
48 10Application-specific features are also included, which can be used, for example, to favor paraphrases that are shorter than the input in sentence compression (Knight and Marcu, 2002; Clarke and Lapata, 2008). [sent-173, score-0.356]
49 11The MSR corpus contains pairs that are paraphrases or not. [sent-175, score-0.336]
50 It is a benchmark for paraphrase recognizers, not generators. [sent-176, score-0.377]
51 It provides only one paraphrase (true or false) of each source, and few of the true paraphrases can be obtained by the rules we use. [sent-177, score-0.793]
52 13Malakasiotis (2009) shows that although there is a lot of redundancy in the recognizer’s feature set, the full feature set still leads to better paraphrase recognition results, compared to subsets constructed via feature selection with hill-climbing or beam search. [sent-203, score-0.411]
53 Notice, also, that the recognizer does not use paraphrasing rules. [sent-205, score-0.485]
54 For simplicity, we used only the judges’ overall quality scores in these experiments, and we treated the problem as one of binary classification; overall quality scores of 1 and 2 were conflated to a negative category, and scores of 3 and 4 to a positive category. [sent-209, score-0.304]
55 Clearly, ME-REC outperforms the baseline, which uses only the average (context-insensitive) scores of the applied paraphrasing rules. [sent-216, score-0.442]
56 and diversity scores instead; the grammaticality and meaning preservation scores were those provided by the judges, while diversity was automatically computed as the edit distance (Levenshtein, computed on tokens) between S and C. [sent-220, score-0.972]
57 to be a linear combination of the grammaticality score g(xi), the meaning preservation score m(xi), and the diversity d(xi) of the ⟨S, C⟩ pair whose feature vector is xi, as in Equation (6), where λ3 = 1 − λ1 − λ2. [sent-225, score-0.704]
58 Hence, generic paraphrase generators, like ours, intended to be useful in many different applications, should be evaluated for many different combinations of the λi weights. [sent-228, score-0.414]
59 We employed a Support Vector Regression (SVR) model in the experiments of this section, instead of Figure 3: Performance of our method’s SVR ranking component with (SVR-REC) and without (SVR-BASE) the additional features of the paraphrase recognizer. [sent-231, score-0.561]
60 2, and SVR-BASE the SVR ranker without the 136 features of the paraphrase recognizer. [sent-236, score-0.554]
61 , when all or most of the weight is placed on meaning preservation, there is no or very small difference between SVR-REC and SVR-BASE, suggesting that the extra features of the paraphrase recognizer are not as useful to the SVR, when assessing meaning preservation, as we would have hoped. [sent-269, score-0.668]
62 We believe that the dataset of Section 3 and the evaluation methodology summarized by Figure 3 will prove useful to other researchers, who may wish to evaluate other ranking components of generate-and-rank paraphrasing methods against ours, for example with different ranking algorithms or features. [sent-271, score-0.677]
63 Similar datasets of candidate paraphrases can also be created using different collections of paraphrasing rules. [sent-272, score-0.829]
64 proceed to investigate how well our overall generate-and-rank method (with SVR-REC) compares against a state of the art paraphrase generator. [sent-275, score-0.452]
65 , 2009a), which used paraphrasing rules and an SMT-like decoder (we call that previous method ZHAO-RUL). [sent-278, score-0.502]
66 Given an input sentence S, ZHAO-ENG produces candidate paraphrases by translating S to 6 pivot languages via 3 different commercial machine translation engines (treated as black boxes) and then back to the original language, again via 3 machine translation engines (54 combinations). [sent-279, score-0.835]
67 Roughly speaking, ZHAO-ENG then ranks the candidate paraphrases by their average distance from all the other candidates, selecting the candidate(s) with the smallest distance; distance is measured as BLEU score (Papineni et al. [sent-280, score-0.444]
68 17 Hence, ZHAO-ENG is also, in effect, a generate-and-rank paraphraser, but the candidates are generated by invoking multiple machine translation engines instead of applying paraphrasing rules, and they are ranked by the average distance measure rather than using an SVR. [sent-282, score-0.676]
69 An obvious practical advantage of ZHAO-ENG is that it exploits the vast resources of existing commercial machine translation engines when generating candidate paraphrases, which allows it to always obtain large numbers of candidate paraphrases. [sent-283, score-0.35]
70 By contrast, the collection of paraphrasing rules that we currently use does not manage to produce any candidate paraphrases in 40% of the sentences of the New York Times part of AQUAINT, because no rule applies. [sent-284, score-0.949]
71 Hence, in terms of ability to always paraphrase the input, ZHAO-ENG is clearly better, though it should be possible to improve our method’s performance in that respect by using larger collections of paraphrasing rules. [sent-285, score-0.814]
72 18 A further interesting question, however, is how good the paraphrases of the two methods are, when both methods manage to paraphrase the input, i. [sent-286, score-0.683]
73 18Recall that the paraphrasing rules we use were extracted from an English-Chinese parallel corpus. [sent-290, score-0.55]
74 This scenario can be seen as an emulation of the case where the collection of paraphrasing rules is sufficiently large to guarantee that at least one rule applies to any source sentence. [sent-295, score-0.585]
75 We then selected 300 random source sentences S from AQUAINT that matched at least one of the paraphrasing rules, excluding sentences that had been used before. [sent-299, score-0.42]
76 Then, for each one of the 300 S sentences, we kept the single best candidate paraphrase C1 and C2, respectively, returned by our paraphraser and ZHAO-ENG. [sent-300, score-0.575]
77 This time the judges assigned only grammaticality and meaning preservation scores (on a 1–4 scale); diversity was again computed as edit distance. [sent-302, score-0.625]
78 Table 2 lists the average grammaticality, meaning preservation, and diversity scores of the two methods. [sent-306, score-0.239]
79 All scores were normalized in [0, 1], but the reader should keep in mind that diversity was computed as edit distance, whereas the other two scores were provided by human judges on a 1–4 scale. [sent-307, score-0.365]
80 The grammaticality score of our method was better than ZHAO-ENG’s, and the difference was statistically significant. [sent-308, score-0.238]
81 The difference in diversity was larger and statistically significant, with the diversity scores indicating that it takes approximately twice as many edit operations (insert, delete, replace) to turn each source sentence to ZHAO-ENG’s paraphrase, compared to the paraphrase of our method. [sent-310, score-0.714]
82 We note that our method can be tuned, by adjusting the λi weights, to produce paraphrases with 19We used Analysis of Variance (ANOVA) (Fisher, 1925), followed by post-hoc Tukey tests to check whether the scores of the two methods differ significantly (p < 0. [sent-311, score-0.356]
83 Table 2: Evaluation of our paraphrasing method (with SVR-REC) against ZHAO-ENG, using human judges. [sent-316, score-0.392]
84 higher grammaticality, meaning preservation, or diversity scores; for example, we could increase λ3 and decrease λ1 to obtain higher diversity at the cost of lower grammaticality in the results of Table 2. It is unclear how ZHAO-ENG could be tuned that way. [sent-318, score-0.571]
85 It would be interesting to investigate in future work if our method’s coverage (sentences it can paraphrase) can increase to ZHAO-ENG’s level by using larger collections of paraphrasing rules. [sent-321, score-0.437]
86 It would also be interesting to combine the two methods, perhaps by using SVR-REC (without features for the quality scores of the rules) to rank candidate paraphrases generated by ZHAO-ENG. [sent-322, score-0.504]
87 6 Conclusions and future work We presented a generate-and-rank method to paraphrase sentences. [sent-323, score-0.377]
88 The method first produces candidate paraphrases by applying existing paraphrasing rules extracted from parallel corpora, and it then ranks (or classifies) the candidates to keep the best ones. [sent-324, score-0.99]
89 Further experiments with an SVR ranker indicated that our full feature set, which includes features from an existing paraphrase recognizer, leads to improved performance, compared to a smaller feature set that includes only the context-insensitive scores of the rules and language modeling scores. [sent-328, score-0.747]
90 We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. [sent-329, score-0.638]
91 The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind. [sent-330, score-0.524]
92 Finally, we evaluated our overall method against a state of the art sentence paraphraser, which generates candidates by using several commercial machine translation systems and pivot languages. [sent-331, score-0.304]
93 Our method performed better in terms of grammaticality, equally well in meaning preservation, and worse in diversity, but it could be tuned to obtain higher diversity at the cost of lower grammaticality, whereas it is unclear how the system we compare against could be tuned this way. [sent-333, score-0.243]
94 On the other hand, an advantage of the paraphraser we compared against is that it always produces paraphrases; by contrast, our system does not produce paraphrases when no paraphrasing rule applies to the source sentence. [sent-334, score-0.893]
95 Larger collections of paraphrasing rules would be needed to improve our method in that respect. [sent-335, score-0.547]
96 We also plan to investigate the possibility of embedding our SVR ranker in the sentence paraphraser we compared against, i. [sent-337, score-0.265]
97 Answering the question you wish they had asked: The impact of paraphrasing for question answering. [sent-406, score-0.392]
98 Automatic generation of paraphrases to be used as translation references in objective evaluation measures of machine translation. [sent-453, score-0.39]
99 Using paraphrases for parameter tuning in statistical machine translation. [sent-471, score-0.306]
100 Pivot approach for extracting paraphrase patterns from bilingual corpora. [sent-580, score-0.377]
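Illustrative sketch (not from the paper). Sentences 32 to 34 above describe candidate generation by applying every combination of applicable rules that do not rewrite overlapping parts of S, and sentences 56 and 57 describe scoring a candidate as a weighted combination of grammaticality, meaning preservation, and diversity, with diversity computed as token-level Levenshtein distance. The Python below is a minimal sketch of those two steps under stated assumptions: the toy rules, the function names, and the exhaustive enumeration are all hypothetical, slots in rules are ignored, and g, m, and d are plain numbers standing in for the judges' scores and a normalized diversity value. It is not the authors' implementation.

```python
from itertools import combinations


def find_matches(tokens, rules):
    """Return (start, end, replacement) for every span matched by a rule side."""
    tokens = tuple(tokens)
    matches = []
    for lhs, rhs in rules:
        # Rules are bidirectional (e1 <-> e2), so either side may match.
        for src, dst in ((tuple(lhs), tuple(rhs)), (tuple(rhs), tuple(lhs))):
            n = len(src)
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] == src:
                    matches.append((i, i + n, dst))
    return matches


def generate_candidates(tokens, rules):
    """Apply every combination of matches whose spans do not overlap."""
    tokens = tuple(tokens)
    matches = find_matches(tokens, rules)
    candidates = set()
    for k in range(1, len(matches) + 1):
        for combo in combinations(matches, k):
            spans = sorted(combo, key=lambda m: m[0])
            # Skip combinations that rewrite overlapping parts of the source.
            if any(a[1] > b[0] for a, b in zip(spans, spans[1:])):
                continue
            out, pos = [], 0
            for start, end, dst in spans:
                out.extend(tokens[pos:start])
                out.extend(dst)
                pos = end
            out.extend(tokens[pos:])
            candidates.add(tuple(out))
    candidates.discard(tokens)  # a candidate identical to S is not a paraphrase
    return candidates


def token_edit_distance(a, b):
    """Levenshtein distance computed on tokens (insert, delete, replace)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # replace or keep
        prev = cur
    return prev[-1]


def combined_score(g, m, d, lam1, lam2):
    """Weighted combination of g, m, d with lam3 = 1 - lam1 - lam2."""
    lam3 = 1.0 - lam1 - lam2
    return lam1 * g + lam2 * m + lam3 * d


if __name__ == "__main__":
    # Toy rules invented for this sketch; they are not from the rule collection.
    toy_rules = [
        (("wrote",), ("authored",)),
        (("the", "book"), ("the", "novel")),
        (("wrote", "the"), ("penned", "a")),
    ]
    source = "he wrote the book".split()
    for cand in sorted(generate_candidates(source, toy_rules)):
        print(" ".join(cand), token_edit_distance(source, cand))
```

Exhaustive enumeration grows exponentially with the number of matches, so a practical system would presumably cap the number of rules considered; sentence 39 notes that the dataset was built from the 20 highest-scoring applicable rules.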
wordName wordTfidf (topN-words)
[('paraphrasing', 0.392), ('paraphrase', 0.377), ('paraphrases', 0.306), ('preservation', 0.277), ('svr', 0.26), ('grammaticality', 0.238), ('zhao', 0.176), ('ranker', 0.153), ('hs', 0.149), ('malakasiotis', 0.13), ('judges', 0.123), ('diversity', 0.117), ('paraphraser', 0.112), ('rules', 0.11), ('ci', 0.104), ('ranking', 0.102), ('xi', 0.098), ('pivot', 0.094), ('recognizer', 0.093), ('engines', 0.091), ('candidate', 0.086), ('androutsopoulos', 0.081), ('generators', 0.076), ('meaning', 0.072), ('aquaint', 0.065), ('invoking', 0.065), ('component', 0.058), ('szpektor', 0.056), ('dataset', 0.056), ('rule', 0.055), ('translation', 0.054), ('scores', 0.05), ('monolingual', 0.049), ('duboue', 0.049), ('unacceptable', 0.049), ('candidates', 0.048), ('parallel', 0.048), ('pin', 0.046), ('collections', 0.045), ('entailment', 0.042), ('accompanied', 0.042), ('overall', 0.039), ('replaced', 0.039), ('kok', 0.038), ('levenshtein', 0.038), ('dagan', 0.038), ('quality', 0.038), ('hence', 0.037), ('combinations', 0.037), ('hereafter', 0.036), ('art', 0.036), ('madnani', 0.035), ('constructed', 0.034), ('generator', 0.033), ('commercial', 0.033), ('riezler', 0.033), ('contextinsensitive', 0.033), ('indigo', 0.033), ('lepage', 0.033), ('mirkin', 0.033), ('msr', 0.033), ('npin', 0.033), ('oiefs', 0.033), ('phrasing', 0.033), ('plenty', 0.033), ('svrrec', 0.033), ('wondered', 0.033), ('classifier', 0.032), ('assessing', 0.03), ('brockett', 0.03), ('generation', 0.03), ('pairs', 0.03), ('degree', 0.029), ('bannard', 0.029), ('slots', 0.029), ('assess', 0.028), ('source', 0.028), ('comm', 0.028), ('kauchak', 0.028), ('nelken', 0.028), ('xpi', 0.028), ('preserve', 0.027), ('extent', 0.027), ('tuned', 0.027), ('distance', 0.026), ('input', 0.026), ('edit', 0.025), ('evaluates', 0.025), ('boxes', 0.025), ('codes', 0.025), ('soundex', 0.025), ('agreement', 0.025), ('methodology', 0.025), ('speaking', 0.025), ('features', 0.024), ('economics', 0.024), ('marton', 0.024), ('preserves', 
0.024), ('wrote', 0.024), ('textual', 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
Author: Prodromos Malakasiotis ; Ion Androutsopoulos
Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
2 0.41885543 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Author: Juri Ganitkevitch ; Chris Callison-Burch ; Courtney Napoles ; Benjamin Van Durme
Abstract: Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.
3 0.13426964 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
Author: Tim Van de Cruys ; Thierry Poibeau ; Anna Korhonen
Abstract: This paper presents a novel method for the computation of word meaning in context. We make use of a factorization model in which words, together with their window-based context words and their dependency relations, are linked to latent dimensions. The factorization model allows us to determine which dimensions are important for a particular context, and adapt the dependency-based feature vector of the word accordingly. The evaluation on a lexical substitution task – carried out for both English and French – indicates that our approach is able to reach better results than state-of-the-art methods in lexical substitution, while at the same time providing more accurate meaning representations.
Author: Nobuhiro Kaji ; Masaru Kitsuregawa
Abstract: Word boundaries within noun compounds are not marked by white spaces in a number of languages, unlike in English, and it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds for identifying word boundaries. Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled textual data, the Web in our case, and then using that information for constructing splitting models.
5 0.10470638 42 emnlp-2011-Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
Author: Matteo Negri ; Luisa Bentivogli ; Yashar Mehdad ; Danilo Giampiccolo ; Alessandro Marchetti
Abstract: We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need of large-scale annotation efforts for textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote the recourse to crowdsourcing as an effective way to reduce the costs of data collection without sacrificing quality. We show that a complex data creation task, for which even experts usually feature low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each combination of texts-hypotheses in English, Italian and German.
6 0.099530034 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
7 0.099324271 29 emnlp-2011-Collaborative Ranking: A Case Study on Entity Linking
8 0.092875451 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
9 0.089853957 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
10 0.088683546 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
11 0.078198113 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
12 0.077903055 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
13 0.075707838 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
14 0.072721891 107 emnlp-2011-Probabilistic models of similarity in syntactic context
15 0.068650968 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
16 0.065801203 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
17 0.062541686 38 emnlp-2011-Data-Driven Response Generation in Social Media
18 0.060189582 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
19 0.056675926 138 emnlp-2011-Tuning as Ranking
20 0.056374747 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
topicId topicWeight
[(0, 0.226), (1, 0.05), (2, 0.033), (3, -0.246), (4, 0.019), (5, -0.172), (6, -0.182), (7, 0.254), (8, 0.04), (9, 0.243), (10, -0.108), (11, -0.108), (12, 0.244), (13, 0.028), (14, 0.115), (15, -0.161), (16, 0.039), (17, -0.104), (18, 0.049), (19, -0.003), (20, 0.021), (21, 0.114), (22, 0.095), (23, -0.14), (24, 0.092), (25, -0.015), (26, -0.123), (27, -0.135), (28, 0.041), (29, -0.145), (30, 0.004), (31, -0.061), (32, -0.086), (33, 0.08), (34, 0.061), (35, -0.052), (36, -0.009), (37, -0.01), (38, 0.004), (39, -0.008), (40, 0.115), (41, 0.024), (42, 0.086), (43, -0.143), (44, 0.073), (45, 0.087), (46, -0.0), (47, -0.027), (48, 0.003), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.95317918 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
Author: Prodromos Malakasiotis ; Ion Androutsopoulos
Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
2 0.82723504 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
Author: Juri Ganitkevitch ; Chris Callison-Burch ; Courtney Napoles ; Benjamin Van Durme
Abstract: Previous work has shown that high quality phrasal paraphrases can be extracted from bilingual parallel corpora. However, it is not clear whether bitexts are an appropriate resource for extracting more sophisticated sentential paraphrases, which are more obviously learnable from monolingual parallel corpora. We extend bilingual paraphrase extraction to syntactic paraphrases and demonstrate its ability to learn a variety of general paraphrastic transformations, including passivization, dative shift, and topicalization. We discuss how our model can be adapted to many text generation tasks by augmenting its feature set, development data, and parameter estimation routine. We illustrate this adaptation by using our paraphrase model for the task of sentence compression and achieve results competitive with state-of-the-art compression systems.
Author: Nobuhiro Kaji ; Masaru Kitsuregawa
Abstract: Word boundaries within noun compounds are not marked by white spaces in a number of languages, unlike in English, and it is beneficial for various NLP applications to split such noun compounds. In the case of Japanese, noun compounds made up of katakana words (i.e., transliterated foreign words) are particularly difficult to split, because katakana words are highly productive and are often out-of-vocabulary. To overcome this difficulty, we propose using monolingual and bilingual paraphrases of katakana noun compounds for identifying word boundaries. Experiments demonstrated that splitting accuracy is substantially improved by extracting such paraphrases from unlabeled textual data, the Web in our case, and then using that information for constructing splitting models.
4 0.4294295 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
Author: Kristian Woodsend ; Mirella Lapata
Abstract: Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.
5 0.39288902 42 emnlp-2011-Divide and Conquer: Crowdsourcing the Creation of Cross-Lingual Textual Entailment Corpora
Author: Matteo Negri ; Luisa Bentivogli ; Yashar Mehdad ; Danilo Giampiccolo ; Alessandro Marchetti
Abstract: We address the creation of cross-lingual textual entailment corpora by means of crowdsourcing. Our goal is to define a cheap and replicable data collection methodology that minimizes the manual work done by expert annotators, without resorting to preprocessing tools or already annotated monolingual datasets. In line with recent works emphasizing the need of large-scale annotation efforts for textual entailment, our work aims to: i) tackle the scarcity of data available to train and evaluate systems, and ii) promote the recourse to crowdsourcing as an effective way to reduce the costs of data collection without sacrificing quality. We show that a complex data creation task, for which even experts usually feature low agreement scores, can be effectively decomposed into simple subtasks assigned to non-expert annotators. The resulting dataset, obtained from a pipeline of different jobs routed to Amazon Mechanical Turk, contains more than 1,600 aligned pairs for each combination of texts-hypotheses in English, Italian and German.
6 0.34185177 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
7 0.3165026 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
8 0.25449732 38 emnlp-2011-Data-Driven Response Generation in Social Media
9 0.25449672 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
10 0.25001943 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
11 0.24886818 78 emnlp-2011-Large-Scale Noun Compound Interpretation Using Bootstrapping and the Web as a Corpus
12 0.24623473 29 emnlp-2011-Collaborative Ranking: A Case Study on Entity Linking
13 0.24082632 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
14 0.22774236 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
15 0.22543991 107 emnlp-2011-Probabilistic models of similarity in syntactic context
16 0.21656545 89 emnlp-2011-Linguistic Redundancy in Twitter
17 0.20948115 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
18 0.20835508 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
19 0.2073326 61 emnlp-2011-Generating Aspect-oriented Multi-Document Summarization with Event-aspect model
20 0.20631388 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
topicId topicWeight
[(23, 0.165), (36, 0.031), (37, 0.025), (45, 0.059), (53, 0.027), (54, 0.039), (57, 0.027), (58, 0.24), (62, 0.028), (64, 0.033), (66, 0.031), (69, 0.016), (79, 0.049), (82, 0.037), (96, 0.043), (98, 0.053)]
simIndex simValue paperId paperTitle
same-paper 1 0.77652735 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
Author: Prodromos Malakasiotis ; Ion Androutsopoulos
Abstract: We present a method that paraphrases a given sentence by first generating candidate paraphrases and then ranking (or classifying) them. The candidates are generated by applying existing paraphrasing rules extracted from parallel corpora. The ranking component considers not only the overall quality of the rules that produced each candidate, but also the extent to which they preserve grammaticality and meaning in the particular context of the input sentence, as well as the degree to which the candidate differs from the input. We experimented with both a Maximum Entropy classifier and an SVR ranker. Experimental results show that incorporating features from an existing paraphrase recognizer in the ranking component improves performance, and that our overall method compares well against a state of the art paraphrase generator, when paraphrasing rules apply to the input sentences. We also propose a new methodology to evaluate the ranking components of generate-and-rank paraphrase generators, which evaluates them across different combinations of weights for grammaticality, meaning preservation, and diversity. The paper is accompanied by a paraphrasing dataset we constructed for evaluations of this kind.
2 0.65311205 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and Urdu-English translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
3 0.63802838 136 emnlp-2011-Training a Parser for Machine Translation Reordering
Author: Jason Katz-Brown ; Slav Petrov ; Ryan McDonald ; Franz Och ; David Talbot ; Hiroshi Ichikawa ; Masakazu Seno ; Hideto Kazawa
Abstract: We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.
4 0.63757014 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
Author: Burr Settles
Abstract: This paper describes DUALIST, an active learning annotation paradigm which solicits and learns from labels on both features (e.g., words) and instances (e.g., documents). We present a novel semi-supervised training algorithm developed for this setting, which is (1) fast enough to support real-time interactive speeds, and (2) at least as accurate as preexisting methods for learning with mixed feature and instance labels. Human annotators in user studies were able to produce near-state-of-the-art classifiers, on several corpora in a variety of application domains, with only a few minutes of effort.
5 0.63733369 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard
Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.
6 0.63440245 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
7 0.63292456 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
8 0.63070035 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
9 0.62868637 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
10 0.62789756 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
11 0.62510616 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
12 0.62466347 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
13 0.62203425 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
14 0.62201047 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
15 0.62076181 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
16 0.62051129 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
17 0.61842829 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
18 0.61738515 70 emnlp-2011-Identifying Relations for Open Information Extraction
19 0.61676872 23 emnlp-2011-Bootstrapped Named Entity Recognition for Product Attribute Extraction
20 0.6164881 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
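The rankings above pair each paper with a similarity value derived from sparse (topicId, topicWeight) vectors. The page does not state how simValue is computed, so the following is only a minimal sketch that assumes plain cosine similarity over such vectors. The function names `cosine_similarity` and `rank_similar` and the two-paper corpus are hypothetical, and the weights in the usage example are illustrative rather than taken from any real paper other than the first three entries of the topic vector listed above. Note that the same-paper rows above score below 1.0, so the original pipeline evidently differs from this sketch in some way.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as (topicId, weight) pairs."""
    da, db = dict(a), dict(b)
    # Only topic ids present in both vectors contribute to the dot product.
    dot = sum(w * db[i] for i, w in da.items() if i in db)
    norm_a = math.sqrt(sum(w * w for w in da.values()))
    norm_b = math.sqrt(sum(w * w for w in db.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_similar(query, corpus, top_n=20):
    """Return (simIndex, simValue, paperId) rows, most similar first.

    `corpus` maps a paper id to its sparse topic vector (assumed layout).
    """
    scored = [(cosine_similarity(query, vec), pid) for pid, vec in corpus.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(rank, score, pid)
            for rank, (score, pid) in enumerate(scored[:top_n], start=1)]


# First three entries of the topic vector listed above for paper 6.
query = [(23, 0.165), (36, 0.031), (58, 0.24)]
# Hypothetical corpus: the second vector is invented for illustration.
corpus = {"paper-6": query, "paper-x": [(23, 0.1), (45, 0.3)]}
for row in rank_similar(query, corpus):
    print(row)
```

A dictionary lookup keeps the dot product proportional to the number of non-zero topics, which suits vectors like the ones above where most topic weights are absent.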