acl acl2011 acl2011-37 knowledge-graph by maker-knowledge-mining

37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques


Source: pdf

Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang

Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 of Southern California, Marina del Rey, CA, USA, metzler@isi.edu [sent-2, score-0.048]

2 Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. [sent-4, score-0.265]

3 Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. [sent-5, score-0.143]

4 This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. [sent-6, score-0.744]

5 Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy. [sent-7, score-0.209]

6 1 Introduction A popular idiom states that “variety is the spice of life”. [sent-8, score-0.069]

7 As with life, variety also adds spice and appeal to language. [sent-9, score-0.106]

8 While variety prevents language from being overly rigid and boring, it also makes it difficult to algorithmically determine if two phrases or sentences express the same meaning. [sent-11, score-0.123]

9 of Southern California, Marina del Rey, CA, USA, hovy@isi.edu [sent-14, score-0.096]

10 …, 2003; Pasca and Dienes, 2005). Many data-driven approaches to the paraphrase problem have been proposed. [sent-17, score-0.464]

11 The approaches vastly differ in their complexity and the amount of NLP resources that they rely on. [sent-18, score-0.16]

12 At one end of the spectrum are approaches that generate paraphrases from a large monolingual corpus and minimally rely on NLP tools. [sent-19, score-0.729]

13 Such approaches typically make use of statistical co-occurrences, which act as a rather crude proxy for semantics. [sent-20, score-0.105]

14 At the other end of the spectrum are more complex approaches that require access to bilingual parallel corpora and may also rely on part-of-speech (POS) taggers, chunkers, parsers, and statistical machine translation tools. [sent-21, score-0.343]

15 Constructing large comparable and bilingual corpora is expensive and, in some cases, impossible. [sent-22, score-0.059]

16 Despite all of the previous research, there have not been any evaluations comparing the quality of simple and sophisticated data-driven approaches for generating paraphrases. [sent-23, score-0.327]

17 Evaluation is important not only from a practical perspective but also from a methodological standpoint, since it is often more fruitful to devote attention to building upon the current state of the art as opposed to improving upon less effective approaches. [sent-24, score-0.073]

18 Although the more sophisticated approaches have garnered considerably more attention from researchers, from a practical perspective, simplicity, quality, and flexibility are the most important properties. [sent-25, score-0.269]

19 Our evaluation helps develop a better understanding of the strengths and weaknesses of each type of approach. [sent-31, score-0.084]

20 The evaluation also brings to light additional properties, including the number of redundant paraphrases generated, that future approaches and evaluations may want to consider more carefully. [sent-32, score-0.851]

21 2 Related Work Instead of exhaustively covering the entire spectrum of previously proposed paraphrasing techniques, our evaluation focuses on two families of data-driven approaches that are widely studied and used. [sent-33, score-0.35]

22 More comprehensive surveys of data-driven paraphrasing techniques can be found in Androutsopoulos and Malakasiotis (2010) and Madnani and Dorr (2010). [sent-34, score-0.116]

23 The first family of approaches that we consider harvests paraphrases from monolingual corpora using distributional similarity. [sent-35, score-0.683]

24 The DIRT algorithm, proposed by Lin and Pantel (2001), uses parse tree paths as contexts for computing distributional similarity. [sent-36, score-0.132]

25 In this way, two phrases were considered similar if they occurred in similar contexts within many sentences. [sent-37, score-0.128]

26 Given the simplicity of the approach, the authors were able to harvest paraphrases from a very large collection of news articles. [sent-40, score-0.58]

27 Bhagat and Ravichandran (2008) proposed a similar approach that used noun phrase chunks as contexts and locality sensitive hashing to reduce the dimensionality of the context vectors. [sent-41, score-0.246]

28 Despite their simplicity, such techniques are susceptible to a number of issues stemming from the distributional assumption. [sent-42, score-0.09]

29 For example, such approaches have a propensity to assign large scores to antonyms and other semantically irrelevant phrases. [sent-43, score-0.105]

30 The second line of research uses comparable or bilingual corpora as the ‘pivot’ that binds paraphrases together (Barzilay and McKeown, 2001; Barzilay and Lee, 2003; Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Pang et al. [sent-44, score-0.577]

31 …, 2009) uses a generalization of DIRT-style patterns to generate paraphrases from a bilingual parallel corpus. [sent-49, score-0.59]

32 The primary drawback of these types of approaches is that they require a considerable amount of resource engineering that may not be available for all languages, domains, or applications. [sent-50, score-0.105]

33 3 Experimental Evaluation The goal of our experimental evaluation is to analyze the effectiveness of a variety of paraphrase generation techniques, ranging from simple to sophisticated. [sent-51, score-0.49]

34 Our evaluation focuses on generating paraphrases for verb phrases, which tend to exhibit more variation than other types of phrases. [sent-52, score-0.684]

35 Furthermore, our interest in paraphrase generation was initially inspired by challenges encountered during research related to machine reading (Barker et al. [sent-53, score-0.453]

36 Information extraction systems, which are a key component of machine reading systems, can use paraphrase technology to automatically expand seed sets of relation triggers, which are commonly verb phrases. [sent-55, score-0.519]

37 1 Systems Our evaluation compares the effectiveness of the following paraphrase harvesting approaches: PD: The basic distributional similarity-inspired approach proposed by Pasca and Dienes (2005) that uses variable-length n-gram contexts and overlap-based scoring. [sent-57, score-0.62]

38 The context of a phrase is defined as the concatenation of the n-grams immediately to the left and right of the phrase. [sent-58, score-0.049]
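
To make the mechanics concrete, here is a minimal Python sketch (not the authors' implementation) of PD-style harvesting: collect the n-grams immediately to the left and right of a phrase as its contexts, then score a candidate paraphrase by how many contexts it shares with the original. The toy sentences, the fixed context length, and the Jaccard-style overlap score are assumptions of the sketch.

```python
def collect_contexts(sentences, phrase, n=2):
    """Collect (left n-gram, right n-gram) context pairs for `phrase`.
    `sentences` is a list of token lists; `phrase` is a token list."""
    contexts = set()
    plen = len(phrase)
    for tokens in sentences:
        for i in range(len(tokens) - plen + 1):
            if tokens[i:i + plen] == phrase:
                left = tuple(tokens[max(0, i - n):i])
                right = tuple(tokens[i + plen:i + plen + n])
                if left and right:
                    contexts.add((left, right))
    return contexts

def overlap_score(contexts_a, contexts_b):
    """Overlap-based score: here, the Jaccard coefficient of the context sets."""
    if not contexts_a or not contexts_b:
        return 0.0
    return len(contexts_a & contexts_b) / len(contexts_a | contexts_b)

# Toy usage with hypothetical sentences.
sents = [["they", "were", "losing", "the", "game"],
         ["they", "are", "losing", "the", "game"]]
print(overlap_score(collect_contexts(sents, ["were", "losing"]),
                    collect_contexts(sents, ["are", "losing"])))   # 1.0
```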

39 BR: The distributional similarity approach proposed by Bhagat and Ravichandran (2008) that uses noun phrase chunks as contexts and locality sensitive hashing to reduce the dimensionality of the contextual vectors. [sent-61, score-0.336]
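
Locality sensitive hashing for cosine similarity is commonly implemented with random hyperplanes; the sketch below illustrates that general idea (sign bits from random projections, cosine recovered from the fraction of agreeing bits). It is only an illustration of the technique, not the specific scheme of Bhagat and Ravichandran; the dimensionality and signature length are arbitrary.

```python
import numpy as np

def lsh_signature(vec, hyperplanes):
    """Random-hyperplane LSH signature: one sign bit per hyperplane."""
    return (hyperplanes @ vec) >= 0

def approx_cosine(sig_a, sig_b):
    """Estimate cosine similarity from the fraction of agreeing bits:
    P[bits agree] = 1 - theta/pi for angle theta between the vectors."""
    agree = np.mean(sig_a == sig_b)
    return np.cos(np.pi * (1.0 - agree))

rng = np.random.default_rng(0)
dim, bits = 10000, 256                      # context dimensionality, signature length
planes = rng.standard_normal((bits, dim))

v1 = rng.standard_normal(dim)
v2 = v1 + 0.1 * rng.standard_normal(dim)    # a near-duplicate context vector
print(approx_cosine(lsh_signature(v1, planes), lsh_signature(v2, planes)))
```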

40 BCB-S: An extension of the Bannard and Callison-Burch (2005) approach that constrains the paraphrases to have the same syntactic type as the original phrase (Callison-Burch, 2008). [sent-62, score-0.537]

41 We chose these three particular systems because they span the spectrum of paraphrase approaches, in that the PD approach is simple and does not rely on any NLP resources while the BCB-S approach is sophisticated and makes heavy use of NLP resources. [sent-64, score-0.586]

42 For the two distributional similarity approaches (PD and BR), paraphrases were harvested from the English Gigaword Fourth Edition corpus and scored using the cosine similarity between PMI weighted contextual vectors. [sent-65, score-0.683]
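
A minimal sketch of this scoring step, under the assumption that contexts are kept as sparse count dictionaries: weight each phrase's contexts by (positive) PMI and compare phrases with cosine similarity. The clipping of negative PMI to zero and the toy counts are assumptions, not details taken from the paper.

```python
import math

def pmi_vector(context_counts, phrase_total, context_totals, grand_total):
    """PMI weight for each context of a phrase (negative PMI clipped to zero)."""
    vec = {}
    for ctx, joint in context_counts.items():
        pmi = math.log((joint / grand_total) /
                       ((phrase_total / grand_total) * (context_totals[ctx] / grand_total)))
        vec[ctx] = max(pmi, 0.0)
    return vec

def cosine(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy usage with hypothetical co-occurrence counts.
ctx_totals = {"ctx1": 20, "ctx2": 10, "ctx3": 10}
va = pmi_vector({"ctx1": 8, "ctx2": 2}, 10, ctx_totals, 100)
vb = pmi_vector({"ctx1": 6, "ctx3": 4}, 10, ctx_totals, 100)
print(cosine(va, vb))
```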

43 2 Evaluation Methodology We randomly sampled 50 verb phrases from 1000 news articles about terrorism and another 50 verb phrases from 500 news articles about American football. [sent-68, score-0.396]

44 Individual occurrences of verb phrases were sampled, which means that more common verb phrases were more likely to be selected and that a given phrase could be selected multiple times. [sent-69, score-0.445]

45 To obtain a richer class of phrases beyond basic verb groups, we defined verb phrases to be contiguous sequences of tokens that matched the following POS tag pattern: (TO | IN | RB | MD | VB)+. [sent-71, score-0.396]
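
The sketch below shows one way such a pattern can be applied to tagger output; treating each listed tag as a prefix that also covers its Penn Treebank subtypes (VBD, VBN, ...) and the toy tagged sentences are assumptions of the sketch. Sampling individual occurrences, as described above, then amounts to drawing from the multiset of all matched spans.

```python
import random
import re

# Tag prefixes from the pattern (TO | IN | RB | MD | VB)+ ; treating each
# prefix as also covering its Penn Treebank subtypes is an assumption.
VP_TAG = re.compile(r"^(TO|IN|RB|MD|VB)")

def extract_verb_phrases(tagged_sentence):
    """Return maximal contiguous spans whose POS tags match the pattern.
    `tagged_sentence` is a list of (token, tag) pairs from any tagger."""
    spans, current = [], []
    for token, tag in tagged_sentence:
        if VP_TAG.match(tag):
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans

# Toy usage with hypothetical tagger output; sampling individual occurrences
# (rather than unique phrases) mirrors the occurrence-level sampling above.
tagged_corpus = [
    [("rebels", "NNS"), ("might", "MD"), ("have", "VB"), ("attacked", "VBN"),
     ("the", "DT"), ("village", "NN")],
    [("they", "PRP"), ("were", "VBD"), ("losing", "VBG"), ("the", "DT"),
     ("game", "NN")],
]
occurrences = [vp for sent in tagged_corpus for vp in extract_verb_phrases(sent)]
print(random.Random(0).choice(occurrences))
```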

46 Following the methodology used in previous paraphrase evaluations (Bannard and Callison-Burch, 2005; Callison-Burch, 2008; Kok and Brockett, 2010), we presented annotators with two sentences. [sent-72, score-0.495]

47 The first sentence was randomly selected from amongst all of the sentences in the evaluation corpus that contain the original phrase. [sent-73, score-0.113]

48 The second sentence was the same as the first, except that the original phrase was replaced with the system-generated paraphrase. [sent-74, score-0.049]

49 … grammatically incorrect; and 2) Same meaning; revised is grammatically correct. [sent-79, score-0.163]

50 For each paraphrase system, we retrieve (up to) 10 paraphrases for each phrase in the evaluation set. [sent-82, score-0.944]

51 We automatically rejected any HITs where the worker failed either of these hidden tests. [sent-86, score-0.125]

52 We also rejected all work from annotators who failed at least 25% of their hidden tests. [sent-87, score-0.166]

53 We rejected 65% of the annotations based on the hidden test filtering just described, leaving 18,150 annotations for our evaluation. [sent-89, score-0.179]
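
A small sketch of the two-stage filtering just described: drop any HIT whose hidden test was failed, then drop all work from annotators who failed at least 25% of their hidden tests. The HIT record fields used here are hypothetical.

```python
from collections import defaultdict

def filter_annotations(hits):
    """Filter HITs using hidden quality-control tests.
    Each HIT is a dict with hypothetical fields: 'worker', 'passed_hidden' (bool)."""
    failures, totals = defaultdict(int), defaultdict(int)
    for hit in hits:
        totals[hit["worker"]] += 1
        if not hit["passed_hidden"]:
            failures[hit["worker"]] += 1

    # Workers who failed at least 25% of their hidden tests lose all their work.
    bad_workers = {w for w in totals if failures[w] / totals[w] >= 0.25}

    return [hit for hit in hits
            if hit["passed_hidden"] and hit["worker"] not in bad_workers]

# Toy usage with hypothetical HITs.
hits = [{"worker": "w1", "passed_hidden": True},
        {"worker": "w1", "passed_hidden": False},
        {"worker": "w2", "passed_hidden": True}]
print(len(filter_annotations(hits)))   # 1 -- only w2's HIT survives
```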

54 The raw agreement of the annotators (after filtering) was 77% and the Fleiss’ Kappa was 0. [sent-91, score-0.072]
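
For reference, both statistics can be computed from an items-by-categories count matrix; the sketch below is a generic implementation of observed agreement and Fleiss' kappa, with toy ratings rather than the paper's data.

```python
import numpy as np

def agreement_stats(counts):
    """Observed agreement and Fleiss' kappa for a ratings matrix of shape
    (items, categories), where counts[i, j] is the number of annotators who
    assigned category j to item i (same number of ratings per item assumed)."""
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                      # ratings per item
    p_j = counts.sum(axis=0) / counts.sum()        # category marginals
    p_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))  # per-item agreement
    p_bar, p_e = p_i.mean(), np.sum(p_j ** 2)      # observed / chance agreement
    return p_bar, (p_bar - p_e) / (1.0 - p_e)

# Toy usage: 4 items rated by 3 annotators into 2 categories.
print(agreement_stats([[3, 0], [2, 1], [0, 3], [1, 2]]))
```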

55 The systems were evaluated in terms of coverage and expected precision at k. [sent-93, score-0.177]

56 Coverage is defined as the percentage of phrases for which the system returned at least one paraphrase. [sent-94, score-0.176]

57 Expected precision at k is the expected number of correct paraphrases amongst the top k returned, and is computed as: $E[p@k] = \frac{1}{k} \sum_{i=1}^{k} p_i$, where $p_i$ is the proportion of positive annotations for item $i$. [sent-95, score-0.688]

58 When computing the mean expected precision over a set of input phrases, only those phrases that generate one or more paraphrases are considered in the mean. [sent-96, score-0.666]
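
Putting the two metrics together, here is a minimal sketch under the assumption that each phrase maps to a ranked list of p_i values (the proportion of positive annotations per returned paraphrase); treating missing ranks as zero when fewer than k paraphrases are returned is an assumption of the sketch, not a detail from the paper.

```python
def expected_precision_at_k(proportions, k):
    """E[p@k] = (1/k) * sum_{i=1..k} p_i over the ranked list `proportions`."""
    top = proportions[:k]
    return sum(top) / k if top else 0.0

def coverage_and_mean_precision(results, k):
    """`results` maps each input phrase to its ranked p_i list (empty if the
    system returned nothing).  The mean E[p@k] is taken only over phrases
    with at least one paraphrase, as described above."""
    covered = [props for props in results.values() if props]
    coverage = len(covered) / len(results)
    mean_p_at_k = (sum(expected_precision_at_k(p, k) for p in covered)
                   / len(covered)) if covered else 0.0
    return coverage, mean_p_at_k

# Toy usage with hypothetical annotation proportions.
results = {"were losing": [0.9, 0.6, 0.3],
           "step down":   [0.5],
           "call off":    []}
print(coverage_and_mean_precision(results, k=2))
```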

59 Hence, if precision were to be averaged over all 100 phrases, then systems with poor coverage would perform significantly worse. [sent-97, score-0.14]

60 Thus, one should take a holistic view of the results: rather than focusing on coverage or precision in isolation, consider them, and their respective tradeoffs, together. [sent-98, score-0.14]

61 … generated paraphrase filled in (denoted by the bold text). [sent-99, score-0.395]

62 The lenient criterion allows for grammatical errors in the paraphrased sentence, while the strict criterion does not. [sent-108, score-0.485]

63 For this evaluation, all 100 verb phrases were run through each system. [sent-111, score-0.198]

64 The paraphrases returned by the systems were then ranked in descending order of their score, placing the highest-scoring item at rank 1. [sent-112, score-0.578]

65 As expected, the results show that the systems perform significantly worse under the strict evaluation criterion, which requires the paraphrased sentences to be grammatically correct. [sent-114, score-0.288]

66 None of the approaches tested used any information from the evaluation sentences (other than the fact that a verb phrase was to be filled in). [sent-115, score-0.35]

67 Recent work showed that using language models and/or syntactic clues from the evaluation sentence can improve the grammaticality of the paraphrased sentences (Callison-Burch, …). [sent-116, score-0.12]

68 [Table 3 caption fragment: expected precision for BCB-S, PD, and BR when considering redundancy under lenient and strict evaluation criteria.] [sent-122, score-0.48]

69 Such approaches could likely be used to improve the quality of all of the approaches under the strict evaluation criteria. [sent-124, score-0.366]

70 In terms of coverage, the distributional similarity approaches performed the best. [sent-125, score-0.195]

71 In another set of experiments, we used the PD method to harvest paraphrases from a large Web corpus, and found that the coverage was 98%. [sent-126, score-0.626]

72 Achieving similar coverage with resource-dependent approaches would likely require more human and machine effort. [sent-127, score-0.19]

73 4 Redundancy After manually inspecting the results returned by the various paraphrase systems, we noticed that some approaches returned highly redundant paraphrases that were of limited practical use. [sent-129, score-1.289]

74 For example, for the phrase “were losing”, the BR system returned “are losing”, “have been losing”, “have lost”, “lose”, “might lose”, “had lost”, “stand to lose”, “who have lost” and “would lose” within the top 10 paraphrases. [sent-130, score-0.139]

75 All of these are simple variants that contain different forms of the verb “lose”. [sent-131, score-0.112]

76 Under the lenient evaluation criterion almost all of these paraphrases would be marked as correct, since the same verb is being returned with some grammatical modifications. [sent-132, score-1.003]

77 While highly redundant output of this form may be useful for some tasks, for others (such as information extraction) it is more useful to identify paraphrases that contain a diverse, non-redundant set of verbs. [sent-133, score-0.603]

78 Therefore, we carried out another evaluation aimed at penalizing highly redundant outputs. [sent-134, score-0.163]

79 For each approach, we manually identified all of the paraphrases that contained the same verb as the main verb in the original phrase. [sent-135, score-0.712]

80 During evaluation, these "redundant" paraphrases were regarded as non-related. [sent-136, score-0.488]
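
The paper's redundancy marking was done manually; the sketch below only illustrates the criterion with an automatic approximation that flags a paraphrase when it reuses the lemma of the original phrase's main verb. The tiny lemma table stands in for a real lemmatizer and is purely hypothetical.

```python
# Tiny lemma table standing in for a real lemmatizer (purely illustrative).
LEMMAS = {"losing": "lose", "loses": "lose", "lost": "lose", "lose": "lose",
          "are": "be", "were": "be", "been": "be", "have": "have", "had": "have"}

def lemma(token):
    return LEMMAS.get(token.lower(), token.lower())

def is_redundant(paraphrase, main_verb):
    """Flag a paraphrase as redundant if it reuses the main verb's lemma."""
    target = lemma(main_verb)
    return any(lemma(tok) == target for tok in paraphrase.split())

candidates = ["are losing", "have been losing", "stand to lose", "fall behind"]
print([(c, is_redundant(c, "losing")) for c in candidates])
# [('are losing', True), ('have been losing', True),
#  ('stand to lose', True), ('fall behind', False)]
```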

81 The results are dramatically different compared to those in Table 2, suggesting that evaluations that do not consider this type of redundancy may over-estimate actual system quality. [sent-138, score-0.209]

82 The percentages of results marked as redundant for the BCB-S, BR, and PD approaches were 22. [sent-139, score-0.22]

83 Thus, the BR system, which appeared to have excellent (lenient) precision in our initial evaluation, returns a very large number of redundant paraphrases. [sent-143, score-0.17]

84 The BCB-S and PD approaches return a comparable number of redundant results. [sent-147, score-0.22]

85 As with our previous evaluation, the BCB-S approach tends to perform better under the lenient evaluation, while PD is better under the strict evaluation. [sent-148, score-0.333]

86 Estimated 95% confidence intervals show that all differences between BCB-S and PD are statistically significant, except for lenient P@10. [sent-149, score-0.225]
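
The paper does not spell out how these confidence intervals were estimated; one common way to obtain them for per-phrase precision differences is a paired percentile bootstrap, sketched below with hypothetical numbers.

```python
import random

def bootstrap_ci(diffs, iters=10000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired per-phrase differences
    (e.g., E[p@k] of BCB-S minus E[p@k] of PD on the same phrase)."""
    rng = random.Random(seed)
    means = []
    for _ in range(iters):
        sample = [rng.choice(diffs) for _ in diffs]
        means.append(sum(sample) / len(sample))
    means.sort()
    lo = means[int((alpha / 2) * iters)]
    hi = means[int((1 - alpha / 2) * iters) - 1]
    return lo, hi

# Hypothetical per-phrase differences in expected precision.
diffs = [0.2, -0.1, 0.15, 0.05, 0.3, -0.05, 0.1, 0.0, 0.25, 0.05]
low, high = bootstrap_ci(diffs)
print(low, high, "significant" if low > 0 or high < 0 else "not significant")
```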

87 Of course, existing paraphrasing approaches do not explicitly account for redundancy, and hence this evaluation is not completely fair. [sent-150, score-0.269]

88 However, these findings suggest that redundancy may be an important issue to consider when developing and evaluating data-driven paraphrase approaches. [sent-151, score-0.473]

89 There are likely other characteristics, beyond redundancy, that may also be important for developing robust, effective paraphrasing techniques. [sent-152, score-0.116]

90 5 Discussion In all of our evaluations, we found that the simple approaches are surprisingly effective in terms of precision, coverage, and redundancy, making them a reasonable choice for an "out of the box" approach for this particular task. [sent-155, score-0.105]

91 However, additional task-dependent comparative evaluations are necessary to develop even deeper insights into the pros and cons of the different types of approaches. [sent-156, score-0.095]

92 From a high level perspective, it is also important to note that the precision of these widely used, commonly studied paraphrase generation approaches is still extremely poor. [sent-157, score-0.565]

93 After accounting for redundancy, the best approaches achieve a precision at 1 of less than 20% using the strict criteria and less than 26% when using the lenient criteria. [sent-158, score-0.529]

94 4 Conclusions and Future Work This paper examined the tradeoffs between simple paraphrasing approaches that do not make use of any NLP resources and more sophisticated approaches that use a variety of such resources. [sent-160, score-0.526]

95 Our evaluation demonstrated that simple harvesting approaches fare well against more sophisticated approaches, achieving state-of-the-art precision, good coverage, and relatively low redundancy. [sent-161, score-0.381]

96 In the future, we would like to see more empirical evaluations and detailed studies comparing the practical merits of various paraphrase generation techniques. [sent-162, score-0.59]

97 As Madnani and Dorr (2010) suggested, it would be beneficial to the research community to develop a standard, shared evaluation that would act to catalyze further advances and encourage more meaningful comparative evaluations of such approaches moving forward. [sent-163, score-0.248]

98 Large scale acquisition of paraphrases for learning surface patterns. [sent-197, score-0.488]

99 Syntactic constraints on paraphrases extracted from parallel corpora. [sent-202, score-0.531]

100 Syntax-based alignment of multiple translations: extracting paraphrases and generating new sentences. [sent-239, score-0.524]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('paraphrases', 0.488), ('paraphrase', 0.359), ('lenient', 0.225), ('pd', 0.2), ('bannard', 0.151), ('lose', 0.12), ('paraphrasing', 0.116), ('redundant', 0.115), ('redundancy', 0.114), ('verb', 0.112), ('strict', 0.108), ('br', 0.108), ('approaches', 0.105), ('pasca', 0.1), ('evaluations', 0.095), ('sophisticated', 0.091), ('returned', 0.09), ('distributional', 0.09), ('bhagat', 0.089), ('madnani', 0.086), ('phrases', 0.086), ('coverage', 0.085), ('barzilay', 0.084), ('dienes', 0.084), ('spectrum', 0.081), ('harvesting', 0.081), ('kok', 0.079), ('callisonburch', 0.072), ('paraphrased', 0.072), ('tradeoffs', 0.072), ('spice', 0.069), ('losing', 0.069), ('amongst', 0.065), ('nj', 0.064), ('rejected', 0.061), ('southern', 0.061), ('grammatically', 0.06), ('marina', 0.06), ('morristown', 0.059), ('bilingual', 0.059), ('ravichandran', 0.058), ('androutsopoulos', 0.056), ('barker', 0.056), ('fare', 0.056), ('precision', 0.055), ('rely', 0.055), ('rey', 0.054), ('harvest', 0.053), ('lost', 0.05), ('landis', 0.05), ('turk', 0.05), ('phrase', 0.049), ('evaluation', 0.048), ('del', 0.048), ('fleiss', 0.048), ('merits', 0.048), ('reading', 0.048), ('generation', 0.046), ('mechanical', 0.046), ('hashing', 0.044), ('zhao', 0.044), ('parallel', 0.043), ('pang', 0.043), ('revised', 0.043), ('annotations', 0.043), ('practical', 0.042), ('contexts', 0.042), ('dorr', 0.042), ('annotators', 0.041), ('hovy', 0.041), ('criterion', 0.04), ('locality', 0.04), ('afrl', 0.04), ('simplicity', 0.039), ('dimensionality', 0.039), ('brockett', 0.039), ('perspective', 0.038), ('nlp', 0.038), ('expected', 0.037), ('variety', 0.037), ('criteria', 0.036), ('filled', 0.036), ('generating', 0.036), ('life', 0.036), ('strengths', 0.036), ('hit', 0.035), ('sciences', 0.034), ('failed', 0.032), ('california', 0.032), ('meaning', 0.032), ('chunks', 0.032), ('hidden', 0.032), ('attention', 0.031), ('agreement', 0.031), ('eduard', 0.031), ('pantel', 0.031), ('binds', 0.03), ('boring', 0.03), ('ccb', 0.03), ('chunliang', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang

Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

2 0.45363155 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

Author: Houda Bouamor ; Aurelien Max ; Anne Vilnat

Abstract: In this paper, we present a novel way of tackling the monolingual alignment problem on pairs of sentential paraphrases by means of edit rate computation. In order to inform the edit rate, information in the form of subsentential paraphrases is provided by a range of techniques built for different purposes. We show that the tunable TER-PLUS metric from Machine Translation evaluation can achieve good performance on this task and that it can effectively exploit information coming from complementary sources.

3 0.42666358 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web

Author: Chikara Hashimoto ; Kentaro Torisawa ; Stijn De Saeger ; Jun'ichi Kazama ; Sadao Kurohashi

Abstract: We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results indicated that with our method it was possible to extract about 300,000 paraphrases from 6 × 10^8 Web documents with a precision rate of about 94%.

4 0.36112565 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation

Author: David Chen ; William Dolan

Abstract: A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.

5 0.25547624 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.

6 0.1713247 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

7 0.10820309 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

8 0.096445903 333 acl-2011-Web-Scale Features for Full-Scale Parsing

9 0.087808564 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

10 0.084812306 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

11 0.071550034 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

12 0.071458749 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

13 0.069938272 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction

14 0.068367928 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

15 0.065971673 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents

16 0.064811386 174 acl-2011-Insights from Network Structure for Text Mining

17 0.063850254 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

18 0.063142933 302 acl-2011-They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems

19 0.060858201 47 acl-2011-Automatic Assessment of Coverage Quality in Intelligence Reports

20 0.056437191 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.206), (1, -0.03), (2, -0.01), (3, 0.13), (4, -0.012), (5, 0.038), (6, 0.232), (7, 0.007), (8, 0.027), (9, -0.478), (10, -0.055), (11, 0.292), (12, 0.003), (13, 0.031), (14, 0.143), (15, 0.003), (16, -0.12), (17, 0.088), (18, -0.002), (19, 0.109), (20, -0.062), (21, -0.027), (22, 0.073), (23, 0.012), (24, 0.006), (25, 0.01), (26, 0.086), (27, 0.004), (28, -0.008), (29, -0.089), (30, -0.034), (31, 0.022), (32, -0.034), (33, 0.08), (34, 0.053), (35, 0.005), (36, 0.004), (37, -0.011), (38, -0.006), (39, -0.056), (40, -0.008), (41, -0.074), (42, -0.066), (43, 0.012), (44, 0.014), (45, -0.043), (46, -0.037), (47, 0.008), (48, -0.044), (49, -0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95002502 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang

Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

2 0.9153446 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

Author: Houda Bouamor ; Aurelien Max ; Anne Vilnat

Abstract: In this paper, we present a novel way of tackling the monolingual alignment problem on pairs of sentential paraphrases by means of edit rate computation. In order to inform the edit rate, information in the form of subsentential paraphrases is provided by a range of techniques built for different purposes. We show that the tunable TER-PLUS metric from Machine Translation evaluation can achieve good performance on this task and that it can effectively exploit information coming from complementary sources.

3 0.89461815 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web

Author: Chikara Hashimoto ; Kentaro Torisawa ; Stijn De Saeger ; Jun'ichi Kazama ; Sadao Kurohashi

Abstract: We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results indicated that with our method it was possible to extract about 300,000 paraphrases from 6 × 10^8 Web documents with a precision rate of about 94%.

4 0.79571968 72 acl-2011-Collecting Highly Parallel Data for Paraphrase Evaluation

Author: David Chen ; William Dolan

Abstract: A lack of standard datasets and evaluation metrics has prevented the field of paraphrasing from making the kind of rapid progress enjoyed by the machine translation community over the last 15 years. We address both problems by presenting a novel data collection framework that produces highly parallel text data relatively inexpensively and on a large scale. The highly parallel nature of this data allows us to use simple n-gram comparisons to measure both the semantic adequacy and lexical dissimilarity of paraphrase candidates. In addition to being simple and efficient to compute, experiments show that these metrics correlate highly with human judgments.

5 0.63699561 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.

6 0.47539124 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

7 0.39635298 306 acl-2011-Towards Style Transformation from Written-Style to Audio-Style

8 0.3499763 74 acl-2011-Combining Indicators of Allophony

9 0.34494689 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

10 0.29934222 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

11 0.27909121 231 acl-2011-Nonlinear Evidence Fusion and Propagation for Hyponymy Relation Mining

12 0.27537745 333 acl-2011-Web-Scale Features for Full-Scale Parsing

13 0.26484743 297 acl-2011-That's What She Said: Double Entendre Identification

14 0.26292542 115 acl-2011-Engkoo: Mining the Web for Language Learning

15 0.25799811 321 acl-2011-Unsupervised Discovery of Rhyme Schemes

16 0.25472662 302 acl-2011-They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems

17 0.25348598 99 acl-2011-Discrete vs. Continuous Rating Scales for Language Evaluation in NLP

18 0.24819967 252 acl-2011-Prototyping virtual instructors from human-human corpora

19 0.24392155 288 acl-2011-Subjective Natural Language Problems: Motivations, Applications, Characterizations, and Implications

20 0.24348146 320 acl-2011-Unsupervised Discovery of Domain-Specific Knowledge from Text


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.032), (17, 0.061), (26, 0.028), (31, 0.018), (37, 0.065), (39, 0.042), (41, 0.078), (53, 0.068), (55, 0.035), (59, 0.073), (72, 0.04), (73, 0.174), (91, 0.06), (96, 0.159)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86847353 160 acl-2011-Identifying Sarcasm in Twitter: A Closer Look

Author: Roberto Gonzalez-Ibanez ; Smaranda Muresan ; Nina Wacholder

Abstract: Sarcasm transforms the polarity of an apparently positive or negative utterance into its opposite. We report on a method for constructing a corpus of sarcastic Twitter messages in which determination of the sarcasm of each message has been made by its author. We use this reliable corpus to compare sarcastic utterances in Twitter to utterances that express positive or negative attitudes without sarcasm. We investigate the impact of lexical and pragmatic factors on machine learning effectiveness for identifying sarcastic utterances and we compare the performance of machine learning techniques and human judges on this task. Perhaps unsurprisingly, neither the human judges nor the machine learning techniques perform very well. 1

same-paper 2 0.85216248 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques

Author: Donald Metzler ; Eduard Hovy ; Chunliang Zhang

Abstract: Paraphrase generation is an important task that has received a great deal of interest recently. Proposed data-driven solutions to the problem have ranged from simple approaches that make minimal use of NLP tools to more complex approaches that rely on numerous language-dependent resources. Despite all of the attention, there have been very few direct empirical evaluations comparing the merits of the different approaches. This paper empirically examines the tradeoffs between simple and sophisticated paraphrase harvesting approaches to help shed light on their strengths and weaknesses. Our evaluation reveals that very simple approaches fare surprisingly well and have a number of distinct advantages, including strong precision, good coverage, and low redundancy.

3 0.82544196 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

Author: Preslav Nakov ; Hwee Tou Ng

Abstract: We propose a novel approach to translating from a morphologically complex language. Unlike previous research, which has targeted word inflections and concatenations, we focus on the pairwise relationship between morphologically related words, which we treat as potential paraphrases and handle using paraphrasing techniques at the word, phrase, and sentence level. An important advantage of this framework is that it can cope with derivational morphology, which has so far remained largely beyond the capabilities of statistical machine translation systems. Our experiments translating from Malay, whose morphology is mostly derivational, into English show signif- icant improvements over rivaling approaches based on five automatic evaluation measures (for 320,000 sentence pairs; 9.5 million English word tokens).

4 0.82107532 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

Author: Nathan Bodenstab ; Aaron Dunlop ; Keith Hall ; Brian Roark

Abstract: Efficient decoding for syntactic parsing has become a necessary research area as statistical grammars grow in accuracy and size and as more NLP applications leverage syntactic analyses. We review prior methods for pruning and then present a new framework that unifies their strengths into a single approach. Using a log linear model, we learn the optimal beam-search pruning parameters for each CYK chart cell, effectively predicting the most promising areas of the model space to explore. We demonstrate that our method is faster than coarse-to-fine pruning, exemplified in both the Charniak and Berkeley parsers, by empirically comparing our parser to the Berkeley parser using the same grammar and under identical operating conditions.

5 0.76925004 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web

Author: Chikara Hashimoto ; Kentaro Torisawa ; Stijn De Saeger ; Jun'ichi Kazama ; Sadao Kurohashi

Abstract: We propose an automatic method of extracting paraphrases from definition sentences, which are also automatically acquired from the Web. We observe that a huge number of concepts are defined in Web documents, and that the sentences that define the same concept tend to convey mostly the same information using different expressions and thus contain many paraphrases. We show that a large number of paraphrases can be automatically extracted with high precision by regarding the sentences that define the same concept as parallel corpora. Experimental results indicated that with our method it was possible to extract about 300,000 paraphrases from 6 × 10^8 Web documents with a precision rate of about 94%.

6 0.76614499 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

7 0.76456445 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

8 0.75912619 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections

9 0.75571108 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

10 0.75441313 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

11 0.75304109 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs

12 0.75221133 66 acl-2011-Chinese sentence segmentation as comma classification

13 0.75041151 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

14 0.74843931 159 acl-2011-Identifying Noun Product Features that Imply Opinions

15 0.74713933 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling

16 0.74652267 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters

17 0.74633706 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

18 0.74606538 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

19 0.74585736 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

20 0.74501741 274 acl-2011-Semi-Supervised Frame-Semantic Parsing for Unknown Predicates