acl acl2012 acl2012-125 knowledge-graph by maker-knowledge-mining

125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation


Source: pdf

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria, especially the paraphrase rate, cannot be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. [sent-3, score-0.831]

2 The resulting sentences can be used as candidate paraphrases of the source sentence. [sent-4, score-0.192]

3 Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. [sent-5, score-0.691]

4 Paraphrase criteria, especially the paraphrase rate, cannot be ensured in that way. [sent-6, score-0.707]

5 In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. [sent-7, score-0.741]

6 In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems. [sent-8, score-1.037]

7 Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity. [sent-9, score-0.901]

8 1 Introduction Paraphrasing (at word, phrase, and sentence levels) is a procedure for generating alternative expressions with an identical or similar meaning to the original text. [sent-10, score-0.08]

9 Paraphrasing technology has been applied in many NLP applications, such as machine translation (MT), question answering (QA), and natural language generation (NLG). [sent-11, score-0.186]

10 1This work was done while the author was visiting Microsoft Research Asia. [sent-12, score-0.023]

11 As paraphrasing can be viewed as a translation process between the original expression (as input) and the paraphrase results (as output), both in the same language, statistical machine translation (SMT) has been used for this task. [sent-14, score-1.099]

12 Quirk et al. (2004) build a monolingual translation system using a corpus of sentence pairs extracted from news articles describing the same events. [sent-16, score-0.179]

13 , thesaurus) and further extend the method by generating different paraphrases in different applications (Zhao et al. [sent-20, score-0.651]

14 Performance of the monolingual MT-based method in paraphrase generation is limited by the large-scale paraphrase corpus it relies on, as such a corpus is not readily available (Zhao et al. [sent-22, score-1.35]

15 In contrast, bilingual parallel data is abundant and has been used in extracting paraphrases (Bannard and Callison-Burch, 2005; Zhao et al. [sent-24, score-0.717]

16 Thus researchers leverage bilingual parallel data for this task and apply two SMT systems (dual SMT system) to translate the original sentences into another pivot language and then translate them back into the original language. [sent-28, score-0.372]

17 For question expansion, Duboué and Chu-Carroll (2006) paraphrase the questions with multiple MT engines and select the best paraphrase result considering cosine distance, length, etc. [sent-29, score-1.324]

18 Max (2009) generates paraphrases for a given segment by forcing the segment to be translated independently in both translation processes. [sent-30, score-0.759]

19 Context features are added into the SMT system to improve translation correctness against polysemy. [sent-31, score-0.108]

20 (2010) propose combining the results of multiple machine translation engines by performing MBR (Minimum Bayes Risk) decoding (Kumar and Byrne, 2004) on the N-best translation candidates. [sent-35, score-0.216]

21 The work presented in this paper belongs to the pivot language method for paraphrase generation. [sent-36, score-0.862]

22 Previous work employs two separately trained SMT systems whose parameters are tuned for the SMT objective and therefore cannot directly optimize for paraphrase purposes, for example, optimizing the diversity against the input. [sent-37, score-0.808]

23 Another problem comes from the contradiction between two criteria in paraphrase generation: adequacy, measuring the semantic equivalency, and paraphrase rate, measuring the surface dissimilarity. [sent-38, score-1.691]

24 As they are incompatible (Zhao and Wang, 2010), the question arises of how to balance them to fit different application scenarios. [sent-39, score-0.128]

25 To address these issues, in this paper, we propose a joint learning method of two SMT systems for paraphrase generation. [sent-40, score-0.701]

26 The jointly-learned dual SMT system: (1) Adapts the SMT systems so that they are tuned specifically for paraphrase generation purposes, e.g. [sent-41, score-0.796]

27 , to increase the dissimilarity; (2) Employs a revised BLEU score (named iBLEU, as it’s an input-aware BLEU metric) that measures adequacy and dissimilarity of the paraphrase results at the same time. [sent-43, score-1.111]

28 With both automatic and human evaluations, the results show that the proposed method effectively balances adequacy and dissimilarity. [sent-45, score-0.301]

29 2 Paraphrasing with a Dual SMT System We focus on sentence level paraphrasing and leverage homogeneous machine translation systems for this task bi-directionally. [sent-46, score-0.467]

30 Generating a sentential paraphrase with the SMT system is done by first translating a source sentence into another pivot language, and then back into the source. [sent-47, score-1.008]
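
A minimal sketch of this pivoting pipeline in Python, assuming hypothetical `translate_fwd` and `translate_back` functions that wrap the two SMT decoders and return n-best lists (an illustration, not the authors' implementation):

```python
from typing import Callable, List

def generate_paraphrase_candidates(
    source: str,
    translate_fwd: Callable[[str, int], List[str]],   # source language -> pivot, n-best
    translate_back: Callable[[str, int], List[str]],  # pivot -> source language, m-best
    n: int = 10,
    m: int = 10,
) -> List[str]:
    """Collect candidate paraphrases of `source` via a pivot language.

    Each of the n pivot translations is translated back into the source
    language, yielding up to n * m candidates; exact copies of the input
    and duplicates are dropped.
    """
    seen = {source}
    candidates: List[str] = []
    for pivot_sentence in translate_fwd(source, n):
        for back_translation in translate_back(pivot_sentence, m):
            if back_translation not in seen:
                seen.add(back_translation)
                candidates.append(back_translation)
    return candidates
```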

31 Here, we call these two procedures a dual SMT system. [sent-48, score-0.1]

32 Given an English sentence es, there could be n candidate translations in another language F, and each translation could have m candidates, which may contain potential paraphrases for es. [sent-49, score-0.305]

33 Our task is to locate the candidate that best fits the demands of paraphrasing. [sent-50, score-0.039]

34 1 Joint Inference of Dual SMT System During the translation process, it is necessary to select a translation from the hypotheses based on the quality of the candidates. [sent-52, score-0.255]

35 Each candidate's quality can be expressed by a log-linear model over different SMT features such as the translation model and the language model. [sent-53, score-0.147]
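
As a rough illustration, a log-linear score is simply a weighted sum of feature values; the feature names and weights below are toy assumptions, not the paper's actual feature set:

```python
from typing import Dict

def loglinear_score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Log-linear candidate score: sum over features of lambda_i * h_i."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical example: pick the better of two candidates under toy weights.
weights = {"tm_logprob": 1.0, "lm_logprob": 0.6, "word_penalty": -0.2}
cand_a = {"tm_logprob": -4.2, "lm_logprob": -7.9, "word_penalty": 9.0}
cand_b = {"tm_logprob": -5.1, "lm_logprob": -6.3, "word_penalty": 8.0}
best = max([cand_a, cand_b], key=lambda c: loglinear_score(c, weights))
```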

36 t is an indicator function that equals 1 when e′ is translated from f and 0 otherwise. [sent-55, score-0.028]

37 S is the development set for training the parameters, and for each source sentence several human translations rs are listed as references. [sent-59, score-0.197]

38 2 Paraphrase Evaluation Metrics The joint inference method with MERT enables the dual SMT system to be optimized towards the quality of paraphrasing results. [sent-61, score-0.471]
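
A toy stand-in for that tuning loop is sketched below: a coordinate-wise grid search over the log-linear weights, maximizing an arbitrary development-set metric (iBLEU in the paper, defined further down). Real MERT performs an exact line search over the n-best lists; every name here is illustrative, and `loglinear_score` is reused from the sketch above:

```python
from typing import Callable, Dict, List, Sequence

def tune_weights(
    dev_nbest: List[List[Dict[str, float]]],   # per dev sentence: feature dicts of its n-best candidates
    dev_metric: Callable[[List[int]], float],  # selected candidate indices -> dev-set score
    feature_names: List[str],
    grid: Sequence[float] = (-1.0, -0.5, 0.0, 0.5, 1.0),
) -> Dict[str, float]:
    """Coordinate-wise grid search for weights maximizing `dev_metric`."""
    weights = {name: 1.0 for name in feature_names}
    for name in feature_names:
        best_w, best_score = weights[name], float("-inf")
        for w in grid:
            weights[name] = w
            # For every dev sentence, pick the candidate with the highest
            # log-linear score under the current weights.
            picks = [
                max(range(len(cands)), key=lambda i: loglinear_score(cands[i], weights))
                for cands in dev_nbest
            ]
            score = dev_metric(picks)
            if score > best_score:
                best_w, best_score = w, score
        weights[name] = best_w
    return weights
```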

39 Different application scenarios of paraphrase have different demands on the paraphrasing results, and up to now the widely mentioned criteria include (Zhao et al. [sent-62, score-0.992]

40 However, as pointed out by Chen and Dolan (2011), there is a lack of an automatic metric capable of measuring all three criteria in paraphrase generation. [sent-67, score-0.71]

41 Two issues are also raised by Zhao and Wang (2010) about using automatic metrics: a paraphrase that changes less gets a larger BLEU score, and the evaluations of paraphrase quality and paraphrase rate tend to be incompatible. [sent-68, score-1.401]

42 To address the above problems, we propose a metric for tuning parameters and evaluating the quality of each candidate paraphrase c: iBLEU(s, rs, c) = α · BLEU(c, rs) − (1 − α) · BLEU(c, s) (3), where s is the input sentence and rs represents the reference paraphrases. [sent-69, score-0.838]
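
A minimal sketch of iBLEU, using NLTK's smoothed sentence-level BLEU as the underlying BLEU(·, ·); the paper does not specify its exact BLEU variant or smoothing, so treat those choices (and the default alpha) as assumptions:

```python
from typing import List

from nltk.translate.bleu_score import SmoothingFunction, sentence_bleu

_smooth = SmoothingFunction().method1  # smoothing choice is an assumption

def bleu(candidate: str, references: List[str]) -> float:
    """Sentence-level BLEU of `candidate` against one or more references."""
    return sentence_bleu(
        [r.split() for r in references],
        candidate.split(),
        smoothing_function=_smooth,
    )

def ibleu(source: str, references: List[str], candidate: str, alpha: float = 0.8) -> float:
    """iBLEU(s, rs, c) = alpha * BLEU(c, rs) - (1 - alpha) * BLEU(c, s).

    The first term rewards adequacy (closeness to the references); the
    second penalizes self-paraphrase (closeness to the input sentence).
    The default alpha = 0.8 is only a placeholder.
    """
    return alpha * bleu(candidate, references) - (1.0 - alpha) * bleu(candidate, [source])
```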

43 BLEU(c, rs) captures the semantic equivalency between the candidates and the references (Finch et al. [sent-70, score-0.131]

44 (2005) have shown the capability of the BLEU score for measuring semantic equivalency); BLEU(c, s) is the BLEU score computed between the candidate and the source sentence to measure the dissimilarity. [sent-71, score-0.273]

45 α is a parameter balancing adequacy and dissimilarity; a smaller α value indicates a larger penalty on self-paraphrase. [sent-72, score-0.299]

46 Fluency is not explicitly included because there is a high correlation between fluency and adequacy (Zhao et al. [sent-73, score-0.324]

47 By using iBLEU, we aim at adapting paraphrasing performance to different application needs by adjusting the α value. [sent-75, score-0.338]
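
Candidate selection then reduces to an argmax over iBLEU, and varying alpha shifts the choice between more adequate and more dissimilar outputs; a usage sketch built on the `ibleu` function above:

```python
from typing import List

def select_paraphrase(
    source: str,
    references: List[str],
    candidates: List[str],
    alpha: float = 0.8,
) -> str:
    """Pick the candidate maximizing iBLEU for the given alpha; a smaller
    alpha penalizes self-paraphrase more heavily, favoring candidates with
    less surface overlap with the input."""
    return max(candidates, key=lambda c: ibleu(source, references, c, alpha))
```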

48 1 Experiment Setup For the English sentence paraphrasing task, we utilize Chinese as the pivot language; our experiments are built on English-Chinese bidirectional translation. [sent-77, score-0.511]

49 NIST Chinese-to-English evaluation data offers four English human translations for every Chinese sentence. [sent-79, score-0.023]

50 For each sentence pair, we choose one English sentence e1 as the source and use the three remaining sentences e2, e3 and e4 as references. [sent-80, score-0.147]

51 The English-Chinese and Chinese-English systems are built on a bilingual parallel corpus. [Table 1: joint learning results] [sent-81, score-0.091]

52 Language model is trained on 2,007,955 sentences for Chinese and 8,681,899 sentences for English. [sent-87, score-0.06]

53 10-best lists are used in both of the translation processes. [sent-89, score-0.108]

54 2 Paraphrase Evaluation Results The results of paraphrasing are illustrated in Table 1. [sent-91, score-0.257]

55 We show the BLEU score (computed against the references) to measure adequacy and self-BLEU (computed against the source sentence) to evaluate dissimilarity (lower is better). [sent-92, score-0.46]

56 By “No Joint” we mean that two independently trained SMT systems are employed in translating sentences from English to Chinese and then back into English. [sent-93, score-0.141]

57 This result is listed to indicate the performance when we do not involve joint learning to control the quality of paraphrase results. [sent-94, score-0.715]

58 From the results we can see that, when the value of α decreases to place more penalty on self-paraphrase, the self-BLEU score rapidly decays, and as a consequence the BLEU score computed against references also drops seriously. [sent-97, score-0.14]

59 Under 0.6 we observe that the sentences become completely incomprehensible (this is the reason why we leave out the results of α under 0. [sent-99, score-0.083]

60 The best balance is achieved when α is between 0. [sent-101, score-0.049]

61 9, where both the sentence quality and variety are relatively preserved. [sent-103, score-0.082]

62 As the α value is manually defined and not specially tuned, the experiments only achieve results comparable with no joint learning when α equals 0. [Table 3: Example of the Paraphrase Results] [sent-104, score-0.101]

63 However, the results show that our method is able to effectively control the self-paraphrase rate and lower the self-BLEU score; this is achieved both through joint learning and by introducing the iBLEU metric to avoid trivial self-paraphrases. [sent-106, score-0.184]

64 This is not achievable without joint learning, or with the traditional BLEU score, which does not take self-paraphrase into consideration. [sent-107, score-0.084]

65 We randomly choose 100 sentences from the testing data. [sent-109, score-0.03]

66 For each setting, two annotators are asked to give scores about semantic adequacy, fluency, variety and overall quality. [sent-110, score-0.025]

67 The scales are 0 (meaning changed; incomprehensible; almost same; cannot be used), 1 (almost same meaning; little flaws; containing different words; may be useful) and 2 (same meaning; good sentence; different sentential form; could be used). [sent-111, score-0.057]

68 From the results we can see that human evaluations are quite consistent with the automatic evaluation, where higher BLEU scores correspond to a larger number of good adequacy and fluency labels, and higher self-BLEU results tend to receive lower human ratings of dissimilarity. [sent-117, score-0.404]

69 In our observation, we found that adequacy and fluency are relatively easy to preserve, especially for short sentences. [sent-118, score-0.349]

70 This is because the translation tables are used bi-directionally, so many fragments of the source sentences appear in the paraphrasing results. [sent-120, score-0.396]

71 We show an example of the paraphrase results under different settings. [sent-121, score-0.626]

72 The sentential forms of all the results are unchanged compared with the input sentence, and all are well-formed. [sent-122, score-0.1]

73 This is due to the short length of the source sentence. [sent-123, score-0.031]

74 Also, with a smaller value of α, more variations show up in the paraphrase results. [sent-124, score-0.649]

75 1 SMT Systems and Pivot Languages We have tested our method using homogeneous SMT systems and a single pivot language. [sent-126, score-0.295]

76 As the method highly depends on machine translation, a natural question arises as to the impact of using different pivots or SMT systems. [sent-127, score-0.089]

77 The joint learning method works by combining both of the processes to concentrate on the final objective, so it is not affected by the selection of language or SMT model. [sent-128, score-0.102]

78 In addition, our method is not limited to a homogeneous SMT model or a single pivot language. [sent-129, score-0.295]

79 As long as the models’ translation candidates can be scored with a log-linear model, the joint learning process can tune the parameters at the same time. [sent-130, score-0.185]

80 When dealing with multiple pivot languages or heterogeneous SMT systems, our method will take effect by optimizing parameters from both the forward and backward translation processes, together with the final combination feature vector, to get optimal paraphrase results. [sent-131, score-0.997]

81 The first part of iBLEU, which is the traditional BLEU score, helps to ensure the quality of the machine translation results. [sent-134, score-0.147]

82 Further, it also helps to keep the semantic equivalency. [sent-135, score-0.025]

83 These two roles unify the goals of optimizing translation and paraphrase adequacy in the training process. [sent-136, score-0.961]

84 Another contribution of iBLEU is its ability to balance between adequacy and dissimilarity, as the two aspects of paraphrasing are incompatible (Zhao and Wang, 2010). [sent-137, score-0.74]

85 This is not difficult to explain because when we change many words, the meaning and the sentence quality are hard to preserve. [sent-138, score-0.119]

86 As the paraphrasing task is not self-contained and will be employed by different applications, the two measures should be given different priorities based on the application scenario. [sent-139, score-0.305]

87 A lower α value is preferred but should be kept within a certain range, as a significant change may lead to the loss of constraints present in the original sentence. [sent-141, score-0.048]

88 The advantage of the proposed method is reflected in its ability to adapt to different application requirements by adjusting the value of α in a reasonable range. [sent-142, score-0.129]

89 5 Conclusion We propose a joint learning method for pivot language-based paraphrase generation. [sent-143, score-0.912]

90 The jointly learned dual SMT system, which combines the training processes of two SMT systems in paraphrase generation, enables optimization of the final paraphrase quality. [sent-144, score-1.379]

91 Furthermore, a revised BLEU score that balances between paraphrase adequacy and dissimilarity is proposed in our training process. [sent-145, score-1.111]

92 In the future, we plan to go a step further to see whether we can enhance dissimilarity by penalizing phrase tables used in both of the translation processes. [sent-146, score-0.3]

93 Answering the question you wish they had asked: The impact of paraphrasing for question answering. [sent-167, score-0.323]

94 Using machine translation evaluation techniques to determine sentence-level semantic equivalence. [sent-171, score-0.133]

95 Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation. [sent-175, score-0.245]

96 Phrase clustering for smoothing TM probabilities - or, how to extract paraphrases from phrase tables. [sent-184, score-0.097]

97 In Proceedings of the 2009 Workshop on Applied Textual Inference, ACL-IJCNLP, pages 18–26. [sent-198, score-0.033]

98 Pivot approach for extracting paraphrase patterns from bilingual corpora. [sent-223, score-0.669]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('paraphrase', 0.626), ('smt', 0.257), ('paraphrasing', 0.257), ('ibleu', 0.238), ('adequacy', 0.227), ('pivot', 0.211), ('dissimilarity', 0.168), ('zhao', 0.161), ('bleu', 0.135), ('shiqi', 0.115), ('translation', 0.108), ('equivalency', 0.106), ('dual', 0.1), ('fluency', 0.097), ('paraphrases', 0.097), ('rs', 0.073), ('es', 0.066), ('nist', 0.062), ('homogeneous', 0.059), ('ting', 0.057), ('sentential', 0.057), ('revised', 0.056), ('adjusting', 0.056), ('mert', 0.054), ('haifeng', 0.053), ('dubou', 0.053), ('incomprehensible', 0.053), ('sheng', 0.051), ('chris', 0.05), ('joint', 0.05), ('balance', 0.049), ('parallel', 0.048), ('translating', 0.046), ('generation', 0.045), ('criteria', 0.045), ('sentence', 0.043), ('bilingual', 0.043), ('kok', 0.042), ('back', 0.04), ('optimize', 0.04), ('evaluations', 0.04), ('engines', 0.039), ('quality', 0.039), ('metzler', 0.039), ('demands', 0.039), ('lan', 0.039), ('incompatible', 0.039), ('finch', 0.039), ('brockett', 0.039), ('metric', 0.039), ('bannard', 0.037), ('meaning', 0.037), ('rate', 0.036), ('score', 0.034), ('xiang', 0.034), ('ganitkevitch', 0.034), ('candidate', 0.034), ('pages', 0.033), ('question', 0.033), ('dolan', 0.032), ('kuhn', 0.032), ('arises', 0.031), ('william', 0.031), ('source', 0.031), ('kumar', 0.03), ('sentences', 0.03), ('mt', 0.029), ('qa', 0.028), ('monolingual', 0.028), ('wang', 0.028), ('quirk', 0.028), ('equals', 0.028), ('minimum', 0.027), ('processes', 0.027), ('chinese', 0.027), ('parameters', 0.027), ('ming', 0.026), ('drops', 0.026), ('employs', 0.026), ('chen', 0.025), ('kept', 0.025), ('independently', 0.025), ('application', 0.025), ('method', 0.025), ('semantic', 0.025), ('liu', 0.025), ('tuned', 0.025), ('diversity', 0.024), ('enhance', 0.024), ('translations', 0.023), ('value', 0.023), ('mhm', 0.023), ('napoles', 0.023), ('sures', 0.023), ('decays', 0.023), ('ky', 0.023), ('aagnde', 0.023), ('chunliang', 0.023), ('crosoft', 0.023), ('mbr', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria, especially the paraphrase rate, cannot be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

2 0.56948429 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

3 0.19451284 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: MeiHua Chen ; ShihTing Huang ; HungTing Hsieh ; TingHui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built based on first-language-oriented input function and context sensitive approach, aiming at providing immediate and appropriate suggestions including translations, paraphrases, and n-grams during composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge. 1.

4 0.17284854 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer

Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.

5 0.1633117 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

Author: Xiaodong He ; Li Deng

Abstract: This paper proposes a new discriminative training method in constructing phrase and lexicon translation models. In order to reliably learn a myriad of parameters in these models, we propose an expected BLEU score-based utility function with KL regularization as the objective, and train the models on a large parallel dataset. For training, we derive growth transformations for phrase and lexicon translation probabilities to iteratively improve the objective. The proposed method, evaluated on the Europarl German-to-English dataset, leads to a 1.1 BLEU point improvement over a state-of-the-art baseline translation system. In IWSLT 201 1 Benchmark, our system using the proposed method achieves the best Chinese-to-English translation result on the task of translating TED talks.

6 0.16162774 184 acl-2012-String Re-writing Kernel

7 0.12508619 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages

8 0.12300408 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

9 0.10762279 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

10 0.10251807 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

11 0.099504329 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

12 0.097481772 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

13 0.097476624 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

14 0.096960172 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries

15 0.095148534 69 acl-2012-Deep Learning for NLP (without Magic)

16 0.093474954 140 acl-2012-Machine Translation without Words through Substring Alignment

17 0.092617683 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

18 0.092241324 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

19 0.088095337 199 acl-2012-Topic Models for Dynamic Translation Model Adaptation

20 0.084690258 131 acl-2012-Learning Translation Consensus with Structured Label Propagation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.217), (1, -0.216), (2, 0.137), (3, 0.073), (4, 0.117), (5, -0.026), (6, -0.035), (7, 0.134), (8, -0.099), (9, -0.015), (10, -0.052), (11, 0.203), (12, 0.019), (13, 0.24), (14, 0.163), (15, 0.237), (16, 0.083), (17, -0.224), (18, -0.31), (19, 0.214), (20, 0.171), (21, 0.082), (22, -0.185), (23, -0.097), (24, -0.026), (25, -0.024), (26, 0.095), (27, 0.055), (28, -0.039), (29, 0.018), (30, -0.001), (31, 0.038), (32, -0.035), (33, 0.074), (34, 0.033), (35, -0.07), (36, -0.025), (37, -0.021), (38, -0.042), (39, -0.017), (40, -0.061), (41, -0.016), (42, 0.017), (43, -0.015), (44, -0.006), (45, 0.001), (46, -0.019), (47, -0.017), (48, -0.049), (49, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95107156 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria, especially the paraphrase rate, cannot be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU), which measures the adequacy and diversity of the generated paraphrase sentence, is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

2 0.90265036 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

3 0.69254953 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: MeiHua Chen ; ShihTing Huang ; HungTing Hsieh ; TingHui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built based on first-language-oriented input function and context sensitive approach, aiming at providing immediate and appropriate suggestions including translations, paraphrases, and n-grams during composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge. 1.

4 0.52954906 184 acl-2012-String Re-writing Kernel

Author: Fan Bu ; Hang Li ; Xiaoyan Zhu

Abstract: Learning for sentence re-writing is a fundamental task in natural language processing and information retrieval. In this paper, we propose a new class of kernel functions, referred to as string re-writing kernel, to address the problem. A string re-writing kernel measures the similarity between two pairs of strings, each pair representing re-writing of a string. It can capture the lexical and structural similarity between two pairs of sentences without the need of constructing syntactic trees. We further propose an instance of string rewriting kernel which can be computed efficiently. Experimental results on benchmark datasets show that our method can achieve better results than state-of-the-art methods on two sentence re-writing learning tasks: paraphrase identification and recognizing textual entailment.

5 0.49142373 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer

Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.

6 0.3774659 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation

7 0.36120903 54 acl-2012-Combining Word-Level and Character-Level Models for Machine Translation Between Closely-Related Languages

8 0.35539249 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

9 0.35245731 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

10 0.34568873 164 acl-2012-Private Access to Phrase Tables for Statistical Machine Translation

11 0.34491158 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

12 0.32843557 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

13 0.32478711 163 acl-2012-Prediction of Learning Curves in Machine Translation

14 0.31853333 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries

15 0.2854875 136 acl-2012-Learning to Translate with Multiple Objectives

16 0.28122336 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora

17 0.27670744 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

18 0.27464843 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

19 0.2727024 65 acl-2012-Crowdsourcing Inference-Rule Evaluation

20 0.26581752 35 acl-2012-Automatically Mining Question Reformulation Patterns from Search Log Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(25, 0.024), (26, 0.032), (28, 0.068), (30, 0.027), (37, 0.034), (39, 0.028), (48, 0.285), (57, 0.025), (74, 0.066), (82, 0.019), (84, 0.022), (85, 0.036), (90, 0.126), (92, 0.038), (94, 0.052), (99, 0.04)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87930793 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

Author: Alessandro Moschitti

Abstract: unkown-abstract

2 0.80509955 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: MeiHua Chen ; ShihTing Huang ; HungTing Hsieh ; TingHui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built based on first-language-oriented input function and context sensitive approach, aiming at providing immediate and appropriate suggestions including translations, paraphrases, and n-grams during composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge. 1.

same-paper 3 0.76787573 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases ofthe source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results. Paraphrase criteria especially the paraphrase rate is not able to be ensured in that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

4 0.57241535 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

5 0.53782499 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture mean- ingful syntactic/semantic structures, which allows for improving the state-of-the-art.

6 0.52516979 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

7 0.52248603 184 acl-2012-String Re-writing Kernel

8 0.51797628 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

9 0.51729542 140 acl-2012-Machine Translation without Words through Substring Alignment

10 0.51439613 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing

11 0.51332968 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning

12 0.51317573 136 acl-2012-Learning to Translate with Multiple Objectives

13 0.51286209 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

14 0.50971937 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

15 0.50953555 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

16 0.50484341 118 acl-2012-Improving the IBM Alignment Models Using Variational Bayes

17 0.50404775 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents

18 0.50373465 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

19 0.50276268 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

20 0.50239116 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?