acl2012-92 knowledge-graph by maker-knowledge-mining

92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System


Source: pdf

Author: Mei-Hua Chen ; Shih-Ting Huang ; Hung-Ting Hsieh ; Ting-Hui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built on a first-language-oriented input function and a context-sensitive approach, aiming to provide immediate and appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 It is built on a first-language-oriented input function and a context-sensitive approach, aiming to provide immediate and appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. [sent-10, score-0.768]

2 FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge. [sent-11, score-1.206]

3 Introduction. Writing in a second language (L2) is a challenging and complex process for foreign language learners. [sent-13, score-0.083]

4 Insufficient lexical knowledge and limited exposure to English might interrupt their writing flow (Silva, 1993). [sent-14, score-0.821]

5 Numerous writing instruction approaches have been proposed (Kroll, 1990), and writing handbooks are available for learners. [sent-15, score-0.708]

6 Studies have revealed that during the writing process, EFL learners tend to rely on their native languages (Wolfersberger, 2003) to prevent a breakdown in the writing process (Arndt, 1987; Cumming, 1989). [sent-16, score-0.8]

7 However, existing writing courses and instructional materials, which are almost entirely second-language-oriented, seem unable to directly assist EFL writers while they write. [sent-17, score-0.781]

8 This paper presents FLOW (Figure 1), an interactive system for assisting EFL writers in writing. (Footnote 1: FLOW: http://flowacldemo.) [sent-18, score-0.43]

9 Different from existing tools, its context-sensitive and first-language-oriented features enable EFL writers to concentrate on their ideas and thoughts without being hampered by limited lexical resources. [sent-21, score-0.438]

10 Based on studies showing that first-language use can positively affect second-language composing, FLOW attempts to meet this need. [sent-22, score-0.03]

11 Given any L1 input, FLOW displays appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. [sent-23, score-0.669]

12 We use the following example sentences to illustrate these two functionalities. [sent-24, score-0.025]

13 During the composing stage, suppose a writer is unsure of the phrase “solve the problem”; he could write “解決問題”, its counterpart in his native language, as in “We propose a method to 解決問題”. [sent-26, score-0.415]

14 The writer’s input in the writing area of FLOW actively triggers a set of translation suggestions such as “solve the problem” and “tackle the problem” for him/her to complete the sentence. [sent-27, score-0.765]

15 In the revising stage, the writer intends to improve or correct the content. [sent-28, score-0.347]

16 He/She is likely to change the sentence illustrated above into “We try all means to solve the problem.” [sent-29, score-0.056]

17 He/She would select the phrase “propose a method” in the original sentence and input an L1 phrase “盡力”, which specifies the intended meaning. [sent-30, score-0.168]

18 The L1 input triggers a set of context-aware suggestions corresponding to the translation, such as “try our best” and “do our best”, rather than “try your best” and “do your best”. [sent-31, score-0.346]

19 The system is able to do that mainly by taking a context-sensitive approach. [sent-32, score-0.026]

20 FLOW then inserts the phrase the writer selects into the sentence. [sent-33, score-0.195]

21 [Figure 1: Screenshot of FLOW] In this paper, we propose a context-sensitive disambiguation model that aims to automatically choose the appropriate phrases in different contexts when performing n-gram prediction, paraphrase suggestion, and translation tasks. [sent-37, score-0.628]

22 As described by Carpuat and Wu (2007), the disambiguation model plays an important role in the machine translation task. [sent-38, score-0.241]

23 Similar to their work, we further integrate the multi-word phrasal lexical disambiguation model into the n-gram prediction, paraphrase, and translation models of our system. [sent-39, score-0.698]

24 With the phrasal disambiguation model, the output of the system is sensitive to the context the writer is working on. [sent-40, score-0.407]

25 The context-sensitive feature helps writers find the appropriate phrase while composing and revising. [sent-41, score-0.677]

26 1 Sub-sentential paraphrases. A variety of data-driven paraphrase extraction techniques have been proposed in the literature. [sent-49, score-0.408]

27 One of the most popular methods leveraging bilingual parallel corpora was proposed by Bannard and Callison-Burch (2005). [sent-50, score-0.086]

28 They identify paraphrases using a phrase in another language as a pivot. [sent-51, score-0.307]

29 Using bilingual parallel corpora for paraphrasing demonstrates the strength of semantic equivalence. [sent-52, score-0.152]

30 Another line of research further considers context information to improve the performance. [sent-53, score-0.025]

31 Instead of addressing the issue of local paraphrase acquisition, Max (2009) utilizes the source and target contexts to extract subsentential paraphrases by using pivot SMT systems. [sent-54, score-0.56]

32 2 N-gram suggestions. After a survey of several existing writing tools, we focus on reviewing two systems closely related to our study. [sent-56, score-0.553]

33 PENS (Liu et al., 2000), a machine-aided English writing system, provides translations of the corresponding English words or phrases for writers’ reference. [sent-57, score-0.456]

34 Different from PENS, FLOW further suggests paraphrases to help writers revise their writing. [sent-58, score-1.013]

35 While revising, writers often alter their use of language to express their thoughts. [sent-59, score-0.346]

36 Paraphrase suggestions can meet this need, allowing writers to express their thoughts more fluently. [sent-60, score-0.53]

37 Another tool, TransType (Foster, 2002), a text editor, provides translators with appropriate translation suggestions utilizing a trigram language model. [sent-61, score-0.334]

38 The differences between our system and TransType lie in the purpose and the input. [sent-62, score-0.026]

39 FLOW aims to assist EFL writers whereas TransType is a tool for skilled translators. [sent-63, score-0.398]

40 On the other hand, in TransType the human translator types a translation of a given source text, whereas in FLOW the input, either a word or a phrase, can be in the source or the target language. [sent-64, score-0.115]

41 3 Multi-word phrasal lexical disambiguation. In the study most closely related to our work, Carpuat and Wu (2007) propose a novel method to train a phrasal lexical disambiguation model that benefits translation candidate selection in machine translation. [sent-66, score-0.641]

42 They find a way to integrate the state-of-the-art Word Sense Disambiguation (WSD) model into phrase-based statistical machine translation. [sent-67, score-0.025]

43 Instead of using predefined senses drawn from manually constructed sense inventories, their model directly disambiguates between all phrasal translation candidates seen during SMT training. [sent-68, score-0.291]

44 In this paper, we also use the phrasal lexical disambiguation model; however, rather than using the disambiguation model only to help machine translation, we extend it. [sent-69, score-0.569]

45 With the help of the phrasal lexical disambiguation model, we build three models: a context-sensitive n-gram prediction model, a paraphrase suggestion model, and a translation model, which are introduced in the following sections. [sent-70, score-0.82]

46 Overview of FLOW. The FLOW system helps language learners in two ways: predicting n-grams in the composing stage and suggesting paraphrases in the revising stage (Figure 2). [sent-72, score-0.893]

47 1 System architecture. Composing Stage: During the composing process, a user inputs a partial sentence S, and FLOW checks whether its last few words are first-language words. [sent-74, score-0.257]

48 If not, FLOW takes the last k words to predict the best matching following n-grams. [sent-76, score-0.038]

49 Otherwise, the system uses the last k words as the query to predict the corresponding n-gram translation. [sent-77, score-0.064]

50 With a set of predictions (either translations or n-grams), the user can choose an appropriate suggestion to complete the sentence in the writing area. [sent-78, score-0.778]

51 [Figure 2: Overall architecture of FLOW in the writing and revising processes] Revising Stage: In the revising stage, given an input I and the user-selected words K, FLOW obtains the word sequences L and R surrounding K as reference for prediction. [sent-80, score-0.944]

52 Next, the system suggests subsentential paraphrases for K based on the information of L and R. [sent-81, score-0.356]
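The two-stage dispatch described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the CJK-character test and the callable names are assumptions introduced here.

    import re

    # Heuristic (an assumption, not from the paper): CJK characters mark
    # first-language (Chinese) input in the composing area.
    _CJK = re.compile(r"[\u4e00-\u9fff]")

    def compose_suggestions(sentence, predict_ngrams, predict_translations, k=5):
        """Composing stage: route the last k tokens of the partial sentence S
        either to n-gram prediction or to L1-to-English translation.
        `predict_ngrams` and `predict_translations` are hypothetical callables
        supplied by the surrounding system."""
        tail = sentence.split()[-k:]
        if any(_CJK.search(token) for token in tail):
            # The tail ends with first-language words: predict translations.
            return predict_translations(tail)
        # Pure-English tail: predict the best matching following n-grams.
        return predict_ngrams(tail)

    def revise_suggestions(left, selected, right, suggest, l1_hint=None):
        """Revising stage: given the user-selected words K and the surrounding
        sequences L and R, ask `suggest` (hypothetical) for sub-sentential
        paraphrases, optionally constrained by an L1 hint."""
        return suggest(selected, left, right, l1_hint)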

53 2 N-gram prediction. In the n-gram prediction task, our model takes as input the last k words of the source sentence S, consisting of m English words and n foreign-language words, {e1, e2, …, em, f1, f2, …, fn}. [sent-84, score-0.343]

54 Context-Sensitive N-gram Prediction (CS-NP): The CS-NP model is triggered to predict a following n-gram when a user composes a sentence consisting of only English words, with no foreign-language words (that is, n = 0). [sent-87, score-0.234]

55 The goal of the CS-NP model is to find the English phrase e that maximizes the language-model probability of the word sequence {e1, e2, …, em, e}: e* = argmax_e P(e1, e2, …, em, e).
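As a concrete illustration of this maximization, here is a minimal sketch; the toy add-one-smoothed unigram scorer stands in for a real language model, and all names and counts are invented for the example:

    import math

    def cs_np(context, candidates, logprob):
        # Context-Sensitive N-gram Prediction: among candidate continuations,
        # return the phrase e maximizing the language-model probability of the
        # sequence {e1, ..., em, e}. `logprob` maps a token sequence to a
        # log-probability.
        return max(candidates, key=lambda e: logprob(context + e.split()))

    # Toy add-one-smoothed unigram "LM" (illustration only).
    freq = {"we": 5, "propose": 2, "a": 8, "method": 3, "to": 6,
            "solve": 3, "the": 10, "problem": 4, "tackle": 1}
    total = sum(freq.values())

    def unigram_logprob(tokens):
        return sum(math.log((freq.get(t, 0) + 1) / (total + len(freq)))
                   for t in tokens)

    print(cs_np(["We", "propose", "a", "method", "to"],
                ["solve the problem", "tackle the problem"],
                unigram_logprob))  # -> "solve the problem"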

56 Translation-based N-gram Prediction (TB-NP): When a user types an L1 expression f = {f1, f2, …, fn} following the English sentence S, the FLOW system predicts the possible translations of f. [sent-135, score-0.261]

57 A simple way to predict the translations is to find the bilingual phrase alignments T(f) using the method proposed by Och and Ney (2003). [sent-136, score-0.217]

58 Thus, we use the context {e1, e2, …, em} preceding f to refine the prediction of the translation. [sent-138, score-0.13]

59 Predicting the translation e can be treated as a subsentential translation task: e* = argmax_{e ∈ T(f)} P(e | e1, e2, …, em). (Footnote 2: In this paper, m = 5.) [sent-139, score-0.267]

60 Here we use the user-composed context {e1, e2, …, em} to disambiguate the translation of f. [sent-158, score-0.156]

61 Although there exist more sophisticated models that could make a better prediction, a simple naïve-Bayes model has been shown to be accurate and efficient in the lexical disambiguation task, according to Yarowsky and Florian (2002). [sent-159, score-0.185]

62 Therefore, in this paper, a naïve-Bayes model is used to disambiguate the translation of f. [sent-160, score-0.156]

63 In addition to the context-word feature, we also use a context-syntax feature, namely the surrounding POS tags Pos, to constrain the syntactic structure of the prediction. [sent-161, score-0.025]

64 The TB-NP model can be represented by the following equation: e* = argmax_{e ∈ T(f)} P(e | e1, e2, …, em, Pos).

65 The probabilities can be estimated using a parallel corpus, which is also used to obtain bilingual phrase alignment. [sent-229, score-0.15]
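A minimal naïve-Bayes disambiguator of this kind is sketched below. The count-table layout and the numbers are hypothetical, and the same scorer can in principle serve the CS-PS and TB-PS models by swapping in paraphrase candidates:

    import math

    def nb_disambiguate(candidates, context, pos, counts, vocab_size, alpha=0.5):
        # Choose e in T(f) maximizing P(e) * prod_i P(e_i | e) * P(Pos | e),
        # with add-alpha smoothing. `counts` maps each candidate e to
        # {"total": n_e, "words": {w: n}, "pos": {tag: n}} gathered from the
        # parallel training corpus (hypothetical layout for this sketch).
        grand_total = sum(c["total"] for c in counts.values())

        def logprob(e):
            c = counts[e]
            lp = math.log(c["total"] / grand_total)            # prior P(e)
            denom = c["total"] + alpha * vocab_size
            for w in context:                                  # P(e_i | e)
                lp += math.log((c["words"].get(w, 0) + alpha) / denom)
            lp += math.log((c["pos"].get(pos, 0) + alpha) / denom)  # P(Pos | e)
            return lp

        return max(candidates, key=logprob)

    counts = {
        "try our best":  {"total": 40, "words": {"we": 25, "all": 5}, "pos": {"PRP": 30}},
        "try your best": {"total": 60, "words": {"you": 40}, "pos": {"PRP": 45}},
    }
    print(nb_disambiguate(["try our best", "try your best"],
                          context=["we", "all", "means"], pos="PRP",
                          counts=counts, vocab_size=10000))
    # -> "try our best", given the first-person context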

66 3 Paraphrase Suggestion. Unlike the n-gram prediction, in the paraphrase suggestion task the user selects k words, {e1, e2, …, ek}, which he/she wants to paraphrase. [sent-231, score-0.409]

67 The model takes the m words {r1, r2, …, rm} and n words {l1, l2, …, ln} to the right and left of the user-selected k words, respectively. [sent-232, score-0.025]

68 The system also accepts an additional foreign-language input, {f1, f2, …, fl}, which helps limit the meaning of the suggested paraphrases to what the user really wants. [sent-233, score-0.443]

69 The output is a set of paraphrase suggestions that can precisely replace the user-selected phrase. [sent-234, score-0.641]

70 Context-Sensitive Paraphrase Suggestion (CS-PS): The CS-PS model first finds a set of local paraphrases P of the input phrase K using the pivot-based method proposed by Bannard and Callison-Burch (2005). [sent-235, score-0.401]
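The pivot step can be sketched as below, following Bannard and Callison-Burch's formulation p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f), summing over foreign pivot phrases f; the phrase-table fragments and their probabilities are invented for illustration:

    from collections import defaultdict

    def pivot_paraphrases(e1, p_f_given_e, p_e_given_f):
        # Score each paraphrase candidate e2 of e1 by summing, over foreign
        # phrases f aligned to e1 in a bilingual phrase table,
        # p(f | e1) * p(e2 | f).
        scores = defaultdict(float)
        for f, pf in p_f_given_e.get(e1, {}).items():
            for e2, pe in p_e_given_f.get(f, {}).items():
                if e2 != e1:               # exclude the original phrase
                    scores[e2] += pf * pe
        return sorted(scores.items(), key=lambda kv: -kv[1])

    p_f_given_e = {"solve the problem": {"解決問題": 0.7, "處理問題": 0.3}}
    p_e_given_f = {
        "解決問題": {"solve the problem": 0.6, "tackle the problem": 0.3},
        "處理問題": {"deal with the problem": 0.5, "solve the problem": 0.4},
    }
    print(pivot_paraphrases("solve the problem", p_f_given_e, p_e_given_f))
    # -> tackle the problem ≈ 0.21, deal with the problem ≈ 0.15

The CS-PS model then re-ranks such local paraphrases with the context-sensitive naïve-Bayes scorer sketched above.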

71 Although the pivot-based method has proved efficient and effective in finding local paraphrases, the local paraphrase suggestions may not fit different contexts. [sent-236, score-0.422]

72 Similar to the previous n-gram prediction task, we use the naïve-Bayes approach to disambiguate these local paraphrases. [sent-237, score-0.175]

73 The task is to find the e with the highest probability given the contexts R and L: e* = argmax_{e ∈ P} P(e | L, R). [sent-238, score-0.025]

74 We further require paraphrases to have similar syntactic structures to the user-selected phrase in terms of POS tags, Pos. [sent-239, score-0.307]

75 Translation-based Paraphrase Suggestion (TB-PS): After the user selects a phrase for paraphrasing, with an L1 phrase F as an additional input, the suggestion problem becomes e* = argmax_{e ∈ T(F)} P(e | L, R). [sent-261, score-0.372]

76 The TB-PS model disambiguates paraphrases drawn from the translations of F instead of from the paraphrase set P. [sent-289, score-0.63]

77 Instead of training a full machine translation system using toolkits such as Moses (Koehn et al., 2007), we used only bilingual phrase alignments as translations, to avoid the noise produced by the machine translation decoder. [sent-292, score-0.09] [sent-293, score-0.293]

79 Word alignments were produced using the GIZA++ toolkit (Och and Ney, 2003) over a set of 2,220,570 Chinese-English sentence pairs in the Hong Kong Parallel Text (LDC2004T08), with sentences segmented using the CKIP Chinese word segmentation system (Ma and Chen, 2003). [sent-294, score-0.051]

80 In training the phrasal lexical disambiguation model, we used the English part of Hong Kong Parallel Text as our training data. [sent-295, score-0.263]

81 To assess the effectiveness of FLOW, we selected 10 Chinese sentences and asked two students to translate them into English using FLOW. [sent-296, score-0.117]

82 We kept track of the sentences the two students entered. [sent-297, score-0.067]

83 Both paraphrase models, CS-PS and TB-PS, perform quite well in assisting the user in the writing task. [sent-299, score-0.64]

84 Moreover, although we used POS tags as features, the syntactic structures of the suggestions are still not always consistent with the input or selected phrases. [sent-303, score-0.239]

85 The CS-NP and TB-NP models also perform well. [sent-304, score-0.025]

86 However, the suggested phrases are usually too short to be a semantic unit. [sent-305, score-0.034]

87 The disambiguation model tends to produce shorter phrases because they have more common context features. [sent-306, score-0.21]

88 Conclusion and Future Work. In this paper, we presented FLOW, an interactive writing assistance system aimed at helping EFL writers compose and revise without interrupting their writing flow. [sent-308, score-1.131]

89 Based on studies of second-language writing showing that EFL writers tend to produce text in their native language and then translate it into English, the first-language-oriented function provides writers with appropriate translation suggestions. [sent-310, score-1.222]

90 On the other hand, because the selection of words or phrases is sensitive to syntax and context, our system provides suggestions that depend on the context. [sent-311, score-0.293]

91 Both functions are expected to improve EFL writers’ writing performance. [sent-312, score-0.354]

92 In future work, we will conduct experiments to gain a deeper understanding of how FLOW improves EFL writers’ writing, such as integrating FLOW into writing courses to observe the quality and quantity of students’ writing. [sent-313, score-1.142]

93 For example, we are interested in integrating error detection and correction functions into FLOW to actively help EFL writers achieve better writing success and to further motivate them to write with confidence. [sent-315, score-1.118]

94 Six writers in search of texts: A protocol-based study of L1 and L2 writing. [sent-318, score-0.346]

95 In Proceedings of the 2009 Workshop on Applied Textual Inference, ACL-IJCNLP, pp. 18–26. [sent-350, score-0.032]

96 L1 to L2 writing process and strategy transfer: a look at lower proficiency writers. [sent-361, score-0.377]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('flow', 0.385), ('writing', 0.354), ('writers', 0.346), ('efl', 0.329), ('paraphrases', 0.243), ('revising', 0.231), ('suggestions', 0.199), ('composing', 0.194), ('paraphrase', 0.165), ('transtype', 0.145), ('suggestion', 0.143), ('disambiguation', 0.126), ('prediction', 0.105), ('phrasal', 0.103), ('writer', 0.093), ('translation', 0.09), ('subsentential', 0.087), ('foreign', 0.083), ('stage', 0.072), ('pens', 0.069), ('translations', 0.068), ('paraphrasing', 0.066), ('phrase', 0.064), ('user', 0.063), ('carpuat', 0.061), ('bannard', 0.061), ('assisting', 0.058), ('thoughts', 0.058), ('courses', 0.051), ('disambiguates', 0.051), ('bilingual', 0.047), ('appropriate', 0.045), ('actively', 0.043), ('students', 0.042), ('native', 0.041), ('disambiguate', 0.041), ('english', 0.041), ('fn', 0.041), ('revise', 0.041), ('input', 0.04), ('parallel', 0.039), ('triggers', 0.039), ('na', 0.038), ('selects', 0.038), ('em', 0.038), ('predict', 0.038), ('assistance', 0.036), ('pivot', 0.036), ('sensitive', 0.034), ('phrases', 0.034), ('lexical', 0.034), ('insufficient', 0.033), ('pp', 0.032), ('meet', 0.03), ('assist', 0.03), ('help', 0.029), ('kong', 0.029), ('local', 0.029), ('try', 0.029), ('helps', 0.028), ('foster', 0.028), ('alexandra', 0.028), ('chris', 0.027), ('learners', 0.027), ('solve', 0.027), ('system', 0.026), ('tony', 0.025), ('tesol', 0.025), ('arndt', 0.025), ('ckip', 0.025), ('langlais', 0.025), ('philippe', 0.025), ('aur', 0.025), ('hsinchu', 0.025), ('interrupt', 0.025), ('interrupted', 0.025), ('translator', 0.025), ('tsing', 0.025), ('hong', 0.025), ('context', 0.025), ('surrounding', 0.025), ('model', 0.025), ('sentences', 0.025), ('forward', 0.024), ('prevent', 0.024), ('chinese', 0.024), ('moses', 0.023), ('elt', 0.023), ('florian', 0.023), ('intends', 0.023), ('assistant', 0.023), ('proficiency', 0.023), ('unsure', 0.023), ('exposure', 0.023), ('changning', 0.023), ('max', 0.022), ('sense', 0.022), ('marine', 0.022), ('skilled', 0.022), ('aclijcnlp', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999976 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: Mei-Hua Chen ; Shih-Ting Huang ; Hung-Ting Hsieh ; Ting-Hui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built on a first-language-oriented input function and a context-sensitive approach, aiming to provide immediate and appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge.

2 0.24095339 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

3 0.19451284 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results; paraphrase criteria, especially the paraphrase rate, cannot be ensured that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

4 0.11171304 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing

Author: John Lee ; Jonathan Webster

Abstract: This paper describes the creation of the first large-scale corpus containing drafts and final versions of essays written by non-native speakers, with the sentences aligned across different versions. Furthermore, the sentences in the drafts are annotated with comments from teachers. The corpus is intended to support research on textual revision by language learners, and how it is influenced by feedback. This corpus has been converted into an XML format conforming to the standards of the Text Encoding Initiative (TEI).

5 0.094479695 134 acl-2012-Learning to Find Translations and Transliterations on the Web

Author: Joseph Z. Chang ; Jason S. Chang ; Roger Jyh-Shing Jang

Abstract: By identifying such translation counterparts on the Web, we can cope with the OOV problem. In this paper, we present a new method for learning to find translations and transliterations on the Web for a given term. The approach involves using a small set of terms and translations to obtain mixed-code snippets from a search engine, and automatically annotating the snippets with tags and features for training a conditional random field model. At runtime, the model is used to extract translation candidates for a given term. Preliminary experiments and evaluation show our method cleanly combines various features, resulting in a system that outperforms previous work.

6 0.083833933 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

7 0.081827223 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

8 0.077550463 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

9 0.07538303 150 acl-2012-Multilingual Named Entity Recognition using Parallel Data and Metadata from Wikipedia

10 0.073669672 140 acl-2012-Machine Translation without Words through Substring Alignment

11 0.072463699 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information

12 0.072316043 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

13 0.070495531 192 acl-2012-Tense and Aspect Error Correction for ESL Learners Using Global Context

14 0.063225001 153 acl-2012-Named Entity Disambiguation in Streaming Data

15 0.063156709 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

16 0.06083497 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

17 0.060061432 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

18 0.059906118 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

19 0.059557915 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

20 0.057329256 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.172), (1, -0.103), (2, 0.068), (3, 0.048), (4, 0.096), (5, 0.037), (6, -0.014), (7, 0.063), (8, -0.064), (9, -0.025), (10, 0.018), (11, 0.123), (12, -0.0), (13, 0.217), (14, 0.073), (15, 0.061), (16, 0.064), (17, -0.157), (18, -0.165), (19, 0.076), (20, -0.018), (21, 0.014), (22, -0.135), (23, -0.061), (24, -0.037), (25, -0.004), (26, 0.04), (27, 0.077), (28, 0.006), (29, -0.057), (30, -0.027), (31, -0.013), (32, -0.063), (33, 0.082), (34, -0.024), (35, 0.082), (36, -0.03), (37, 0.129), (38, -0.033), (39, 0.053), (40, -0.048), (41, 0.003), (42, -0.087), (43, 0.088), (44, 0.026), (45, 0.019), (46, -0.052), (47, -0.056), (48, 0.067), (49, -0.112)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.92332017 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: Mei-Hua Chen ; Shih-Ting Huang ; Hung-Ting Hsieh ; Ting-Hui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built on a first-language-oriented input function and a context-sensitive approach, aiming to provide immediate and appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge.

2 0.74296284 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

3 0.72957247 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results; paraphrase criteria, especially the paraphrase rate, cannot be ensured that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

4 0.48070967 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing

Author: John Lee ; Jonathan Webster

Abstract: This paper describes the creation of the first large-scale corpus containing drafts and final versions of essays written by non-native speakers, with the sentences aligned across different versions. Furthermore, the sentences in the drafts are annotated with comments from teachers. The corpus is intended to support research on textual revision by language learners, and how it is influenced by feedback. This corpus has been converted into an XML format conforming to the standards of the Text Encoding Initiative (TEI).

5 0.46522158 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

Author: Sander Wubben ; Antal van den Bosch ; Emiel Krahmer

Abstract: In this paper we describe a method for simplifying sentences using Phrase Based Machine Translation, augmented with a re-ranking heuristic based on dissimilarity, and trained on a monolingual parallel corpus. We compare our system to a word-substitution baseline and two state-of-the-art systems, all trained and tested on paired sentences from the English part of Wikipedia and Simple Wikipedia. Human test subjects judge the output of the different systems. Analysing the judgements shows that by relatively careful phrase-based paraphrasing our model achieves similar simplification results to state-of-the-art systems, while generating better formed output. We also argue that text readability metrics such as the Flesch-Kincaid grade level should be used with caution when evaluating the output of simplification systems.

6 0.40272948 184 acl-2012-String Re-writing Kernel

7 0.37914532 134 acl-2012-Learning to Find Translations and Transliterations on the Web

8 0.37705228 39 acl-2012-Beefmoves: Dissemination, Diversity, and Dynamics of English Borrowings in a German Hip Hop Forum

9 0.36626714 66 acl-2012-DOMCAT: A Bilingual Concordancer for Domain-Specific Computer Assisted Translation

10 0.36624834 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

11 0.36322817 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence

12 0.33421484 153 acl-2012-Named Entity Disambiguation in Streaming Data

13 0.33316603 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

14 0.3318179 195 acl-2012-The Creation of a Corpus of English Metalanguage

15 0.31679237 67 acl-2012-Deciphering Foreign Language by Combining Language Models and Context Vectors

16 0.31615335 160 acl-2012-Personalized Normalization for a Multilingual Chat System

17 0.31350046 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

18 0.30884531 140 acl-2012-Machine Translation without Words through Substring Alignment

19 0.30666697 190 acl-2012-Syntactic Stylometry for Deception Detection

20 0.30243838 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(26, 0.041), (28, 0.061), (30, 0.018), (37, 0.027), (39, 0.047), (48, 0.34), (57, 0.035), (74, 0.063), (82, 0.024), (84, 0.022), (85, 0.033), (90, 0.085), (92, 0.047), (94, 0.021), (99, 0.046)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.82439059 183 acl-2012-State-of-the-Art Kernels for Natural Language Processing

Author: Alessandro Moschitti

Abstract: unknown-abstract

same-paper 2 0.713166 92 acl-2012-FLOW: A First-Language-Oriented Writing Assistant System

Author: Mei-Hua Chen ; Shih-Ting Huang ; Hung-Ting Hsieh ; Ting-Hui Kao ; Jason S. Chang

Abstract: Writing in English might be one of the most difficult tasks for EFL (English as a Foreign Language) learners. This paper presents FLOW, a writing assistance system. It is built on a first-language-oriented input function and a context-sensitive approach, aiming to provide immediate and appropriate suggestions, including translations, paraphrases, and n-grams, during the composing and revising processes. FLOW is expected to help EFL writers achieve their writing flow without being interrupted by their insufficient lexical knowledge.

3 0.64777821 125 acl-2012-Joint Learning of a Dual SMT System for Paraphrase Generation

Author: Hong Sun ; Ming Zhou

Abstract: SMT has been used in paraphrase generation by translating a source sentence into another (pivot) language and then back into the source. The resulting sentences can be used as candidate paraphrases of the source sentence. Existing work that uses two independently trained SMT systems cannot directly optimize the paraphrase results; paraphrase criteria, especially the paraphrase rate, cannot be ensured that way. In this paper, we propose a joint learning method of two SMT systems to optimize the process of paraphrase generation. In addition, a revised BLEU score (called iBLEU) which measures the adequacy and diversity of the generated paraphrase sentence is proposed for tuning parameters in SMT systems. Our experiments on NIST 2008 testing data with automatic evaluation as well as human judgments suggest that the proposed method is able to enhance the paraphrase quality by adjusting between semantic equivalency and surface dissimilarity.

4 0.44486099 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules

Author: Wei He ; Hua Wu ; Haifeng Wang ; Ting Liu

Abstract: unknown-abstract

5 0.41759139 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures

Author: Danilo Croce ; Alessandro Moschitti ; Roberto Basili ; Martha Palmer

Abstract: In this paper, we propose innovative representations for automatic classification of verbs according to mainstream linguistic theories, namely VerbNet and FrameNet. First, syntactic and semantic structures capturing essential lexical and syntactic properties of verbs are defined. Then, we design advanced similarity functions between such structures, i.e., semantic tree kernel functions, for exploiting distributional and grammatical information in Support Vector Machines. The extensive empirical analysis on VerbNet class and frame detection shows that our models capture meaningful syntactic/semantic structures, which allows for improving the state-of-the-art.

6 0.41195038 184 acl-2012-String Re-writing Kernel

7 0.40403512 8 acl-2012-A Corpus of Textual Revisions in Second Language Writing

8 0.38973895 178 acl-2012-Sentence Simplification by Monolingual Machine Translation

9 0.38491395 83 acl-2012-Error Mining on Dependency Trees

10 0.38178331 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability

11 0.38046479 111 acl-2012-How Are Spelling Errors Generated and Corrected? A Study of Corrected and Uncorrected Spelling Errors Using Keystroke Logs

12 0.38022718 136 acl-2012-Learning to Translate with Multiple Objectives

13 0.37905914 97 acl-2012-Fast and Scalable Decoding with Language Model Look-Ahead for Phrase-based Statistical Machine Translation

14 0.37771231 61 acl-2012-Cross-Domain Co-Extraction of Sentiment and Topic Lexicons

15 0.37739328 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities

16 0.37733296 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

17 0.37718147 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

18 0.37668574 110 acl-2012-Historical Analysis of Legal Opinions with a Sparse Mixed-Effects Latent Variable Model

19 0.3758406 140 acl-2012-Machine Translation without Words through Substring Alignment

20 0.3752003 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents