acl acl2011 acl2011-108 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chung-chi Huang ; Mei-hua Chen ; Shih-ting Huang ; Jason S. Chang
Abstract: We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. [sent-11, score-0.294]
2 The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e. [sent-12, score-0.237]
3 , “play ~ role in Ving” and “look forward to Ving”). [sent-14, score-0.242]
4 At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. [sent-15, score-0.824]
5 Promising results are illustrated with three common types of errors in nonnative writing. [sent-17, score-0.151]
6 1 Introduction Recently, an increasing amount of research has targeted language learners’ need for editorial assistance, including detecting and correcting grammar and usage errors in texts written in a second language. [sent-18, score-0.661]
7 Much of the research in this area depends on hand-crafted rules and focuses on certain error types. [sent-20, score-0.235]
8 Very little research provides a general framework for detecting and correcting all types of errors. [sent-21, score-0.19]
9 However, a sentence in ESL writing may contain more than one error, and one error may affect the performance of handling other errors. [sent-22, score-0.182]
10 Erroneous sentences could be more efficiently identified and corrected if a grammar checker handles all errors at once, using a set of pattern rules that reflect the predominant usage of the English language. [sent-23, score-0.932]
11 Consider the sentences, “He play an important roles to close this deals. [sent-24, score-0.307]
12 Good responses to these writing errors might be (a) Use “played” instead of “play.” [sent-31, score-0.189]
13 These suggestions can be offered by learning the pattern rules related to “play ~ role” and “look forward”, based on an analysis of ngrams and collocations in a very large-scale reference corpus. [sent-33, score-0.626]
14 With corpus statistics, we could learn the needed phraseological tendency in the form of pattern rules such as “play ~ role in V-ing” and “look forward to V-ing.” [sent-34, score-0.654]
15 The use of such pattern rules is in line with the recent theory of Pattern Grammar put forward by Hunston and Francis (2000). [sent-35, score-0.565]
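To make the form of these pattern rules concrete, the following is a minimal sketch (in Python, not the authors' code) of how such rules could be stored and indexed by their anchoring collocation; the rule set, field names, and counts are invented placeholders.

from collections import namedtuple

# Hypothetical representation of a pattern rule such as "play ~ role in V-ing":
# the rule is keyed by its lexical collocation and stores the sequence of
# function words and word-class labels it licenses.
PatternRule = namedtuple("PatternRule", ["collocation", "elements", "count"])

rules = [
    PatternRule(("play", "role"), ("play", "~", "role", "in", "V-ing"), 1250),   # counts are
    PatternRule(("look", "forward"), ("look", "forward", "to", "V-ing"), 980),   # placeholders
]

# Index the rules by collocation so that, at run time, a learner sentence only
# needs to be compared against the rules sharing its anchor collocation.
rule_index = {}
for rule in rules:
    rule_index.setdefault(rule.collocation, []).append(rule)

print(rule_index[("play", "role")][0].elements)   # ('play', '~', 'role', 'in', 'V-ing')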
16 We present a system, EdIt, that automatically learns to provide suggestions for rare/wrong usages in non-native writing. [sent-36, score-0.238]
17 Given the input, EdIt retrieves the related pattern grammar for its ngram and collocation sequences (e.g. [sent-38, score-0.583]
18 “play ~ role in V-ing” and “look forward to V-ing”). [sent-40, score-0.242]
19 EdIt learns these patterns during the pattern extraction process by syntactically analyzing a collection of well-formed, published texts. [sent-41, score-0.308]
20 , “He play an important roles to close ”) submitted by the L2 learner. [sent-44, score-0.307]
21 EdIt then tags the passage with part-of-speech information and compares the tagged sentence against the pattern rules anchored at certain collocations (e.g. [sent-45, score-0.683]
22 Finally, EdIt finds the minimum-edit-cost patterns matching the passages using an extended Levenshtein’s algorithm (Levenshtein, 1966). [sent-48, score-0.127]
23 The system then highlights the edits and displays the pattern rules as suggestions for correction. [sent-49, score-0.64]
24 In our prototype, EdIt returns the preferred word form and preposition usages to the user directly (see Figure 1); alternatively, the actual surface words (e. [sent-50, score-0.194]
25 Many methods, rule-oriented or data-driven, have been proposed to tackle the problem of detecting and correcting grammatical and usage errors in learner texts. [sent-58, score-0.627]
26 , article and preposition error), a number of interesting rule-based systems have been proposed. [sent-63, score-0.135]
27 (2009) leverage heuristic rules for detecting Basque determiner and Korean particle errors, respectively. [sent-66, score-0.275]
28 (2009) bases some of the modules in ESL Assistant on rules derived from manually inspecting learner data. [sent-68, score-0.335]
29 Our pattern rules, however, are automatically derived from readily available well-formed data, yet are nevertheless very helpful for correcting errors in non-native writing. [sent-69, score-0.487]
30 More recently, statistical approaches to developing grammar checkers have prevailed. [sent-70, score-0.278]
31 (2008) and Gamon and Leacock (2010) both use the Web as a corpus to detect errors in non-native writing. [sent-73, score-0.148]
32 On the other hand, supervised models, typically treating error detection/correction as a classification problem, may train on well-formed texts as in the methods by De Felice and Pulman (2008) and Tetreault et al. [sent-74, score-0.119]
33 (2010), or with additional learner texts as in the method proposed by Brockett et al. [sent-75, score-0.219]
34 (2007) describes a method for constructing a supervised detection system trained on raw well-formed and learner texts without error annotation. [sent-78, score-0.295]
35 Recent work has been done on incorporating word class information into grammar checkers. [sent-79, score-0.182]
36 In an approach similar to our work, Tsao and Wible (2009) use combined ngrams of word forms, lemmas, and part-of-speech tags for research into constructional phenomena. [sent-82, score-0.142]
37 The main differences are that we anchor each pattern rule in a lexical collocation so as to avoid deriving rules that may have two … (footnote 1: in the pattern rules, we translate the part-of-speech tags to labels that are commonly used in learner dictionaries). [sent-83, score-1.082]
38 The pattern rules we have derived are more specific and can be effectively used in detecting and correcting errors. [sent-87, score-0.602]
39 In contrast to the previous research, we introduce a broad-coverage grammar checker that accommodates edits such as substitution, insertion and deletion, as well as replacing word forms or prepositions using pattern rules automatically derived from very large-scale corpora of well-formed texts. [sent-88, score-0.956]
40 3 The EdIt System Using supervised training on a learner corpus is not very feasible due to the limited availability of large-scale annotated non-native writing. [sent-89, score-0.176]
41 Existing systems trained on learner data tend to offer high precision but low recall. [sent-90, score-0.176]
42 Broad coverage grammar checkers may be developed using readily available large-scale corpora. [sent-91, score-0.278]
43 To detect and correct errors in non-native writing, a promising approach is to automatically extract lexico-syntactic pattern rules that are expected to distinguish correct and incorrect sentences. [sent-92, score-0.56]
44 1 Problem Statement We focus on correcting grammatical and usage errors by exploiting pattern rules of specific collocations (elastic or rigid, such as “play ~ role” or “look forward”). [sent-94, score-0.901]
45 EdIt provides suggestions for the following common writing errors, which are correlated with essay scores. [sent-96, score-0.294]
46 (1) wrong word form: (A) singular determiner preceding a plural noun; (B) wrong verb form, concerning modal verbs (e.g. [sent-97, score-0.231]
47 “look forward to see you” and “in an attempt to helping you”); (2) wrong preposition (or infinitive-to): (A) wrong preposition (e.g. [sent-103, score-0.549]
48 “to depends of it”); (B) wrong preposition and verb form (e.g. [sent-105, score-0.249]
49 “to play an important role to close this deal”); (3) transitivity errors: (A) transitive verb (e.g. [sent-107, score-0.538]
50 “to discuss about the matter” and “to affect to his decision”); (B) intransitive verb (e.g. [sent-109, score-0.133]
51 The system is designed to find pattern rules related to the errors and return suggestions. [sent-112, score-0.518]
52 Our goal is to detect grammatical and usage errors in T and provide suggestions for correction. [sent-115, score-0.482]
53 For this, we extract a set of pattern rules, u1, …, um, from C such that the rules reflect the predominant usage and are likely to distinguish most errors in nonnative writing. [sent-116, score-0.709]
54 First, we define a strategy for identifying predominant phraseology of frequent ngrams and collocations in Section 3. [sent-118, score-0.287]
55 After that, we show how EdIt proposes grammar corrections (edits) to non-native writing at run-time in Section 3. [sent-120, score-0.265]
56 , “play ~ role in V-ing”) from C expected to represent the immediate context of collocations (e. [sent-125, score-0.219]
57 Lemmatization and POS tagging both help to produce more general pattern rules from ngrams or collocations. [sent-131, score-0.515]
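As a rough illustration of this generalization step (a sketch under assumptions, not the paper's exact procedure), a tagged n-gram can be mapped to a pattern by keeping collocates and function words lexical and replacing other content words with learner-dictionary-style labels; the tag-to-label mapping below is a simplified guess.

# Hypothetical generalization of a tagged n-gram into pattern elements.
FUNCTION_POS = {"IN", "TO", "DT", "CC", "PRP$"}          # keep these tokens as words
LABEL = {"VB": "V", "VBG": "V-ing", "VBN": "V-en",       # Penn tag -> learner label
         "NN": "N", "NNS": "N", "JJ": "adj"}

def to_pattern(tagged_ngram, collocates):
    """tagged_ngram: list of (word, lemma, pos); collocates: lemmas kept lexical."""
    out = []
    for word, lemma, pos in tagged_ngram:
        if lemma in collocates or pos in FUNCTION_POS:
            out.append(word.lower())
        else:
            out.append(LABEL.get(pos, pos))
    return tuple(out)

print(to_pattern([("plays", "play", "VBZ"), ("an", "an", "DT"),
                  ("important", "important", "JJ"), ("role", "role", "NN"),
                  ("in", "in", "IN"), ("closing", "close", "VBG")],
                 collocates={"play", "role"}))
# -> ('plays', 'an', 'adj', 'role', 'in', 'V-ing')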
58 In the second stage of the training process, we calculate ngrams and collocations in C, and pass the frequent ngrams and collocations to Stage 4. [sent-135, score-0.558]
59 In the third stage of the training procedure, we build up inverted files for the lemmas in C for quick access in Stage 4. [sent-139, score-0.264]
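A toy version of such an inverted file (again only a sketch, assuming the corpus is already lemmatized) maps each lemma to the sentences and positions where it occurs, so that Stage 4 can quickly pull the sentences containing both members of a collocation.

from collections import defaultdict

# Toy inverted file: lemma -> list of (sentence id, token position).
def build_inverted_file(lemmatized_sentences):
    index = defaultdict(list)
    for sid, sentence in enumerate(lemmatized_sentences):
        for pos, lemma in enumerate(sentence):
            index[lemma].append((sid, pos))
    return index

corpus = [["he", "play", "a", "important", "role", "in", "close", "the", "deal"],
          ["we", "look", "forward", "to", "hear", "from", "you"]]
inv = build_inverted_file(corpus)

# Sentences containing both members of the collocation ("play", "role"):
play_role_sents = {sid for sid, _ in inv["play"]} & {sid for sid, _ in inv["role"]}
print(play_role_sents)   # {0}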
60 These sentences are then transformed into a set of n-grams of words and POS tags, which are subsequently counted and ranked to produce pattern rules with high frequencies. [sent-148, score-0.412]
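The counting-and-ranking step can be pictured as follows; this is a hedged sketch that assumes pattern sequences have already been produced (for instance by a routine like to_pattern above), and the frequency threshold is arbitrary.

from collections import Counter

# Hypothetical ranking of candidate pattern sequences by corpus frequency;
# only patterns above a minimum count survive as rules.
def rank_patterns(pattern_sequences, min_count=5):
    counts = Counter(pattern_sequences)
    return [(pat, c) for pat, c in counts.most_common() if c >= min_count]

candidates = [("play", "~", "role", "in", "V-ing")] * 7 + \
             [("play", "~", "role", "to", "V")] * 1
print(rank_patterns(candidates, min_count=5))
# -> [(('play', '~', 'role', 'in', 'V-ing'), 7)]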
61 3 Run-Time Error Correction Once the pattern rules are derived from a corpus of well-formed texts, EdIt utilizes them to check grammaticality and provide suggestions for a given text via the procedure in Figure 2. [sent-150, score-0.393]
62 In Step (1) of the procedure, we initialize a set, Suggestions, to collect grammar suggestions for the user text T according to the bank of pattern grammar, PatternGrammarBank. [sent-151, score-0.796]
63 Since the EdIt system focuses on grammar checking at the sentence level, T is heuristically split into sentences (Step (2)). [sent-152, score-0.221]
64 He looks forword to hear you”, we extract ngrams such as he V DET, play an JJ NNS, play ~ roles to V, this NNS, look forward to VB, and hear Pron. [sent-155, score-1.097]
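To illustrate how such userUsage n-grams might be pulled out of the tagged learner sentence, here is a self-contained sketch; the window size, the filtering by collocate, and the tag-to-label mapping are all assumptions made for the example, not EdIt's actual settings.

# Hypothetical extraction of userUsage n-grams from a POS-tagged learner sentence.
LABEL = {"VB": "V", "NN": "N", "NNS": "NNS", "JJ": "JJ", "PRP": "Pron"}

def extract_usages(tagged_sentence, collocates, n=4):
    usages = []
    for i in range(len(tagged_sentence) - n + 1):
        window = tagged_sentence[i:i + n]
        # keep only windows anchored on at least one known collocate lemma
        if not any(lemma in collocates for _, lemma, _ in window):
            continue
        usages.append(tuple(
            word if lemma in collocates or pos in ("IN", "TO", "DT")
            else LABEL.get(pos, pos)
            for word, lemma, pos in window))
    return usages

sent = [("He", "he", "PRP"), ("play", "play", "VB"), ("an", "an", "DT"),
        ("important", "important", "JJ"), ("roles", "role", "NNS"),
        ("to", "to", "TO"), ("close", "close", "VB"), ("this", "this", "DT"),
        ("deals", "deal", "NNS")]
print(extract_usages(sent, collocates={"play", "role"}))
# includes e.g. ('play', 'an', 'JJ', 'roles') around the play-role collocation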
65 For each userUsage, we first access the pattern rules related to the word and collocation within it (e.g. [sent-156, score-0.512]
66 , play-role patterns for “play ~ role to close”) Step (4). [sent-158, score-0.144]
67 We then compare userUsage against these rules (Steps (5) to (7)). [sent-159, score-0.159]
68 We use the extended Levenshtein’s algorithm shown in Figure 3 to compare userUsage and pattern rules. [sent-160, score-0.285]
69 We use minEditedCost and minEditedSug to constrain the pattern rules found for error suggestions (Step (5)). [sent-162, score-0.469]
70 Afterwards, the algorithm defines the cost of performing substitution (Step (2)), deletion (Step (3)), and insertion (Step (4)) at the i-indexed element of userUsage and the j-indexed element of the pattern. [sent-165, score-0.21]
71 , “DT” and “Pron$”), no edit is needed, hence, no cost (Step (2a)). [sent-170, score-0.332]
72 On the other hand, since learners tend to select the wrong word form or preposition, we set a lower cost for substitution among different word forms of the same lemma or lemmas with the same POS tag (e.g. [sent-171, score-0.383]
73 In addition to the conventional deletion and insertion (Steps (3b) and (4b), respectively), we look ahead to the elements userUsage[i+1] and pattern[j+1], considering the fact that “with or without preposition” and “transitive or intransitive verb” often puzzle EFL learners (Steps (3a) and (4a)). [sent-174, score-0.362]
74 Only a small edit cost is counted if the next elements in userUsage and Pattern are “equal”. [sent-175, score-0.332]
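Putting Steps (2)–(4) together, the comparison can be sketched as the following dynamic program; this is a hedged reconstruction in which the concrete cost constants and the lemma/POS lookup tables are placeholders, not the values used by EdIt.

def extended_levenshtein(usage, pattern, lemma_of, pos_of,
                         cheap_sub=0.5, cheap_del_ins=0.25):
    """Minimum edit cost of revising `usage` into `pattern` (both are sequences
    of pattern elements); lemma_of/pos_of map elements to a lemma and a POS tag
    so that word-form variants receive a reduced substitution cost."""
    m, n = len(usage), len(pattern)
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = float(i)
    for j in range(1, n + 1):
        d[0][j] = float(j)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            u, p = usage[i - 1], pattern[j - 1]
            # Step (2): substitution cost
            if u == p:
                sub = 0.0                                  # (2a) elements already equal
            elif (lemma_of.get(u) is not None and lemma_of.get(u) == lemma_of.get(p)) or \
                 (pos_of.get(u) is not None and pos_of.get(u) == pos_of.get(p)):
                sub = cheap_sub                            # (2b) word-form / POS variant
            else:
                sub = 1.0
            # Steps (3) and (4): deletion / insertion, cheaper when the next
            # elements already line up (optional preposition, (in)transitivity)
            next_match = i < m and j < n and usage[i] == pattern[j]
            del_ins = cheap_del_ins if next_match else 1.0
            d[i][j] = min(d[i - 1][j - 1] + sub,   # substitute
                          d[i - 1][j] + del_ins,   # delete a usage element
                          d[i][j - 1] + del_ins)   # insert a pattern element
    return d[m][n]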
75 In Step (6) the extended Levenshtein’s algorithm returns the minimum edit cost of revising userUsage using pattern. [sent-176, score-0.412]
76 Once we obtain the costs of transforming the userUsage into similar, frequent pattern rules, we propose the minimum-cost rules as suggestions for correction (e.g. [sent-177, score-0.665]
77 , “play ~ role in V-ing” for revising “play ~ role to V”) (Step (8) in Figure 2), if its minimum edit cost is greater than zero. [sent-179, score-0.558]
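Continuing the sketch above, the suggestion step then amounts to scoring every retrieved rule and keeping the cheapest one whose cost is non-zero; the toy lexicon and candidate rules below are invented for illustration and assume extended_levenshtein from the previous sketch is in scope.

# Toy driver for Steps (5)-(8): score candidate rules and suggest the cheapest.
lemma_of = {"V": "V", "V-ing": "V", "to": "to", "in": "in"}
pos_of = {"V": "VERB", "V-ing": "VERB"}

usage = ("play", "~", "role", "to", "V")              # from "play an important roles to close"
candidates = [("play", "~", "role", "in", "V-ing"),
              ("play", "~", "role", "of", "N")]

scored = [(extended_levenshtein(usage, pat, lemma_of, pos_of), pat) for pat in candidates]
cost, best = min(scored)
if cost > 0:                                          # zero cost would mean no error detected
    print("suggest:", " ".join(best), "(edit cost %.2f)" % cost)
# -> suggest: play ~ role in V-ing (edit cost 1.50)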
78 Example input and editorial suggestions returned to the user are shown in Figure 1. [sent-182, score-0.227]
79 Note that pattern rules involving flexible collocations are designed to take care of long-distance dependencies that might not always be possible to cover with limited ngrams (for n less than 6). [sent-183, score-0.59]
80 In addition, the long pattern rules can be useful even when it is not clear whether there is an error when looking at a very narrow context. [sent-184, score-0.235]
81 For example, “hear” can either be transitive or intransitive depending on context. [sent-185, score-0.125]
82 In the context of “look forward to” with a person-noun object, it should be intransitive and requires the preposition “from”, as suggested in the results provided by EdIt (see Figure 1). [sent-186, score-0.37]
83 In existing grammar checkers, there are typically many modules examining different types of errors, and different modules may have different priorities and conflict with one another. [sent-187, score-0.288]
84 Let us note that this general framework for error detection and correction is an original contribution of our work. [sent-188, score-0.15]
85 In addition, we incorporate probabilities conditioned on word positions in order to weigh edit costs. [sent-189, score-0.335]
86 For example, the conditional probability of V to immediately follow “look forward to” is virtually 0, while the probability of V-ing to do so approximates 0. [sent-190, score-0.153]
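One plausible way to realize this weighting (a sketch with made-up counts; in EdIt the statistics would come from the reference corpus) is to make the cost of correcting a slot depend on how likely the learner's element is to occupy that slot:

from collections import Counter

# Hypothetical position-conditioned statistics for the slot after "look forward to":
# counts[position][element] = how often `element` fills that slot in the corpus.
counts = {3: Counter({"V-ing": 970, "N": 25, "V": 5})}   # invented numbers

def positional_prob(position, element):
    c = counts.get(position, Counter())
    total = sum(c.values()) or 1
    return c[element] / total

def weighted_sub_cost(position, user_elem):
    # The less likely the learner's element is at this slot, the cheaper it is
    # to rewrite it toward the pattern's element (i.e. the correction is favored).
    return positional_prob(position, user_elem)

print(weighted_sub_cost(3, "V"))   # 0.005: rewriting V -> V-ing after "look forward to" is cheap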
87 Since our goal is to provide learners with a means of efficient broad-coverage grammar checking, EdIt is web-based, and the acquisition of the pattern grammar it uses is performed offline. [sent-195, score-0.697]
88 80/theSite/EdIt_demo2/ … inverted files and used as examples for GRASP to infer lexicalized pattern grammar. [sent-203, score-0.355]
89 2 Results We examined three types of errors, and a mixture of them, with our correction system (see Table 1). [sent-209, score-0.18]
90 In this table, results of ESL Assistant are shown for comparison, and grammatical suggestions are underscored. [sent-210, score-0.242]
91 For example, we could augment the pattern grammar with lexemes’ PoS information, in that the contexts of a word with different PoS tags vary. [sent-213, score-0.474]
92 The present-tense verb discuss is often followed by determiners and nouns, while the passive is followed by the preposition in, as in “… is discussed in Chapter one.” [sent-215, score-0.186]
93 Additionally, an interesting direction to explore is enriching the pattern grammar with semantic role labels (Chen et al. [sent-216, score-0.524]
94 In summary, we have introduced a method for correcting errors in learner text based on its lexical and PoS evidence. [sent-218, score-0.41]
95 We have implemented the method and shown that the pattern grammar and extended Levenshtein algorithm in this method are promising in grammar checking. [sent-219, score-0.649]
96 A classifier-based approach to preposition and determiner error correction in L2 English. [sent-259, score-0.339]
97 Search right and thou shalt find using web queries for learner error detection. [sent-285, score-0.252]
98 Pattern grammar: a corpusdriven approach to the lexical grammar of English. [sent-298, score-0.182]
99 Binary codes capable of correcting deletions, insertions and reversals. [sent-312, score-0.128]
100 The Cambridge Learner Corpus – error coding and analysis for writing dictionaries and other books for English Learners. [sent-322, score-0.159]
wordName wordTfidf (topN-words)
[('userusage', 0.357), ('edit', 0.276), ('pattern', 0.253), ('play', 0.205), ('grammar', 0.182), ('suggestions', 0.179), ('learner', 0.176), ('hear', 0.169), ('rules', 0.159), ('forward', 0.153), ('levenshtein', 0.151), ('preposition', 0.135), ('collocations', 0.13), ('correcting', 0.128), ('esl', 0.126), ('leacock', 0.126), ('errors', 0.106), ('ngrams', 0.103), ('collocation', 0.1), ('checkers', 0.096), ('stage', 0.092), ('usage', 0.092), ('pron', 0.091), ('look', 0.09), ('chodorow', 0.09), ('tsao', 0.089), ('role', 0.089), ('checker', 0.086), ('writing', 0.083), ('intransitive', 0.082), ('learners', 0.08), ('gamon', 0.08), ('error', 0.076), ('correction', 0.074), ('pos', 0.074), ('efl', 0.072), ('lemmas', 0.07), ('inverted', 0.065), ('assistant', 0.065), ('wrong', 0.063), ('grammatical', 0.063), ('detecting', 0.062), ('step', 0.059), ('accommodates', 0.059), ('hsinchu', 0.059), ('hunston', 0.059), ('tsing', 0.059), ('ving', 0.059), ('weigh', 0.059), ('wible', 0.059), ('usages', 0.059), ('roles', 0.058), ('deletion', 0.057), ('cost', 0.056), ('passage', 0.056), ('deriving', 0.056), ('patterns', 0.055), ('determiner', 0.054), ('predominant', 0.054), ('insertion', 0.053), ('hermet', 0.052), ('vbg', 0.052), ('verb', 0.051), ('brockett', 0.05), ('edits', 0.049), ('ngram', 0.048), ('editorial', 0.048), ('revising', 0.048), ('anchored', 0.048), ('vb', 0.048), ('nonnative', 0.045), ('close', 0.044), ('substitution', 0.044), ('transitive', 0.043), ('texts', 0.043), ('detect', 0.042), ('replacing', 0.041), ('prepositions', 0.041), ('felice', 0.041), ('nns', 0.04), ('taiwan', 0.04), ('hua', 0.04), ('passages', 0.04), ('checking', 0.039), ('chen', 0.039), ('tags', 0.039), ('vn', 0.038), ('burstein', 0.038), ('closing', 0.038), ('tetreault', 0.037), ('files', 0.037), ('tag', 0.037), ('huang', 0.035), ('prototype', 0.034), ('korean', 0.034), ('broad', 0.034), ('forms', 0.033), ('sun', 0.033), ('extended', 0.032), ('essay', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
Author: Chung-chi Huang ; Mei-hua Chen ; Shih-ting Huang ; Jason S. Chang
Abstract: We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing. 1
2 0.24558227 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
Author: Ryo Nagata ; Edward Whittaker ; Vera Sheinman
Abstract: The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowparsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POStagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
3 0.20141846 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
Author: Alla Rozovskaya ; Dan Roth
Abstract: We consider the problem of correcting errors made by English as a Second Language (ESL) writers and address two issues that are essential to making progress in ESL error correction - algorithm selection and model adaptation to the first language of the ESL learner. A variety of learning algorithms have been applied to correct ESL mistakes, but often comparisons were made between incomparable data sets. We conduct an extensive, fair comparison of four popular learning methods for the task, reversing conclusions from earlier evaluations. Our results hold for different training sets, genres, and feature sets. A second key issue in ESL error correction is the adaptation of a model to the first language ofthe writer. Errors made by non-native speakers exhibit certain regularities and, as we show, models perform much better when they use knowledge about error patterns of the nonnative writers. We propose a novel way to adapt a learned algorithm to the first language of the writer that is both cheaper to implement and performs better than other adaptation methods.
4 0.20118327 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
Author: Y. Albert Park ; Roger Levy
Abstract: Automated grammar correction techniques have seen improvement over the years, but there is still much room for increased performance. Current correction techniques mainly focus on identifying and correcting a specific type of error, such as verb form misuse or preposition misuse, which restricts the corrections to a limited scope. We introduce a novel technique, based on a noisy channel model, which can utilize the whole sentence context to determine proper corrections. We show how to use the EM algorithm to learn the parameters of the noise model, using only a data set of erroneous sentences, given the proper language model. This frees us from the burden of acquiring a large corpora of corrected sentences. We also present a cheap and efficient way to provide automated evaluation re- sults for grammar corrections by using BLEU and METEOR, in contrast to the commonly used manual evaluations.
5 0.18967223 147 acl-2011-Grammatical Error Correction with Alternating Structure Optimization
Author: Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present a novel approach to grammatical error correction based on Alternating Structure Optimization. As part of our work, we introduce the NUS Corpus of Learner English (NUCLE), a fully annotated one million words corpus of learner English available for research purposes. We conduct an extensive evaluation for article and preposition errors using various feature sets. Our experiments show that our approach outperforms two baselines trained on non-learner text and learner text, respectively. Our approach also outperforms two commercial grammar checking software packages.
6 0.15708172 302 acl-2011-They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems
7 0.098695017 11 acl-2011-A Fast and Accurate Method for Approximate String Search
8 0.093001947 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
9 0.088240407 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
10 0.087035239 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
11 0.085522339 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
12 0.083024919 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar
13 0.082703091 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
14 0.079701774 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
15 0.079476036 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
16 0.078627884 266 acl-2011-Reordering with Source Language Collocations
17 0.076046422 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering
18 0.072015055 333 acl-2011-Web-Scale Features for Full-Scale Parsing
19 0.068252988 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
20 0.065784886 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
topicId topicWeight
[(0, 0.194), (1, -0.038), (2, -0.033), (3, -0.04), (4, -0.061), (5, -0.002), (6, 0.05), (7, -0.104), (8, -0.027), (9, -0.069), (10, -0.228), (11, -0.132), (12, -0.01), (13, 0.188), (14, -0.02), (15, 0.276), (16, 0.024), (17, -0.072), (18, -0.074), (19, 0.055), (20, -0.068), (21, 0.007), (22, -0.023), (23, -0.079), (24, -0.024), (25, 0.026), (26, -0.01), (27, 0.011), (28, 0.03), (29, -0.031), (30, 0.048), (31, 0.002), (32, 0.021), (33, -0.042), (34, -0.037), (35, -0.034), (36, 0.003), (37, -0.02), (38, -0.065), (39, 0.051), (40, -0.049), (41, -0.007), (42, -0.029), (43, 0.001), (44, 0.076), (45, -0.027), (46, 0.01), (47, -0.015), (48, 0.014), (49, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.96866667 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
Author: Chung-chi Huang ; Mei-hua Chen ; Shih-ting Huang ; Jason S. Chang
Abstract: We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing. 1
2 0.86169529 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
Author: Ryo Nagata ; Edward Whittaker ; Vera Sheinman
Abstract: The availability of learner corpora, especially those which have been manually error-tagged or shallow-parsed, is still limited. This means that researchers do not have a common development and test set for natural language processing of learner English such as for grammatical error detection. Given this background, we created a novel learner corpus that was manually error-tagged and shallowparsed. This corpus is available for research and educational purposes on the web. In this paper, we describe it in detail together with its data-collection method and annotation schemes. Another contribution of this paper is that we take the first step toward evaluating the performance of existing POStagging/chunking techniques on learner corpora using the created corpus. These contributions will facilitate further research in related areas such as grammatical error detection and automated essay scoring.
3 0.83998722 147 acl-2011-Grammatical Error Correction with Alternating Structure Optimization
Author: Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present a novel approach to grammatical error correction based on Alternating Structure Optimization. As part of our work, we introduce the NUS Corpus of Learner English (NUCLE), a fully annotated one million words corpus of learner English available for research purposes. We conduct an extensive evaluation for article and preposition errors using various feature sets. Our experiments show that our approach outperforms two baselines trained on non-learner text and learner text, respectively. Our approach also outperforms two commercial grammar checking software packages.
4 0.78044319 46 acl-2011-Automated Whole Sentence Grammar Correction Using a Noisy Channel Model
Author: Y. Albert Park ; Roger Levy
Abstract: Automated grammar correction techniques have seen improvement over the years, but there is still much room for increased performance. Current correction techniques mainly focus on identifying and correcting a specific type of error, such as verb form misuse or preposition misuse, which restricts the corrections to a limited scope. We introduce a novel technique, based on a noisy channel model, which can utilize the whole sentence context to determine proper corrections. We show how to use the EM algorithm to learn the parameters of the noise model, using only a data set of erroneous sentences, given the proper language model. This frees us from the burden of acquiring a large corpora of corrected sentences. We also present a cheap and efficient way to provide automated evaluation re- sults for grammar corrections by using BLEU and METEOR, in contrast to the commonly used manual evaluations.
5 0.77175432 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
Author: Alla Rozovskaya ; Dan Roth
Abstract: We consider the problem of correcting errors made by English as a Second Language (ESL) writers and address two issues that are essential to making progress in ESL error correction - algorithm selection and model adaptation to the first language of the ESL learner. A variety of learning algorithms have been applied to correct ESL mistakes, but often comparisons were made between incomparable data sets. We conduct an extensive, fair comparison of four popular learning methods for the task, reversing conclusions from earlier evaluations. Our results hold for different training sets, genres, and feature sets. A second key issue in ESL error correction is the adaptation of a model to the first language ofthe writer. Errors made by non-native speakers exhibit certain regularities and, as we show, models perform much better when they use knowledge about error patterns of the nonnative writers. We propose a novel way to adapt a learned algorithm to the first language of the writer that is both cheaper to implement and performs better than other adaptation methods.
6 0.74160463 302 acl-2011-They Can Help: Using Crowdsourcing to Improve the Evaluation of Grammatical Error Detection Systems
7 0.57515007 11 acl-2011-A Fast and Accurate Method for Approximate String Search
8 0.52547091 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars
9 0.47689748 239 acl-2011-P11-5002 k2opt.pdf
10 0.46595958 20 acl-2011-A New Dataset and Method for Automatically Grading ESOL Texts
11 0.46303439 13 acl-2011-A Graph Approach to Spelling Correction in Domain-Centric Search
13 0.45447436 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
14 0.44815823 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output
15 0.44325632 336 acl-2011-Why Press Backspace? Understanding User Input Behaviors in Chinese Pinyin Input Method
17 0.42245743 42 acl-2011-An Interface for Rapid Natural Language Processing Development in UIMA
18 0.4101136 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
19 0.41003209 329 acl-2011-Using Deep Morphology to Improve Automatic Error Detection in Arabic Handwriting Recognition
20 0.38486889 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge
topicId topicWeight
[(5, 0.039), (17, 0.074), (26, 0.039), (31, 0.012), (37, 0.061), (39, 0.036), (41, 0.053), (53, 0.012), (55, 0.027), (59, 0.048), (72, 0.095), (88, 0.16), (91, 0.15), (96, 0.101), (97, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.86032313 108 acl-2011-EdIt: A Broad-Coverage Grammar Checker Using Pattern Grammar
Author: Chung-chi Huang ; Mei-hua Chen ; Shih-ting Huang ; Jason S. Chang
Abstract: We introduce a new method for learning to detect grammatical errors in learner’s writing and provide suggestions. The method involves parsing a reference corpus and inferring grammar patterns in the form of a sequence of content words, function words, and parts-of-speech (e.g., “play ~ role in Ving” and “look forward to Ving”). At runtime, the given passage submitted by the learner is matched using an extended Levenshtein algorithm against the set of pattern rules in order to detect errors and provide suggestions. We present a prototype implementation of the proposed method, EdIt, that can handle a broad range of errors. Promising results are illustrated with three common types of errors in nonnative writing. 1
2 0.77062649 194 acl-2011-Language Use: What can it tell us?
Author: Marjorie Freedman ; Alex Baron ; Vasin Punyakanok ; Ralph Weischedel
Abstract: For 20 years, information extraction has focused on facts expressed in text. In contrast, this paper is a snapshot of research in progress on inferring properties and relationships among participants in dialogs, even though these properties/relationships need not be expressed as facts. For instance, can a machine detect that someone is attempting to persuade another to action or to change beliefs or is asserting their credibility? We report results on both English and Arabic discussion forums. 1
3 0.73700786 140 acl-2011-Fully Unsupervised Word Segmentation with BVE and MDL
Author: Daniel Hewlett ; Paul Cohen
Abstract: Several results in the word segmentation literature suggest that description length provides a useful estimate of segmentation quality in fully unsupervised settings. However, since the space of potential segmentations grows exponentially with the length of the corpus, no tractable algorithm follows directly from the Minimum Description Length (MDL) principle. Therefore, it is necessary to generate a set of candidate segmentations and select between them according to the MDL principle. We evaluate several algorithms for generating these candidate segmentations on a range of natural language corpora, and show that the Bootstrapped Voting Experts algorithm consistently outperforms other methods when paired with MDL.
4 0.7365061 148 acl-2011-HITS-based Seed Selection and Stop List Construction for Bootstrapping
Author: Tetsuo Kiso ; Masashi Shimbo ; Mamoru Komachi ; Yuji Matsumoto
Abstract: In bootstrapping (seed set expansion), selecting good seeds and creating stop lists are two effective ways to reduce semantic drift, but these methods generally need human supervision. In this paper, we propose a graphbased approach to helping editors choose effective seeds and stop list instances, applicable to Pantel and Pennacchiotti’s Espresso bootstrapping algorithm. The idea is to select seeds and create a stop list using the rankings of instances and patterns computed by Kleinberg’s HITS algorithm. Experimental results on a variation of the lexical sample task show the effectiveness of our method.
5 0.73471224 79 acl-2011-Confidence Driven Unsupervised Semantic Parsing
Author: Dan Goldwasser ; Roi Reichart ; James Clarke ; Dan Roth
Abstract: Current approaches for semantic parsing take a supervised approach requiring a considerable amount of training data which is expensive and difficult to obtain. This supervision bottleneck is one of the major difficulties in scaling up semantic parsing. We argue that a semantic parser can be trained effectively without annotated data, and introduce an unsupervised learning algorithm. The algorithm takes a self training approach driven by confidence estimation. Evaluated over Geoquery, a standard dataset for this task, our system achieved 66% accuracy, compared to 80% of its fully supervised counterpart, demonstrating the promise of unsupervised approaches for this task.
6 0.73393524 80 acl-2011-ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements
7 0.71427178 313 acl-2011-Two Easy Improvements to Lexical Weighting
8 0.70132118 86 acl-2011-Coreference for Learning to Extract Relations: Yes Virginia, Coreference Matters
9 0.69982004 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation
10 0.6922031 145 acl-2011-Good Seed Makes a Good Crop: Accelerating Active Learning Using Language Modeling
11 0.68172961 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus
12 0.68073875 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
13 0.6747601 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning
14 0.67270982 262 acl-2011-Relation Guided Bootstrapping of Semantic Lexicons
15 0.66574329 261 acl-2011-Recognizing Named Entities in Tweets
16 0.66462624 304 acl-2011-Together We Can: Bilingual Bootstrapping for WSD
17 0.66362369 252 acl-2011-Prototyping virtual instructors from human-human corpora
19 0.65854585 91 acl-2011-Data-oriented Monologue-to-Dialogue Generation
20 0.65716124 121 acl-2011-Event Discovery in Social Media Feeds