
36 acl-2010-Automatic Collocation Suggestion in Academic Writing


Source: pdf

Author: Jian-Cheng Wu ; Yu-Chia Chang ; Teruko Mitamura ; Jason S. Chang

Abstract: In recent years, collocation has been widely acknowledged as an essential characteristic to distinguish native speakers from non-native speakers. Research on academic writing has also shown that collocations are not only common but serve a particularly important discourse function within the academic community. In our study, we propose a machine learning approach to implementing an online collocation writing assistant. We use a data-driven classifier to provide collocation suggestions to improve word choices, based on the result of classification. The system generates and ranks suggestions to assist learners’ collocation usages in their academic writing with satisfactory results.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Automatic Collocation Suggestion in Academic Writing. Jian-Cheng Wu and Yu-Chia Chang (corresponding author), National Tsing Hua University, Hsinchu, Taiwan. [sent-1, score-0.154]

2 Abstract In recent years, collocation has been widely acknowledged as an essential characteristic to distinguish native speakers from non-native speakers. [sent-3, score-0.767]

3 Research on academic writing has also shown that collocations are not only common but serve a particularly important discourse function within the academic community. [sent-4, score-0.782]

4 In our study, we propose a machine learning approach to implementing an online collocation writing assistant. [sent-5, score-0.796]

5 We use a data-driven classifier to provide collocation suggestions to improve word choices, based on the result of classification. [sent-6, score-0.971]

6 The system generates and ranks suggestions to assist learners’ collocation usages in their academic writing with satisfactory results. [sent-7, score-1.229]

7 1 Introduction The notion of collocation has been widely discussed in the field of language teaching for decades. [sent-8, score-0.682]

8 It has been shown that collocation, a successive common usage of words in a chain, is important in helping language learners achieve native-like fluency. [sent-9, score-0.198]

9 In the field of English for Academic Purpose, more and more researchers are also recognizing this important feature in academic writing. [sent-10, score-0.162]

10 It is often argued that collocation can influence the effectiveness of a piece of writing and the lack of such knowledge might cause cumulative loss of precision (Howarth, 1998). [sent-11, score-0.796]

11 Many researchers have discussed the function of collocations in the highly conventionalized and specialized writing used within academia. [sent-12, score-0.537]

12 Research also identified noticeable increases in the quantity and quality of collocational usage by * Corresponding author: Yu-chia Chang (Email address: richtrf@gmail. [sent-13, score-0.089]

13 Carnegie Mellon University Pittsburgh, United States teruko @ cs . [sent-15, score-0.057]

14 Granger (1998) reported that learners underuse native-like collocations and overuse atypical word combinations. [sent-18, score-0.497]

15 This disparity in collocation usage between native and non-native speakers is clear and should receive more attention from the language technology community. [sent-19, score-0.779]

16 To tackle such word usage problems, traditional language technology often employs a database of the learners' common errors that are manually tagged by teachers or specialists (e. [sent-20, score-0.146]

17 Compiling the database is time-consuming and not easily maintainable, and the usefulness is limited by the manual collection of pre-stored suggestions. [sent-24, score-0.027]

18 Therefore, it is beneficial if a system can mainly use untagged data from a corpus containing correct language usages rather than the error-tagged data from a learner corpus. [sent-25, score-0.188]

19 A large corpus of correct language usages is more readily available and useful than a small labeled corpus of incorrect language usages. [sent-26, score-0.052]

20 For this suggestion task, the large corpus not only provides us with a rich set of common collocations but also provides the context within which these collocations appear. [sent-27, score-1.007]

21 Intuitively, we can take account of such context of collocation to generate more suitable suggestions. [sent-28, score-0.695]

22 Contextual information in this sense often entails more linguistic clues to provide suggestions within sentences or paragraphs. [sent-29, score-0.302]

23 However, the contextual information is messy and complex and thus has long been overlooked or ignored. [sent-30, score-0.094]

24 To date, most fashionable suggestion methods still rely upon the linguistic components within collocations as well as the linguistic relationship between misused words and their correct counterparts (Chang et al. [sent-31, score-0.714]

25 In contrast to other research, we employ contextual information to automate suggestions for verb-noun lexical collocation. [sent-33, score-0.323]

26 Verb-noun collocations are recognized as presenting the most (page 115 footer: Proceedings of the ACL 2010, Uppsala, Sweden, 11-16 July 2010) [sent-34, score-0.342]

27 More specifically, in this preliminary study we start by focusing on the word choice of verbs in collocations which are considered as the most difficult ones for learners to master (Liu, 2002; Chang, 2008). [sent-37, score-0.555]

28 The experiment confirms that our collocation writing assistant proves the feasibility of using machine learning methods to automatically prompt learners with collocation suggestions in academic writing. [sent-38, score-2.0]

29 2 Collocation Checking and Suggestion This study aims to develop a web service, Collocation Inspector (shown in Figure 1) that accepts sentences as input and generates the related candidates for learners. [sent-39, score-0.115]

30 In this paper, we focus on automatically providing academic collocation suggestions when users are writing up their abstracts. [sent-40, score-1.167]

31 After an abstract is submitted, the system extracts linguistic features from the user’s text for the machine learning model. [sent-41, score-0.049]

32 By using a corpus of published academic texts, we hope to match contextual linguistic clues from users’ text to help elicit the most relevant suggestions. [sent-42, score-0.214]

33 We now formally state the problem that we are addressing: Problem Statement: Given a sentence S written by a learner and a reference corpus RC, our goal is to output a set of most probable suggestion candidates c1, c2, . [sent-43, score-0.48]

34 For this, we train a classifier MC to map the context (represented as feature set f1, f2, . [sent-47, score-0.168]

35 At run-time, we predict these collocations for S as suggestions. [sent-51, score-0.342]

36 1 Academic Collocation Checker Training Procedures. Sentence Parsing and Collocation Extraction: We start by collecting a large number of abstracts from the Web to develop a reference corpus for collocation suggestion. [sent-53, score-0.731]

37 And we continue to identify collocations in each sentence for the subsequent processing. [sent-54, score-0.373]

38 Collocation extraction is an essential step in preprocessing data. [sent-55, score-0.055]

39 We only expect to extract the collocation which comprises components having a syntactic relationship with one another. [sent-56, score-0.714]

40 Take the following scholarly sentence from the reference corpus as an example (example (1)): (1) We introduce a novel method for learning to find documents on the web. [sent-58, score-0.297]

41 [Figure 2: dependency parse of Example (1)] Traditionally, through part-of-speech tagging, we can obtain a tagged sentence as follows (example (2)). [sent-60, score-0.084]

42 We can observe that the desired collocation “introduce method”, conforming to “VERB+NOUN” relationship, exists within the sentence. [sent-61, score-0.715]

43 Heuristically writing patterns to extract such verb and noun might not be effective. [sent-63, score-0.165]

44 (2) We/PRP introduce/VB a/DT novel/JJ method/NN for/IN learning/VBG to/TO find/VB documents/NNS on/IN the/DT web/NN . [sent-68, score-0.142]

45 (3) We proposed that the web-based model would be more effective than corpus-based one. [sent-70, score-0.098]

46 A natural language parser can facilitate the extraction of the target type of collocations. [sent-71, score-0.024]

47 Such a parser is a program that works out the grammatical structure of sentences, for instance, by identifying which groups of words go together or which word is the subject or object of a verb. [sent-72, score-0.024]

48 In our study, we take advantage of a dependency parser, Stanford Parser, which extracts typed dependencies for certain grammatical relations (shown in Figure 2). [sent-73, score-0.024]

49 Within the parsed sentence of example (1), we can notice that the extracted dependency “dobj (introduce-2, method-4)” meets the criterion. [sent-74, score-0.031]
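The dobj criterion described above can be illustrated with a minimal sketch. It assumes the parser output is already available as typed-dependency strings in the `relation(governor-index, dependent-index)` style quoted in the text; the function name `extract_verb_noun` and the regular expression are illustrative assumptions, not part of the original system.

```python
import re

# Matches Stanford-style typed dependency strings such as
# "dobj(introduce-2, method-4)" (an optional space before "(" is tolerated).
DEP_PATTERN = re.compile(r"(\w+)\s*\(\s*(\S+?)-(\d+)\s*,\s*(\S+?)-(\d+)\s*\)")

def extract_verb_noun(dependencies):
    """Return (verb, noun) pairs for every direct-object dependency."""
    pairs = []
    for dep in dependencies:
        match = DEP_PATTERN.fullmatch(dep.strip())
        # Keep only the relation that links a verb to its direct object.
        if match and match.group(1) == "dobj":
            pairs.append((match.group(2), match.group(4)))
    return pairs

deps = [
    "nsubj(introduce-2, We-1)",
    "dobj (introduce-2, method-4)",
    "amod(method-4, novel-3)",
]
print(extract_verb_noun(deps))
```

Only the direct-object relation is kept here because the text restricts itself to VERB+NOUN collocations; other relations would need their own filter.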

50 Using a Classifier for the Suggestion task: A classifier is a function that generally takes a set of attributes as input and provides a tagged class as output. [sent-75, score-0.188]

51 The basic way to build a classifier is to derive a regression formula from a set of tagged examples. [sent-76, score-0.155]

52 And this trained classifier can thus make predictions and assign a tag to any input data. [sent-77, score-0.102]

53 The suggestion task in this study will be seen as a classification problem. [sent-78, score-0.275]

54 We treat the collocation extracted from each sentence as the class tag (see examples in Table 1). [sent-79, score-0.688]

55 Hopefully, the system can learn the rules between tagged classes (i. [sent-80, score-0.053]

56 scholarly sentences) and can predict which collocation is the most appropriate one given attributes extracted from the sentences. [sent-84, score-0.782]

57 Another advantage of using a classifier to automate suggestion is to provide alternatives with regard to the similar attributes shared by sentences. [sent-85, score-0.475]

58 In Table 1, we can observe that these collocations exhibit a similar discourse function and can thus become interchangeable in these sentences. [sent-86, score-0.399]

59 For our task, we can use a supervised method to automatically learn the relationship between collocations and example sentences. [sent-89, score-0.372]

60 We choose Maximum Entropy (ME) as our training algorithm to build a collocation suggestion classifier. [sent-90, score-0.909]

61 One advantage of an ME classifier is that in addition to assigning a classification it can provide the probability of each assignment. [sent-91, score-0.102]
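To make the point about per-class probabilities concrete, here is a small sketch of how a trained Maximum Entropy model turns feature weights into a probability for each candidate verb. The weights, feature names, and function names below are invented for illustration; the source does not give the trained parameters or name the toolkit used.

```python
import math

def maxent_probabilities(features, weights, classes):
    """Conditional probability of each class given the active features.

    weights maps (feature, class) pairs to real-valued parameters;
    pairs that are missing contribute zero.
    """
    scores = {c: sum(weights.get((f, c), 0.0) for f in features) for c in classes}
    max_score = max(scores.values())  # subtract the maximum for numerical stability
    exp_scores = {c: math.exp(s - max_score) for c, s in scores.items()}
    total = sum(exp_scores.values())
    return {c: v / total for c, v in exp_scores.items()}

def rank_candidates(probabilities):
    """Sort candidate classes from most to least probable."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)

# Toy parameters, purely hypothetical.
toy_weights = {
    ("HEAD=method", "propose"): 1.2,
    ("HEAD=method", "introduce"): 1.0,
    ("UniV=We", "describe"): 0.3,
}
probs = maxent_probabilities(
    ["HEAD=method", "UniV=We"], toy_weights, ["propose", "introduce", "describe"]
)
ranked = rank_candidates(probs)
print(ranked)
```

Because every class receives a probability, the same output can serve both as a single prediction and as a ranked suggestion list.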

62 Such constraints are derived from the training data, expressing relationships between features and outcomes. [sent-93, score-0.025]

63 Moreover, an effective feature selection can increase the precision of machine learning. [sent-94, score-0.028]

64 In our study, we employ the contextual features which Table 1. [sent-95, score-0.071]

65 Example sentences and class tags (collocations). Example Sentence: We introduce a novel method to find documents on the web. Class tag: introduce. [sent-96, score-0.023]

66 Example Sentence: In this paper, we will describe a method of identifying the syntactic role of antecedents, which consists of two phases. Class tag: describe. Example Sentence: In this paper, we suggest a method that automatically constructs an NE tagged corpus from the web to be used for learning NER systems. Class tag: suggest. [sent-98, score-0.025]

67 consist of two elements, the head and the ngram of context words: Head: Each collocation comprises two parts, collocate and head. [sent-99, score-0.859]

68 For example, in a given verb-noun collocation, the verb is the collocate as well as the target for which we provide suggestions; the noun serves as the head of the collocation and conveys the essential meaning of the collocation. [sent-100, score-0.841]

69 We use the head as a feature to condition the classifier to generate candidates relevant to a given head. [sent-101, score-0.224]

70 Ngram: We use the context words around the target collocation by considering the corresponding unigram and bigram words within the sentence. [sent-102, score-0.728]

71 Moreover, to ensure the relevance, those context words, before and after the punctuation marks enclosing the collocation in question, will be excluded. [sent-103, score-0.718]

72 Uni and Bi indicate the unigram and bigram context words of window size two respectively. [sent-105, score-0.038]

73 V and N differentiate the contexts related to verb or noun. [sent-106, score-0.026]
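The head and n-gram features described above might be generated along the following lines. This is a sketch under stated assumptions: the feature-name prefixes (`HEAD=`, `UniV=`, `BiN=`, and so on), the punctuation set, and the decision to drop any n-gram that contains the verb (since the verb is the class being predicted) are illustrative choices rather than details confirmed by the source.

```python
PUNCT = {",", ".", ";", ":", "!", "?", "(", ")"}

def clause_bounds(tokens, index):
    """Return [start, end) of the punctuation-free span containing tokens[index]."""
    start = index
    while start > 0 and tokens[start - 1] not in PUNCT:
        start -= 1
    end = index + 1
    while end < len(tokens) and tokens[end] not in PUNCT:
        end += 1
    return start, end

def context_features(tokens, verb_idx, noun_idx, window=2):
    """Build the head feature plus unigram and bigram context features."""
    features = ["HEAD=" + tokens[noun_idx]]
    targets = {verb_idx, noun_idx}
    for tag, center in (("V", verb_idx), ("N", noun_idx)):
        start, end = clause_bounds(tokens, center)
        lo, hi = max(start, center - window), min(end, center + window + 1)
        # Unigrams: neighbouring words, skipping the collocation itself.
        for i in range(lo, hi):
            if i not in targets:
                features.append(f"Uni{tag}={tokens[i]}")
        # Bigrams: adjacent pairs in the window that do not contain the verb.
        for i in range(lo, hi - 1):
            if verb_idx not in (i, i + 1):
                features.append(f"Bi{tag}={tokens[i]}_{tokens[i + 1]}")
    return features

tokens = "We introduce a novel method for learning to find documents on the web .".split()
feats = context_features(tokens, verb_idx=1, noun_idx=4)
print(feats)
```

Stopping the window at punctuation mirrors the exclusion rule stated in the text: words beyond the punctuation marks enclosing the collocation never become features.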

74 2 Automatic Collocation Suggestion at Run-time. After the ME classifier is automatically trained, the model is used to find the best collocation suggestion. [sent-109, score-0.759]

75 Figure 3 shows the algorithm of producing suggestions for a given sentence. [sent-110, score-0.212]

76 The input is a learner’s sentence in an abstract, along with an ME model trained from the reference corpus. [sent-111, score-0.063]

77 In Step (1) of the algorithm, we parse the sentence for data preprocessing. [sent-112, score-0.031]

78 Based on the parser output, we extract the collocation from a given sentence as well as generate feature sets in Steps (2) and (3). [sent-113, score-0.737]

79 After that in Step (4), with the trained machine-learning model, we obtain a set of likely collocates with probability as predicted by the ME model. [sent-114, score-0.1]

80 In Step (5), SuggestionFilter singles out the valid collocation and returns the best collocation suggestion as output in Step (6). [sent-115, score-1.566]
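The source does not spell out what SuggestionFilter checks in Step (5). One plausible reading, sketched below purely as an assumption, is that a predicted verb counts as valid only if it is attested with the head noun in the reference corpus; the names `suggestion_filter`, `attested_pairs`, and `top_n` are hypothetical.

```python
def suggestion_filter(candidates, head, attested_pairs, top_n=5):
    """Keep candidate verbs attested with the head noun, best first.

    candidates is a list of (verb, probability) pairs from the classifier;
    attested_pairs is a set of (verb, noun) collocations seen in the corpus.
    """
    valid = [(verb, p) for verb, p in candidates if (verb, head) in attested_pairs]
    valid.sort(key=lambda item: item[1], reverse=True)
    return valid[:top_n]

# Hypothetical classifier output and corpus evidence.
candidates = [("make", 0.25), ("do", 0.40), ("conduct", 0.35)]
attested = {("conduct", "research"), ("do", "research")}
suggestions = suggestion_filter(candidates, "research", attested)
print(suggestions)
```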

81 For example, if a learner inputs the sentence like Example (5), the features and output candidates are shown in Table 2. [sent-116, score-0.198]

82 To train a Maximum Entropy classifier, 46,255 collocations are extracted and 790 verbal collocates are identified as tagged classes for collocation suggestions. [sent-119, score-1.152]

83 We tested the classifier on scholarly sentences in place of authentic student writings which were not available at the time of this pilot study. [sent-120, score-0.288]

84 We extracted 364 collocations among 600 randomly selected sentences as the held out test data not overlapping with the training set. [sent-121, score-0.365]

85 To automate the evaluation, we blank out the verb collocates within these sentences and treat these verbs directly as the only correct suggestions in question, although two or more suggestions may be interchangeable or at least appropriate. [sent-122, score-0.728]

86 MRR for different feature sets Feature Sets Included In Classifier MRR Features of HEAD Features of CONTEXT Features of HEAD+CONTEXT 0. [sent-127, score-0.028]

87 The results indicate that on average users could easily find answers (exact reproductions of the blanked-out collocates) within the first two to three ranks of suggestions. [sent-133, score-0.048]
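For readers unfamiliar with the metric, mean reciprocal rank (MRR) averages 1/rank of the correct answer over all test items, so an MRR between roughly 0.33 and 0.5 corresponds to the answer appearing around the second or third position. The sketch below uses made-up suggestion lists, not the paper's data, and scores an item as zero when the blanked-out verb is missing from the list.

```python
def mean_reciprocal_rank(ranked_lists, gold_answers):
    """Average of 1/rank of the gold answer; 0 when the answer is absent."""
    total = 0.0
    for suggestions, gold in zip(ranked_lists, gold_answers):
        if gold in suggestions:
            total += 1.0 / (suggestions.index(gold) + 1)
    return total / len(gold_answers)

# Toy example: gold verbs found at rank 1, rank 3, and not at all.
ranked_lists = [
    ["propose", "introduce"],
    ["make", "take", "conduct"],
    ["use"],
]
gold_answers = ["propose", "conduct", "apply"]
mrr = mean_reciprocal_rank(ranked_lists, gold_answers)
print(round(mrr, 4))
```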

88 It is very likely that we would get a much higher MRR value if we went through the lists and evaluated each suggestion by hand. [sent-134, score-0.252]

89 Moreover, in Table 3, we can further notice that contextual features are quite informative in comparison with the baseline feature set containing merely the feature of HEAD. [sent-135, score-0.127]

90 Also the integrated feature set of HEAD and CONTEXT together achieves a more satisfactory suggestion result. [sent-136, score-0.315]

91 For example, we need to carry out the experiment on authentic learners’ texts. [sent-138, score-0.046]

92 We will conduct a user study to investigate whether our system would improve a learner’s writing in a real setting. [sent-139, score-0.162]

93 Additionally, adding classifier features based on the translation of misused words in learners’ text could be beneficial (Chang et al. [sent-140, score-0.22]

94 The translation can help to resolve prevalent collocation misuses influenced by a learner's native language. [sent-142, score-0.729]

95 In summary, we have presented an unsupervised method for suggesting collocations based on a corpus of abstracts collected from the Web. [sent-144, score-0.384]

96 The method involves selecting features from the reference corpus of the scholarly texts. [sent-145, score-0.149]

97 Then a classifier is automatically trained to determine the most probable collocation candidates with regard to the given context. [sent-146, score-0.847]

98 The preliminary results show that it is beneficial to use classifiers for identifying and ranking collocation suggestions based on the context features. [sent-147, score-0.943]

99 An automatic collocation writing assistant for Taiwanese EFL learners: A case of corpus-based NLP technology. [sent-154, score-0.842]

100 Prefabricated patterns in advanced EFL writing: collocations and formulae. [sent-159, score-0.342]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('collocation', 0.657), ('collocations', 0.342), ('suggestion', 0.252), ('suggestions', 0.212), ('learners', 0.155), ('writing', 0.139), ('academic', 0.134), ('nove', 0.115), ('phraseology', 0.115), ('classifier', 0.102), ('learner', 0.1), ('collocates', 0.1), ('chang', 0.097), ('scholarly', 0.092), ('howarth', 0.086), ('unin', 0.086), ('univ', 0.086), ('bin', 0.078), ('mrr', 0.075), ('automate', 0.065), ('assisted', 0.057), ('biv', 0.057), ('cowie', 0.057), ('efl', 0.057), ('interchangeable', 0.057), ('misused', 0.057), ('richtrf', 0.057), ('teruko', 0.057), ('tagged', 0.053), ('head', 0.052), ('usages', 0.052), ('collocate', 0.05), ('shei', 0.05), ('native', 0.047), ('oxford', 0.047), ('contextual', 0.046), ('assistant', 0.046), ('authentic', 0.046), ('collocational', 0.046), ('liu', 0.043), ('usage', 0.043), ('abstracts', 0.042), ('candidates', 0.042), ('transfer', 0.038), ('context', 0.038), ('taiwan', 0.037), ('cn', 0.037), ('beneficial', 0.036), ('ngram', 0.035), ('satisfactory', 0.035), ('master', 0.035), ('clues', 0.034), ('within', 0.033), ('tem', 0.033), ('attributes', 0.033), ('reference', 0.032), ('rc', 0.032), ('speakers', 0.032), ('essential', 0.031), ('sentence', 0.031), ('relationship', 0.03), ('feature', 0.028), ('database', 0.027), ('web', 0.027), ('comprises', 0.027), ('verb', 0.026), ('nrt', 0.025), ('conforming', 0.025), ('corpu', 0.025), ('ffect', 0.025), ('hawking', 0.025), ('hsinchu', 0.025), ('ive', 0.025), ('messy', 0.025), ('misuses', 0.025), ('pain', 0.025), ('teaching', 0.025), ('tsotr', 0.025), ('unpublished', 0.025), ('verbnoun', 0.025), ('writings', 0.025), ('users', 0.025), ('features', 0.025), ('step', 0.024), ('dt', 0.024), ('extracts', 0.024), ('parser', 0.024), ('study', 0.023), ('reproduction', 0.023), ('conventionalized', 0.023), ('mc', 0.023), ('enclosing', 0.023), ('overlooked', 0.023), ('spe', 0.023), ('teachers', 0.023), ('webbased', 0.023), ('sentences', 0.023), ('probable', 0.023), ('regard', 0.023), ('internet', 0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 36 acl-2010-Automatic Collocation Suggestion in Academic Writing

2 0.51920515 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li

Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of both word alignment and translation quality significantly. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.

3 0.27417192 60 acl-2010-Collocation Extraction beyond the Independence Assumption

Author: Gerlof Bouma

Abstract: In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find the new association measures vary in their effectiveness.

4 0.12421623 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

Author: Omri Abend ; Ari Rappoport

Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.

5 0.066040784 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

Author: Rohit J. Kate ; Yuk Wah Wong

Abstract: unknown-abstract

6 0.055121798 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

7 0.053221218 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

8 0.047967121 152 acl-2010-It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text

9 0.047374554 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

10 0.046302389 216 acl-2010-Starting from Scratch in Semantic Role Labeling

11 0.046001107 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

12 0.043791816 217 acl-2010-String Extension Learning

13 0.042788144 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

14 0.042275563 129 acl-2010-Growing Related Words from Seed via User Behaviors: A Re-Ranking Based Approach

15 0.041783713 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

16 0.041469917 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

17 0.04043613 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

18 0.040084671 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

19 0.038816746 171 acl-2010-Metadata-Aware Measures for Answer Summarization in Community Question Answering

20 0.03834822 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.146), (1, -0.019), (2, -0.019), (3, 0.006), (4, 0.062), (5, 0.013), (6, -0.083), (7, 0.004), (8, 0.071), (9, 0.017), (10, -0.004), (11, 0.08), (12, -0.055), (13, 0.097), (14, 0.261), (15, 0.059), (16, -0.062), (17, -0.29), (18, 0.23), (19, 0.471), (20, -0.191), (21, 0.001), (22, -0.306), (23, 0.096), (24, -0.017), (25, 0.083), (26, 0.074), (27, -0.027), (28, -0.036), (29, -0.099), (30, 0.064), (31, 0.016), (32, -0.017), (33, -0.016), (34, -0.04), (35, 0.039), (36, -0.071), (37, -0.047), (38, -0.01), (39, -0.016), (40, -0.006), (41, -0.018), (42, 0.045), (43, -0.04), (44, 0.021), (45, 0.029), (46, 0.015), (47, -0.004), (48, -0.024), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94467068 36 acl-2010-Automatic Collocation Suggestion in Academic Writing

2 0.85556138 60 acl-2010-Collocation Extraction beyond the Independence Assumption

Author: Gerlof Bouma

Abstract: In this paper we start to explore two-part collocation extraction association measures that do not estimate expected probabilities on the basis of the independence assumption. We propose two new measures based upon the well-known measures of mutual information and pointwise mutual information. Expected probabilities are derived from automatically trained Aggregate Markov Models. On three collocation gold standards, we find the new association measures vary in their effectiveness.

3 0.62539899 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

Author: Zhanyi Liu ; Haifeng Wang ; Hua Wu ; Sheng Li

Abstract: This paper proposes to use monolingual collocations to improve Statistical Machine Translation (SMT). We make use of the collocation probabilities, which are estimated from monolingual corpora, in two aspects, namely improving word alignment for various kinds of SMT systems and improving phrase table for phrase-based SMT. The experimental results show that our method improves the performance of both word alignment and translation quality significantly. As compared to baseline systems, we achieve absolute improvements of 2.40 BLEU score on a phrase-based SMT system and 1.76 BLEU score on a parsing-based SMT system.

4 0.26843545 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

Author: Omri Abend ; Ari Rappoport

Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.

5 0.22452809 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

Author: Joel Tetreault ; Jennifer Foster ; Martin Chodorow

Abstract: We evaluate the effect of adding parse features to a leading model of preposition usage. Results show a significant improvement in the preposition selection task on native speaker text and a modest increment in precision and recall in an ESL error detection task. Analysis of the parser output indicates that it is robust enough in the face of noisy non-native writing to extract useful information.

6 0.19331206 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

7 0.18453941 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

8 0.1697748 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

9 0.16319124 111 acl-2010-Extracting Sequences from the Web

10 0.16138655 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

11 0.15978286 189 acl-2010-Optimizing Question Answering Accuracy by Maximizing Log-Likelihood

12 0.15911613 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

13 0.1582772 61 acl-2010-Combining Data and Mathematical Models of Language Change

14 0.15481968 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future

15 0.15440029 139 acl-2010-Identifying Generic Noun Phrases

16 0.15326248 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

17 0.15270513 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing

18 0.15201172 122 acl-2010-Generating Fine-Grained Reviews of Songs from Album Reviews

19 0.14938821 248 acl-2010-Unsupervised Ontology Induction from Text

20 0.14840071 130 acl-2010-Hard Constraints for Grammatical Function Labelling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(11, 0.298), (14, 0.015), (25, 0.049), (33, 0.038), (39, 0.02), (42, 0.04), (44, 0.014), (59, 0.064), (73, 0.058), (78, 0.015), (83, 0.108), (84, 0.024), (98, 0.149)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76074564 36 acl-2010-Automatic Collocation Suggestion in Academic Writing

2 0.73625362 140 acl-2010-Identifying Non-Explicit Citing Sentences for Citation-Based Summarization.

Author: Vahed Qazvinian ; Dragomir R. Radev

Abstract: Identifying background (context) information in scientific articles can help scholars understand major contributions in their research area more easily. In this paper, we propose a general framework based on probabilistic inference to extract such context information from scientific papers. We model the sentences in an article and their lexical similarities as a Markov Random Field tuned to detect the patterns that context data create, and employ a Belief Propagation mechanism to detect likely context sentences. We also address the problem of generating surveys of scientific papers. Our experiments show greater pyramid scores for surveys generated using such context information rather than citation sentences alone.

3 0.56839597 14 acl-2010-A Risk Minimization Framework for Extractive Speech Summarization

Author: Shih-Hsiang Lin ; Berlin Chen

Abstract: In this paper, we formulate extractive summarization as a risk minimization problem and propose a unified probabilistic framework that naturally combines supervised and unsupervised summarization models to inherit their individual merits as well as to overcome their inherent limitations. In addition, the introduction of various loss functions also provides the summarization framework with a flexible but systematic way to render the redundancy and coherence relationships among sentences and between sentences and the whole document, respectively. Experiments on speech summarization show that the methods deduced from our framework are very competitive with existing summarization approaches.

4 0.56526983 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: Automatic error detection is desired in the post-processing to improve machine translation quality. The previous work is largely based on confidence estimation using system-based features, such as word posterior probabilities calculated from Nbest lists or word lattices. We propose to incorporate two groups of linguistic features, which convey information from outside machine translation systems, into error detection: lexical and syntactic features. We use a maximum entropy classifier to predict translation errors by integrating word posterior probability feature and linguistic features. The experimental results show that 1) linguistic features alone outperform word posterior probability based confidence estimation in error detection; and 2) linguistic features can further provide complementary information when combined with word confidence scores, which collectively reduce the classification error rate by 18.52% and improve the F measure by 16.37%.

5 0.56476903 251 acl-2010-Using Anaphora Resolution to Improve Opinion Target Identification in Movie Reviews

Author: Niklas Jakob ; Iryna Gurevych

Abstract: unknown-abstract

6 0.56266963 185 acl-2010-Open Information Extraction Using Wikipedia

7 0.56124336 188 acl-2010-Optimizing Informativeness and Readability for Sentiment Summarization

8 0.56077015 39 acl-2010-Automatic Generation of Story Highlights

9 0.55965722 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

10 0.55943358 133 acl-2010-Hierarchical Search for Word Alignment

11 0.5578661 71 acl-2010-Convolution Kernel over Packed Parse Forest

12 0.5578506 214 acl-2010-Sparsity in Dependency Grammar Induction

13 0.55689186 5 acl-2010-A Framework for Figurative Language Detection Based on Sense Differentiation

14 0.55678099 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

15 0.55673552 22 acl-2010-A Unified Graph Model for Sentence-Based Opinion Retrieval

16 0.55576414 208 acl-2010-Sentence and Expression Level Annotation of Opinions in User-Generated Discourse

17 0.55488092 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

18 0.55445296 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction

19 0.55392909 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries

20 0.55340683 42 acl-2010-Automatically Generating Annotator Rationales to Improve Sentiment Classification