acl acl2010 acl2010-76 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Shane Bergsma ; Emily Pitler ; Dekang Lin
Abstract: In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
Reference: text
sentIndex sentText sentNum sentScore
1 We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. [sent-7, score-0.424]
2 We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. [sent-8, score-0.669]
3 More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance. [sent-9, score-0.216]
4 1 Introduction Many NLP systems use web-scale N-gram counts (Keller and Lapata, 2003; Nakov and Hearst, 2005; Brants et al. [sent-10, score-0.124]
5 They show web counts are superior to counts from a large corpus. [sent-13, score-0.338]
6 (2009) propose unsupervised and supervised systems that use counts from Google’s N-gram corpus (Brants and Franz, 2006). [sent-15, score-0.244]
7 Is there a benefit in combining web-scale counts with the features used in state-of-theart supervised approaches? [sent-18, score-0.27]
8 While previous work has combined web-scale features with other features in specific classification problems (Modjeska et al. [sent-22, score-0.178]
9 For example, for the task of prenominal adjective ordering (Section 3), a system that needs to describe a ball that is both big and red can simply check that big red is more common on the web than red big, and order the adjectives accordingly. [sent-27, score-0.779]
10 For example, ordering adjectives by direct web evidence performs 7% worse than our best supervised system (Section 3. [sent-29, score-0.424]
11 For example, there are currently no pages indexed by Google with the preferred adjective ordering for bedraggled 56-year-old [professor]. [sent-32, score-0.426]
12 Systems trained on labeled data can learn the domain usage and leverage other regularities, such as suffixes and transitivity for adjective ordering. [sent-34, score-0.378]
13 How well do supervised and unsupervised NLP systems perform when used uncustomized, out-of-the-box on new domains, and how can we best design our systems for robust open-domain performance? [sent-45, score-0.125]
14 For our supervised approaches, we represent the examples as feature vectors, and learn a classifier on the training vectors. [sent-56, score-0.228]
15 N-GM features are real-valued features giving the log-count of a particular N-gram in the auxiliary web corpus. [sent-58, score-0.309]
16 LEX features are binary features that indicate the presence or absence of a particular string at a given position in the input. [sent-59, score-0.178]
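To make the two feature families concrete, here is a minimal Python sketch of this representation. The NGRAM_COUNTS dictionary, its counts, and the feature names are illustrative stand-ins: the paper looks these counts up in a web-scale auxiliary corpus (Google's N-gram data), not an in-memory table.

```python
import math

# Hypothetical stand-in for lookups into the web-scale auxiliary corpus.
NGRAM_COUNTS = {("big", "red"): 289000, ("red", "big"): 51000}

def ngram_logcount(ngram):
    """Real-valued N-GM feature: log-count of an N-gram (0.0 if unseen)."""
    count = NGRAM_COUNTS.get(ngram, 0)
    return math.log(count) if count > 0 else 0.0

def lex_features(tokens, position_names):
    """Binary LEX features: presence of a particular string at a position."""
    return {f"{name}={tok}": 1.0 for name, tok in zip(position_names, tokens)}

def featurize(a1, a2):
    feats = lex_features((a1, a2), ("first", "second"))
    feats["logc(a1 a2)"] = ngram_logcount((a1, a2))  # N-GM feature
    feats["logc(a2 a1)"] = ngram_logcount((a2, a1))  # N-GM feature
    return feats

print(featurize("big", "red"))
```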
17 We plot learning curves to measure the accuracy of the classifier when the number of labeled training examples varies. [sent-65, score-0.222]
18 The size of the N-gram data and its counts remain constant. [sent-66, score-0.124]
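A minimal sketch of this protocol, under assumptions: synthetic data stands in for the real feature vectors, and LinearSVC stands in for the paper's classifier. The point of the loop is that only the number of labeled rows varies; the columns derived from N-gram counts are computed once and held fixed.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.svm import LinearSVC

# Synthetic stand-in data: columns mix binary LEX indicators with
# real-valued N-GM log-counts; the counts themselves never change.
rng = np.random.RandomState(0)
X = rng.randn(2000, 50)
y = (X[:, 0] + 0.5 * rng.randn(2000) > 0).astype(int)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

for n in (10, 100, 500, 1500):  # points on the learning curve
    clf = LinearSVC(C=1.0).fit(X_train[:n], y_train[:n])
    print(n, accuracy_score(y_test, clf.predict(X_test)))
```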
19 2 Tasks and Labeled Data We study two generation tasks: prenominal adjective ordering (Section 3) and context-sensitive spelling correction (Section 4), followed by two analysis tasks: noun compound bracketing (Section 5) and verb part-of-speech disambiguation (Section 6). [sent-70, score-1.107]
20 We describe how labeled adjective and spelling examples are created from these corpora in the corresponding sections. [sent-76, score-0.448]
21 The third enhancement is especially relevant here, as we can use the POS distribution to collect counts for N-grams of mixed words and tags. [sent-95, score-0.124]
22 For example, we have developed an N-gram search engine that can count how often the adjective unprecedented precedes another adjective in our web corpus (113K times) and how often it follows one (11K times). [sent-96, score-0.624]
23 Thus, even if we haven’t seen a particular adjective pair directly, we can use the positional preferences of each adjective to order them. [sent-97, score-0.454]
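A sketch of this positional back-off. The 113K/11K figures for unprecedented are the ones reported above; the figures for bedraggled and the hand-written decision rule are hypothetical, since in the paper such counts enter a discriminative classifier as features rather than being applied directly.

```python
# Hypothetical positional counts from the N-gram search engine.
PRECEDES_JJ = {"unprecedented": 113000, "bedraggled": 4000}
FOLLOWS_JJ = {"unprecedented": 11000, "bedraggled": 300}

def first_position_preference(adj):
    """Fraction of the time this adjective precedes another adjective."""
    pre, post = PRECEDES_JJ.get(adj, 1), FOLLOWS_JJ.get(adj, 1)
    return pre / (pre + post)

def order_pair(a1, a2):
    """Order a pair never observed directly, via positional preference."""
    if first_position_preference(a1) >= first_position_preference(a2):
        return (a1, a2)
    return (a2, a1)

print(order_pair("unprecedented", "bedraggled"))  # ('bedraggled', 'unprecedented')
```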
24 3 Prenominal Adjective Ordering Prenominal adjective ordering strongly affects text readability. [sent-100, score-0.426]
25 For example, while the unprecedented statistical revolution is fluent, the statistical unprecedented revolution is not. [sent-101, score-0.168]
26 Many NLP systems need to handle adjective ordering robustly. [sent-102, score-0.426]
27 In machine translation, if a noun has two adjective modifiers, they must be ordered correctly in the target language. [sent-103, score-0.361]
28 Adjective ordering is also needed in Natural Language Generation systems that produce information from databases; for example, to convey information (in sentences) about medical patients (Shaw and Hatzivassiloglou, 1999). [sent-104, score-0.199]
29 We focus on the task of ordering a pair of adjectives independently of the noun they modify and achieve good performance in this setting. [sent-105, score-0.382]
30 Following the set-up of Malouf (2000), we experiment on the 263K adjective pairs Malouf extracted from the British National Corpus (BNC). [sent-106, score-0.258]
31 We create examples from all sequences of two adjectives followed by a noun. [sent-111, score-0.182]
32 Since the N-gram data includes case, we merge counts from the upper and lower case combinations. [sent-117, score-0.124]
33 1 LEX features Our adjective ordering model with LEX features is a novel contribution of this paper. [sent-124, score-0.604]
34 We begin with two features for each pair: an indicator feature for a1, which gets a feature value of +1, and an indicator feature for a2, which gets a feature value of −1. [sent-125, score-0.155]
35 If the alphabetic ordering is correct, the weight on a1 should be higher than the weight on a2, so that the classifier returns a positive score. [sent-128, score-0.251]
36 If the reverse ordering is preferred, a2 should receive a higher weight. [sent-129, score-0.199]
37 Training the model in this setting is a matter of assigning weights to all the observed adjectives such that the training pairs are maximally ordered correctly. [sent-130, score-0.181]
38 The feature weights thus implicitly produce a linear ordering of all observed adjectives. [sent-131, score-0.199]
39 While the learned weights w1 . . . wn could in principle be exploited to improve adjective ordering, there are many conflicting pairs that make a strict linear ordering of adjectives impossible (Malouf, 2000). [sent-135, score-0.535]
40 Finally, we also have features for all suffixes of length 1-to-4 letters, as these encode useful information about adjective class (Malouf, 2000). [sent-138, score-0.316]
41 Like the adjective features, the suffix features receive a value of +1 for adjectives in the first position and −1 for those in the second. [sent-139, score-0.394]
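Putting the pieces together, a minimal sketch of the LEX ordering model: adjectives and their 1-to-4 letter suffixes contribute +1 in first position and -1 in second, so a linear model scores the proposed order positively exactly when it is preferred. The toy weights are invented; in the paper they are learned from the labeled pairs, and together they implicitly define a linear ordering over all observed adjectives.

```python
def lex_ordering_features(a1, a2):
    """Signed indicator features for an adjective pair (a1, a2)."""
    feats = {}

    def bump(key, value):
        feats[key] = feats.get(key, 0.0) + value

    for adj, value in ((a1, +1.0), (a2, -1.0)):
        bump(f"adj={adj}", value)
        for k in range(1, 5):  # suffixes of length 1-to-4
            if len(adj) > k:
                bump(f"suffix={adj[-k:]}", value)
    return feats

def score(weights, feats):
    """Positive score predicts the (a1, a2) order; negative, the reverse."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())

toy_weights = {"adj=big": 0.8, "adj=red": 0.2}  # invented for illustration
print(score(toy_weights, lex_ordering_features("big", "red")))  # 0.6 > 0
```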
42 N-GM features Lapata and Keller (2005) propose a web-based approach to adjective ordering: take the most frequent order of the pair on the web. [Table 1 residue: garbled results-table text removed in extraction.] [sent-142, score-0.316]
43 We merge the counts for the adjectives occurring contiguously and separated by a comma. [sent-150, score-0.202]
44 These are indubitably the most important N-GM features; we include them but also other, tag-based counts from Google V2. [sent-151, score-0.124]
45 Raw counts include cases where one of the adjectives is not used as a modifier: “the special present was” vs. [sent-152, score-0.202]
46 We also include features for the log-counts of each adjective preceded or followed by a word matching an adjective-tag: c(a1 JJ), etc. [sent-159, score-0.344]
47 The more frequent adjective occurs first 57% of the time. [sent-166, score-0.227]
48 As in all tasks, the counts are features in a classifier, so the importance of the different patterns is weighted discriminatively during training. [sent-167, score-0.246]
49 With fewer training examples, the systems with N-GM features strongly outperform the LEX-only system. [sent-178, score-0.132]
50 Figure 1: In-domain learning curve of adjective ordering classifiers on BNC (x-axis: number of training examples). [sent-180, score-0.658]
51 Figure 2: Out-of-domain learning curve of adjective ordering classifiers on Gutenberg (x-axis: number of training examples). [sent-181, score-0.658]
52 While other ordering models have also achieved “very poor results” out-of-domain (Mitchell, 2009), we expected our expanded set of LEX features to provide good generalization on new data. [sent-187, score-0.288]
53 N-GM features do not rely on specific pairs in training data, and thus remain fairly robust cross-domain. [sent-189, score-0.196]
54 Across the three test sets, 84-89% of examples had the correct ordering appear at least once on the web. [sent-190, score-0.275]
55 The system disregards the robust N-gram counts as it is more and more confident in the LEX features, and it suffers the consequences. [sent-199, score-0.157]
56 4 Context-Sensitive Spelling Correction We now turn to the generation problem of context-sensitive spelling correction. [sent-200, score-0.122]
57 There are 100K training, 10K development, and 10K test examples for each confusion set. [sent-207, score-0.155]
58 3) The baseline predicts the most frequent member of each confusion set, based on frequencies in the NYT training data. [sent-217, score-0.122]
59 Figure 3: In-domain learning curve of spelling correction classifiers on NYT (x-axis: number of training examples). [sent-219, score-0.39]
60 1 Supervised Spelling Correction Our LEX features are typical disambiguation features that flag specific aspects of the context. [sent-221, score-0.216]
61 We have features for the words at all positions in a 9-word window (called collocation features by Golding and Roth (1999)), plus indicators for a particular word preceding or following the confusable word. [sent-222, score-0.247]
62 We include the log-counts of all N-grams that span the confusable word, with each word in the confusion set filling the N-gram pattern. [sent-226, score-0.148]
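A simplified sketch of these spanning-count features, under assumptions: a toy confusion set, invented counts, and only one context pattern per candidate (the paper uses all N-grams spanning the confusable position).

```python
import math

CONFUSION_SET = ("among", "between")  # illustrative confusion set

# Hypothetical web N-gram counts.
NGRAM_COUNTS = {
    ("differences", "between", "the"): 120000,
    ("differences", "among", "the"): 30000,
}

def spanning_features(left_context, right_context):
    """Log-count features with each confusion-set member filling the slot."""
    feats = {}
    for cand in CONFUSION_SET:
        ngram = tuple(left_context) + (cand,) + tuple(right_context)
        count = NGRAM_COUNTS.get(ngram, 0)
        feats["logc(%s)" % " ".join(ngram)] = math.log(count) if count else 0.0
    return feats

print(spanning_features(["differences"], ["the"]))
```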
63 (2009), we get N-gram counts using the original Google N-gram Corpus. [sent-229, score-0.124]
64 Note the learning curves for N-GM+LEX on Gutenberg and Medline (not shown) do not display the decrease that we observed in adjective ordering (Figure 2). [sent-244, score-0.426]
65 5 Noun Compound Bracketing About 70% of web queries are noun phrases (Barr et al. [sent-249, score-0.195]
66 For example, a web query for zebra hair straightener should be bracketed as (zebra (hair straightener)), a stylish hair straightener with zebra print, rather than ((zebra hair) straightener), a useless product since the fur of zebras is already quite straight. [sent-251, score-0.532]
67 The noun compound (NC) bracketing task is usually cast as a decision whether a 3-word NC has a left or right bracketing. [sent-252, score-0.271]
68 Figure 4: In-domain NC-bracketer learning curve (x-axis: number of labeled examples). Labeled NCs come from sections 0-22 of the Treebank for training, 72 from section 24 for development, and 95 from section 23 as a test set. [sent-263, score-0.194]
69 1 Supervised Noun Bracketing Our LEX features indicate the specific noun at each position in the compound, plus the three pairs of nouns and the full noun triple. [sent-267, score-0.33]
70 Following Nakov and Hearst (2005), we also include counts of noun pairs collapsed into a single token; if a pair occurs often on the web as a single unit, it strongly indicates the pair is a constituent. [sent-271, score-0.35]
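The count-based evidence can be sketched as a simple comparison over the paper's zebra hair straightener example; all counts below are invented, and in the paper these quantities are classifier features rather than a hard-coded rule.

```python
# Hypothetical web counts, including Nakov-and-Hearst-style counts of
# noun pairs collapsed into a single token.
PAIR_COUNTS = {("zebra", "hair"): 900, ("hair", "straightener"): 45000}
COLLAPSED_COUNTS = {"zebrahair": 10, "hairstraightener": 8000}

def bracket(n1, n2, n3):
    """Choose left ((n1 n2) n3) vs. right (n1 (n2 n3)) bracketing by
    comparing evidence that (n1, n2) or (n2, n3) is a constituent."""
    left = PAIR_COUNTS.get((n1, n2), 0) + COLLAPSED_COUNTS.get(n1 + n2, 0)
    right = PAIR_COUNTS.get((n2, n3), 0) + COLLAPSED_COUNTS.get(n2 + n3, 0)
    return "left" if left > right else "right"

print(bracket("zebra", "hair", "straightener"))  # -> "right"
```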
71 The absence of a sufficient amount of labeled data explains why NC-bracketing is generally regarded as a task where corpus counts are crucial. [sent-285, score-0.203]
72 With little training data and cross-domain usage, N-gram features are essential. [sent-290, score-0.171]
73 For example, in the troops stationed in Iraq, the verb stationed is a VBN; troops is the head of the phrase. [sent-293, score-0.389]
74 On the other hand, for the troops vacationed in Iraq, the verb vacationed is a VBD and also the head. [sent-294, score-0.285]
75 , the global lexical relation between the noun and verb (E. [sent-300, score-0.168]
76 , troops tends to be the object of stationed but the subject of vacationed). [sent-302, score-0.163]
77 For out-of-domain data, we get 21K examples from the Brown portion of the Treebank and 6296 examples from tagged Medline abstracts in the PennBioIE corpus (Kulick et al. [sent-306, score-0.138]
78 [Footnote 6: HMM-style taggers, like the fast TnT tagger used on our web corpus, do not use bilexical features, and so perform especially poorly on these cases.] [sent-308, score-0.18]
79 1 LEX features For 1), we use indicators for the noun and verb, the noun-verb pair, whether the verb is on an in-house list of said-verbs (like warned, announced, etc. [sent-316, score-0.125]
80 ), whether the noun is capitalized and whether it’s upper-case. [sent-317, score-0.133]
81 For 2), we provide indicator features for the words before the noun and after the verb. [sent-321, score-0.227]
82 2 N-GM features For 1), we characterize a noun-verb relation via features for the pair’s distribution in Google V2. [sent-324, score-0.178]
83 We extract the 20 most-frequent N-grams that contain both the noun and the verb in the pair. [sent-326, score-0.168]
84 We mask the noun of interest as N and the verb of interest as V. [sent-328, score-0.168]
85 For 2), we use counts for the verb’s context co-occurring with a VBD or VBN tag. [sent-333, score-0.124]
86 , we see whether VBD cases like troops ate or VBN cases like troops eaten are more frequent. [sent-336, score-0.277]
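A minimal sketch of this comparison, in the spirit of the unsupervised ContextSum system described below: pick whichever tag is better attested for the noun-verb context in the aggregate counts. All counts are invented; the supervised system instead feeds such counts to a classifier as features.

```python
# Hypothetical tag-context counts from the POS-tagged web corpus
# (cf. "troops ate" as VBD vs. "troops eaten" as VBN).
CONTEXT_TAG_COUNTS = {
    ("troops", "stationed"): {"VBD": 2000, "VBN": 90000},
    ("troops", "vacationed"): {"VBD": 1500, "VBN": 20},
}

def disambiguate(noun, verb):
    """Pick the better-attested tag; tagger errors in the corpus are
    assumed random enough to cancel out in the aggregate, as the paper hopes."""
    counts = CONTEXT_TAG_COUNTS.get((noun, verb), {"VBD": 0, "VBN": 0})
    return "VBD" if counts["VBD"] >= counts["VBN"] else "VBN"

print(disambiguate("troops", "stationed"))   # -> "VBN"
print(disambiguate("troops", "vacationed"))  # -> "VBD"
```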
87 Although our corpus contains many VBN/VBD errors, we hope the errors are random enough for aggregate counts to be useful. [sent-337, score-0.152]
88 Figure 5: Out-of-domain learning curve of verb disambiguation classifiers on Medline (x-axis: number of training examples). [sent-344, score-0.333]
89 We include separate count features for contexts matching the specific noun and for when the noun token can match any word tagged as a noun. [sent-346, score-0.299]
90 ContextSum: We use these context counts in an unsupervised system, ContextSum. [sent-347, score-0.159]
91 With two views of an example, LEX is more likely to have domain-neutral features to draw on. [sent-363, score-0.119]
92 Also, the Treebank provides an atypical number of labeled examples for analysis tasks. [sent-365, score-0.127]
93 In other tasks we only had a handful of N-GM features; here there are 21K features for the distributional patterns of N,V pairs. [sent-371, score-0.153]
94 Counts from any large auxiliary corpus may also help, but web counts should help more (Lapata and Keller, 2005). [sent-377, score-0.283]
95 Our results suggest better features, such as web pattern counts, may help more than expanding training data. [sent-382, score-0.133]
96 In some sense, using web counts as features is a form of domain adaptation: adapting a web model to the training domain. [sent-384, score-0.473]
97 How do we ensure these features are adapted well and not used in domain-specific ways (especially with many features to adapt, as in Section 6)? [sent-385, score-0.178]
98 When less training data is used, or when the system is used on a different domain, N-gram features greatly improve performance. [sent-395, score-0.132]
99 The order of prenominal adjectives in natural language generation. [sent-496, score-0.173]
100 Search engine statistics beyond the n-gram: Application to noun compound bracketing. [sent-523, score-0.196]
wordName wordTfidf (topN-words)
[('lex', 0.514), ('gutenberg', 0.317), ('medline', 0.312), ('adjective', 0.227), ('ordering', 0.199), ('malouf', 0.14), ('bergsma', 0.134), ('counts', 0.124), ('noun', 0.105), ('troops', 0.104), ('vbd', 0.099), ('prenominal', 0.095), ('vbn', 0.095), ('spelling', 0.094), ('compound', 0.091), ('web', 0.09), ('nyt', 0.089), ('vadas', 0.089), ('features', 0.089), ('google', 0.084), ('confusion', 0.079), ('straightener', 0.079), ('zebra', 0.079), ('adjectives', 0.078), ('keller', 0.077), ('examples', 0.076), ('bracketing', 0.075), ('nakov', 0.074), ('confusable', 0.069), ('curve', 0.067), ('correction', 0.064), ('hair', 0.063), ('verb', 0.063), ('wsj', 0.059), ('grolier', 0.059), ('stationed', 0.059), ('vacationed', 0.059), ('bnc', 0.058), ('supervised', 0.057), ('brants', 0.057), ('classifier', 0.052), ('lauer', 0.052), ('golding', 0.052), ('unprecedented', 0.052), ('lapata', 0.051), ('labeled', 0.051), ('poorly', 0.048), ('classifiers', 0.046), ('tnt', 0.045), ('training', 0.043), ('domains', 0.042), ('auxiliary', 0.041), ('shane', 0.04), ('ailon', 0.039), ('crftagger', 0.039), ('crossdomain', 0.039), ('eaten', 0.039), ('ncs', 0.039), ('plentiful', 0.039), ('regularize', 0.039), ('curran', 0.039), ('disambiguation', 0.038), ('domain', 0.037), ('thorsten', 0.037), ('unsupervised', 0.035), ('iraq', 0.035), ('rimell', 0.035), ('transitivity', 0.034), ('robust', 0.033), ('patterns', 0.033), ('indicator', 0.033), ('regularization', 0.033), ('biomedical', 0.032), ('emily', 0.032), ('pitler', 0.032), ('kulick', 0.032), ('modjeska', 0.032), ('preslav', 0.032), ('revolution', 0.032), ('tasks', 0.031), ('pairs', 0.031), ('red', 0.03), ('nlp', 0.03), ('views', 0.03), ('ate', 0.03), ('church', 0.03), ('shaw', 0.03), ('barr', 0.03), ('kilgarriff', 0.03), ('treebank', 0.03), ('usage', 0.029), ('ordered', 0.029), ('hearst', 0.029), ('generation', 0.028), ('sight', 0.028), ('liblinear', 0.028), ('capitalized', 0.028), ('tsuruoka', 0.028), ('corpus', 0.028), ('followed', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
Author: Shane Bergsma ; Emily Pitler ; Dekang Lin
Abstract: In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
2 0.11431456 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudowords as an evaluation framework for selectional preferences. While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudoword creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
3 0.10739067 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
Author: Xu Sun ; Jianfeng Gao ; Daniel Micol ; Chris Quirk
Abstract: This paper explores the use of clickthrough data for query spelling correction. First, large amounts of query-correction pairs are derived by analyzing users' query reformulation behavior encoded in the clickthrough data. Then, a phrase-based error model that accounts for the transformation probability between multi-term phrases is trained and integrated into a query speller system. Experiments are carried out on a human-labeled data set. Results show that the system using the phrase-based error model significantly outperforms its baseline systems.
4 0.1024526 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
Author: Valentin I. Spitkovsky ; Daniel Jurafsky ; Hiyan Alshawi
Abstract: We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning’s Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to 50.4%, beating previous state-of-the-art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes to 53.3% accuracy out-of-domain against the Brown corpus, nearly 10% higher than the previous published best. The fact that web mark-up strongly correlates with syntactic structure may have broad applicability in NLP.
5 0.097065084 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
Author: Stephen Tratz ; Eduard Hovy
Abstract: The automatic interpretation of noun-noun compounds is an important subproblem within many natural language processing applications and is an area of increasing interest. The problem is difficult, with disagreement regarding the number and nature of the relations, low inter-annotator agreement, and limited annotated data. In this paper, we present a novel taxonomy of relations that integrates previous relations, the largest publicly-available annotated dataset, and a supervised classification method for automatic noun compound interpretation.
6 0.090972096 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
7 0.089899614 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
8 0.08553873 226 acl-2010-The Human Language Project: Building a Universal Corpus of the World's Languages
9 0.081384443 158 acl-2010-Latent Variable Models of Selectional Preference
10 0.074277073 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
11 0.072998658 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
12 0.072947413 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
13 0.064437665 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences
14 0.063488103 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
15 0.062397379 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation
16 0.061711229 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
17 0.061686493 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
18 0.05944027 114 acl-2010-Faster Parsing by Supertagger Adaptation
19 0.05843813 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
20 0.056989357 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
topicId topicWeight
[(0, -0.194), (1, 0.051), (2, 0.007), (3, -0.011), (4, 0.026), (5, -0.007), (6, 0.025), (7, 0.019), (8, 0.06), (9, 0.032), (10, -0.024), (11, 0.061), (12, -0.021), (13, -0.048), (14, 0.041), (15, 0.09), (16, 0.069), (17, 0.009), (18, 0.127), (19, 0.004), (20, 0.039), (21, 0.037), (22, 0.077), (23, 0.008), (24, -0.035), (25, -0.007), (26, 0.091), (27, 0.08), (28, -0.097), (29, 0.039), (30, -0.093), (31, -0.044), (32, 0.044), (33, -0.016), (34, 0.082), (35, -0.039), (36, -0.079), (37, -0.005), (38, -0.108), (39, 0.074), (40, 0.126), (41, 0.212), (42, 0.047), (43, -0.083), (44, -0.066), (45, -0.028), (46, 0.006), (47, 0.01), (48, 0.005), (49, -0.055)]
simIndex simValue paperId paperTitle
same-paper 1 0.93031448 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
Author: Shane Bergsma ; Emily Pitler ; Dekang Lin
Abstract: In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
2 0.708368 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation
Author: Stephen Tratz ; Eduard Hovy
Abstract: The automatic interpretation of noun-noun compounds is an important subproblem within many natural language processing applications and is an area of increasing interest. The problem is difficult, with disagreement regarding the number and nature of the relations, low inter-annotator agreement, and limited annotated data. In this paper, we present a novel taxonomy of relations that integrates previous relations, the largest publicly-available annotated dataset, and a supervised classification method for automatic noun compound interpretation.
3 0.6630497 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
Author: Nathanael Chambers ; Dan Jurafsky
Abstract: This paper improves the use of pseudowords as an evaluation framework for selectional preferences. While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudoword creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-of-the-art by 13% absolute on a newspaper domain.
4 0.65518385 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
Author: Valentin I. Spitkovsky ; Daniel Jurafsky ; Hiyan Alshawi
Abstract: We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we refine approximate partial phrase boundaries to yield accurate parsing constraints. Conversion procedures fall out of our linguistic analysis of a newly available million-word hyper-text corpus. We demonstrate that derived constraints aid grammar induction by training Klein and Manning’s Dependency Model with Valence (DMV) on this data set: parsing accuracy on Section 23 (all sentences) of the Wall Street Journal corpus jumps to 50.4%, beating previous state-of-the-art by more than 5%. Web-scale experiments show that the DMV, perhaps because it is unlexicalized, does not benefit from orders of magnitude more annotated but noisier data. Our model, trained on a single blog, generalizes to 53.3% accuracy out-of-domain against the Brown corpus, nearly 10% higher than the previous published best. The fact that web mark-up strongly correlates with syntactic structure may have broad applicability in NLP.
5 0.65361005 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
Author: Joel Tetreault ; Jennifer Foster ; Martin Chodorow
Abstract: We recreate a state-of-the-art preposition usage system (Tetreault and Chodorow, 2008) and evaluate the effect of adding parse features to a leading model of preposition usage. Results show a significant improvement in the preposition selection task on native speaker text and a modest increment in precision and recall in an ESL error detection task. Analysis of the parser output indicates that it is robust enough in the face of noisy non-native writing to extract useful information.
6 0.57257503 139 acl-2010-Identifying Generic Noun Phrases
7 0.52319485 117 acl-2010-Fine-Grained Genre Classification Using Structural Learning Algorithms
8 0.51785266 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields
9 0.50603515 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries
10 0.50410426 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs
11 0.49544275 256 acl-2010-Vocabulary Choice as an Indicator of Perspective
12 0.49180022 2 acl-2010-"Was It Good? It Was Provocative." Learning the Meaning of Scalar Adjectives
13 0.47131422 85 acl-2010-Detecting Experiences from Weblogs
14 0.46647573 150 acl-2010-Inducing Domain-Specific Semantic Class Taggers from (Almost) Nothing
15 0.46447718 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs
16 0.45703477 164 acl-2010-Learning Phrase-Based Spelling Error Models from Clickthrough Data
17 0.45013908 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
18 0.44064957 130 acl-2010-Hard Constraints for Grammatical Function Labelling
19 0.43302685 111 acl-2010-Extracting Sequences from the Web
20 0.43086064 114 acl-2010-Faster Parsing by Supertagger Adaptation
topicId topicWeight
[(14, 0.02), (25, 0.07), (39, 0.01), (42, 0.033), (44, 0.013), (59, 0.112), (71, 0.013), (72, 0.011), (73, 0.052), (76, 0.014), (78, 0.061), (80, 0.036), (83, 0.093), (84, 0.03), (92, 0.226), (98, 0.119)]
simIndex simValue paperId paperTitle
same-paper 1 0.79163742 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
Author: Shane Bergsma ; Emily Pitler ; Dekang Lin
Abstract: In this paper, we systematically assess the value of using web-scale N-gram data in state-of-the-art supervised NLP classifiers. We compare classifiers that include or exclude features for the counts of various N-grams, where the counts are obtained from a web-scale auxiliary corpus. We show that including N-gram count features can advance the state-of-the-art accuracy on standard data sets for adjective ordering, spelling correction, noun compound bracketing, and verb part-of-speech disambiguation. More importantly, when operating on new domains, or when labeled training data is not plentiful, we show that using web-scale N-gram features is essential for achieving robust performance.
2 0.67902404 158 acl-2010-Latent Variable Models of Selectional Preference
Author: Diarmuid O Seaghdha
Abstract: This paper describes the application of so-called topic models to selectional preference induction. Three models related to Latent Dirichlet Allocation, a proven method for modelling document-word cooccurrences, are presented and evaluated on datasets of human plausibility judgements. Compared to previously proposed techniques, these models perform very competitively, especially for infrequent predicate-argument combinations where they exceed the quality of Web-scale predictions while using relatively little data.
3 0.67609221 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
Author: Omri Abend ; Ari Rappoport
Abstract: The core-adjunct argument distinction is a basic one in the theory of argument structure. The task of distinguishing between the two has strong relations to various basic NLP tasks such as syntactic parsing, semantic role labeling and subcategorization acquisition. This paper presents a novel unsupervised algorithm for the task that uses no supervised models, utilizing instead state-of-the-art syntactic induction algorithms. This is the first work to tackle this task in a fully unsupervised scenario.
4 0.67455024 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
5 0.67423362 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
6 0.67134798 160 acl-2010-Learning Arguments and Supertypes of Semantic Relations Using Recursive Patterns
7 0.67116416 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.67074752 25 acl-2010-Adapting Self-Training for Semantic Role Labeling
9 0.67051351 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models
10 0.6704632 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
11 0.66981637 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
12 0.66937596 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
13 0.66805756 248 acl-2010-Unsupervised Ontology Induction from Text
14 0.66665941 130 acl-2010-Hard Constraints for Grammatical Function Labelling
15 0.66632366 71 acl-2010-Convolution Kernel over Packed Parse Forest
16 0.66618657 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
17 0.66603798 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences
18 0.66575754 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
19 0.6630137 121 acl-2010-Generating Entailment Rules from FrameNet
20 0.6626904 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields