acl acl2011 acl2011-333 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
Reference: text
sentIndex sentText sentNum sentScore
1 Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. [sent-4, score-0.8]
2 In this work, we first present a method for generating web count features that address the full range of syntactic attachments. [sent-5, score-0.367]
3 We then integrate our features into full-scale dependency and constituent parsers. [sent-7, score-0.363]
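4 We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker. [sent-9, score-0.238]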
5 From a dependency viewpoint, structural errors can be cast as incorrect attachments, even for constituent (phrase-structure) parsers. [sent-14, score-0.288]
6 , 2006), about 20% of the errors are prepositional phrase attachment errors as in Figure 1, where a preposition-headed (IN) phrase was assigned an incorrect parent in the implied dependency tree. [sent-16, score-0.532]
7 Here, the Berkeley parser (solid blue edges) incorrectly attaches from debt to the noun phrase $ 30 billion whereas the correct attachment (dashed gold edges) is to the verb raising. [sent-17, score-0.857]
8 Figure 1: A PP attachment error in the parse output of the Berkeley parser (on Penn Treebank), for the phrase raising $ 30 billion from debt. [sent-19, score-0.637]
9 Guess edges are in solid blue, gold edges are in dashed gold and edges common in guess and gold parses are in black. [sent-20, score-0.4]
10 Here, (a) is a non-canonical PP attachment ambiguity where by yesterday afternoon should attach to had already, (b) is an NP-internal ambiguity where half a should attach to dozen and not to newspapers, and (c) is an adverb attachment ambiguity, where just should modify fine and not the verb ’s. [sent-21, score-1.05]
11 One way to access more information is to exploit surface counts from large corpora like the web (Volk, 2001 ; Lapata and Keller, 2004). [sent-23, score-0.297]
12 For example, the phrase raising from is much more frequent on the Web than $ x billion from. [sent-24, score-0.306]
13 While this ‘affinity’ is only a surface correlation, Volk (2001) showed that comparing such counts can often correctly resolve tricky PP attachments. [sent-25, score-0.239]
14 For example, Nakov and Hearst (2005b) showed that looking for paraphrase counts can further improve PP resolution. [sent-27, score-0.284]
15 Figure 2: Different kinds of attachment errors in the parse output of the Berkeley parser (on Penn Treebank). [sent-31, score-0.491]
16 Guess edges are in solid blue, gold edges are in dashed gold and edges common in guess and gold parses are in black. [sent-32, score-0.4]
17 Still other work has exploited Web counts for other isolated ambiguities, such as NP coordination (Nakov and Hearst, 2005b) and noun-sequence bracketing (Nakov and Hearst, 2005a; Pitler et al. [sent-34, score-0.379]
18 In this paper, we show how to apply these ideas to all attachments in full-scale parsing. [sent-37, score-0.259]
19 Affinity features are relatively straightforward, but paraphrase features, which have been hand-developed in the past, are more complex. [sent-40, score-0.265]
20 For dependency parsing, we augment the features in the second-order parser of McDonald and Pereira (2006). [sent-42, score-0.366]
21 For constituent parsing, we rerank the output of the Berkeley parser (Petrov et al. [sent-43, score-0.317]
22 Third, past systems have usually gotten their counts from web search APIs, which does not scale to quadratically-many attachments in each sentence. [sent-45, score-0.519]
23 Given the success of Web counts for isolated ambiguities, there is relatively little previous research in this direction. [sent-47, score-0.234]
24 (2010), which use Web-scale n-gram counts for multi-way noun bracketing decisions, though that work considers only sequences of nouns and uses only affinity-based web features. [sent-49, score-0.421]
25 (2008) smooth the sparseness of lexical features in a discriminative dependency parser by using cluster-based word-senses as intermediate abstractions in addition to POS tags (also see Finkel et al. [sent-53, score-0.409]
26 To show end-to-end effectiveness, we incorporate our features into state-of-the-art dependency and constituent parsers. [sent-57, score-0.363]
27 For the dependency case, we can integrate them into the dynamic programming of a base parser; we use the discriminatively-trained MST dependency parser (McDonald et al. [sent-58, score-0.351]
28 2 Web-count Features Structural errors in the output of state-of-the-art parsers, constituent or dependency, can be viewed as attachment errors, examples of which are Figure 1 and Figure 2. [sent-69, score-0.488]
29 One way to address attachment errors is through features which factor over head-argument … Footnote 1: For constituent parsers, there can be minor tree variations which can result in the same set of induced dependencies, but these are rare in comparison. [sent-70, score-0.616]
30 Figure 3: Features factored over head-argument pairs. [sent-74, score-0.257]
31 Here, we discuss which web-count based features φ(h, a) should fire over a given head-argument pair (we consider the words h and a to be indexed, and so features can be sensitive to their order and distance, as is also standard). [sent-76, score-0.256]
32 The approach of Lauer (1995), for example, would be to take an ambiguous noun sequence like hydrogen ion exchange and compare the various counts (or associated conditional probabilities) of n-grams like hydrogen ion and hydrogen exchange. [sent-79, score-0.472]
33 Our affinity features closely follow this basic idea of association statistics. [sent-83, score-0.291]
34 However, because a real parser will not have access to gold-standard knowledge of the competing attachment sites (see Atterer and Schütze (2007)’s criticism of previous work), we must instead compute features for all possible head-argument pairs from our web corpus. [sent-84, score-0.679]
35 Moreover, when there are only two competing attachment options, one can do things like directly compare two count-based heuristics and choose the larger. [sent-85, score-0.313]
36 Integration into a parser requires features to be functions of single attachments, not pairwise comparisons between alternatives. [sent-86, score-0.253]
37 We employ a collection of affinity features of varying specificity. [sent-88, score-0.291]
38 First, rather than a single all-purpose feature like ADJ, the utility of such query counts will vary according to aspects like the parts-of-speech of h and a (because a high adjacency count is not equally informative for all kinds of attachments). [sent-93, score-0.327]
39 Hence, we add more refined affinity features that are specific to each pair of POS tags, i. [sent-94, score-0.291]
40 Second, using real-valued features did not work as well as binning the query-counts (we used b = floor(log_r(count)/5) * 5) and then firing indicator features of the form ADJ ∧ POS(h) ∧ POS(a) ∧ b for values of b defined by the query count. [sent-98, score-0.411]
41 For all features used, we add cumulative variants where indicators are fired for all count bins b0 up to query count bin b. [sent-104, score-0.432]
42 2 Paraphrase Features In addition to measuring counts of the words present in the sentence, there exist clever ways in which paraphrases and other accidental indicators can help resolve specific ambiguities, some of which are discussed in Nakov and Hearst (2005a), Nakov and Hearst (2005b). [sent-106, score-0.247]
43 For example, finding attestations of eat :spaghetti with sauce suggests a nominal attachment in Jean ate spaghetti with sauce. [sent-107, score-0.357]
44 As another example, one clue that the example in Figure 1 is a verbal attachment is that the proform paraphrase raising it from is commonly attested. [sent-108, score-0.632]
45 These paraphrase features hint at the correct attachment decision by looking for web n-grams with special contexts that reveal syntax superficially. [sent-110, score-0.378]
46 Again, while effective in their isolated disambiguation tasks, past work has been limited by both the range of attachments considered and the need to intuit these special contexts. [sent-111, score-0.393]
47 For instance, frequency of the pattern The noun prep suggests noun attachment and of the pattern verb adverb prep suggests verb attachment for the preposition in the phrase verb noun prep, but these features were not in the manually brainstormed list. [sent-112, score-1.522]
48 In this work, we automatically generate a large number of paraphrase-style features for arbitrary attachment ambiguities. [sent-113, score-0.441]
49 For example, for h = raising and a = from (see Figure 1), we look at web n-grams of the form raising c from and see that one of the most frequent values of c on the web turns out to be the word it. [sent-117, score-0.647]
50 Note that h and a are head and argument words and so actually occur in the sentence, but c is a context word that generally does not. [sent-119, score-0.284]
51 The idea is that if frequent occurrences of raising it from indicated a correct attachment between raising and from, frequent occurrences of lowering it with will indicate the correctness of an attachment between lowering and with. [sent-122, score-1.244]
52 Finally, to handle the cases where no induced context word is helpful, we also construct abstracted versions of these paraphrase features where the context words c are collapsed to their parts-of-speech POS(c), obtained using a unigram-tagger trained on the parser training set. [sent-123, score-0.39]
53 As discussed in Section 5, the top features learned by our learning algorithm duplicate the hand-crafted configurations used in previous work (Nakov and Hearst, 2005b) but also add numerous others, and, of course, apply to many more attachment types. [sent-124, score-0.441]
54 One challenge with this approach is that an external search API is now embedded into the parser, raising issues of both speed and daily query limits, especially if all possible attachments trigger queries. [sent-128, score-0.542]
55 The most basic queries are counts of head-argument pairs in contiguous h a and gapped h ? [sent-137, score-0.243]
56 Here, we describe how we process queries … Footnote 2: Paraphrase features give situations where we query ? [sent-139, score-0.325]
57 The entry for q1 points to an inner hashmap whose key is the final word q2 of the query bigram. [sent-146, score-0.353]
58 In similar ways, we also mine the most frequent words that occur before, in between and after the head and argument query pairs. [sent-168, score-0.442]
59 For example, to collect mid words, we go through the 3-grams w1 w2 w3; if w1 matches q1 in the outer hashmap and w3 occurs in the inner hashmap for q1, then we store w2 and the count of the 3-gram. [sent-169, score-0.544]
60 We also collect unigram counts of the head and argument words by sweeping over the unigrams once. [sent-171, score-0.391]
61 Figure 4: Trie-based nested hashmap for collecting counts of queries (panel labels: Web N-grams, Query Count-Trie). [sent-190, score-0.298]
62 4 Parsing Experiments Our features are designed to be used in full-sentence parsing rather than for limited decisions about isolated ambiguities. [sent-193, score-0.299]
63 We first integrate our features into a dependency parser, where the integration is more natural and pushes all the way into the underlying dynamic program. [sent-194, score-0.241]
64 We then add them to a constituent parser in a reranking approach. [sent-195, score-0.319]
65 We used the ‘pennconverter’5 tool to convert Penn trees from constituent format to dependency format. [sent-201, score-0.235]
66 Table 1 shows unlabeled attachment scores (UAS) for their second-order projective parser and the improved numbers resulting from the addition of our Web-scale features. [sent-221, score-0.384]
67 2 Constituent Parsing We also evaluate the utility of web-scale features on top of a state-of-the-art constituent parser, the Berkeley parser (Petrov et al. [sent-225, score-0.5]
68 Because the underlying parser does not factor along lexical attachments, we instead adopt the discriminative reranking framework, where we generate the top-k candidates from the baseline system and then rerank this k-best list using (generally non-local) features. [sent-227, score-0.31]
69 (2009), Koo and Collins (2010) has been exploring more nonlocal features for dependency parsing. [sent-231, score-0.241]
70 It will be interesting to see how these features interact with our web features. [sent-232, score-0.241]
71 Table 2: Oracle F1-scores for k-best lists output by Berkeley parser for English WSJ parsing (Dev is section 22 and Test is section 23, all lengths). [sent-240, score-0.252]
72 The affinity and paraphrase features contribute about two-fifths and three-fifths of this improvement, respectively. [sent-250, score-0.428]
73 Finally, we rerank with both our Web-scale features and the configurational features. [sent-253, score-0.273]
74 5 Analysis Table 4 shows error counts and relative reductions that our web features provide over the 2nd-order dependency baseline. [sent-258, score-0.595]
75 While we do see substantial gains for classic PP (IN) attachment cases, we see equal or greater error reductions for a range of attachment types. [sent-259, score-0.767]
76 The columns depict the tag, its total attachments as argument, number of correct ones in baseline (McDonald and Pereira, 2006) and this work, and the relative error reduction. [sent-271, score-0.316]
77 Results are for dependency parsing on the dev set for iters:5,training-k:1. [sent-272, score-0.243]
78 1% total error reduction for attachments of an IN argument (which includes PPs as well as complementized SBARs) includes many errors where the gold attachments are to both noun and verb heads. [sent-275, score-0.995]
79 Similarly, for an NN-headed argument, the major corrections are for attachments to noun and verb heads, which includes both object-attachment ambiguities and coordination ambiguities. [sent-276, score-0.586]
80 We next investigate the features that were given high weight by our learning algorithm (in the constituent parsing case). [sent-277, score-0.334]
81 We list only the head and argument POS and the direction (arrow from head to arg). [sent-280, score-0.378]
82 Table 6 shows which affinity features received the highest weights, as well as examples of training set attachments for which the feature fired (for concreteness), suppressing both features involving punctuation and the features’ count and distance bins. [sent-284, score-0.757]
83 The second row (NN → IN) indicates that whether a preposition is appropriate to attach to a noun is well captured by how often that preposition follows that noun. [sent-286, score-0.328]
84 All of these features essentially state cases where local surface counts are good indi- … [columns: POS head, mid-word, POS arg, Example (head, arg)] … 400) of the mid-word schema for a verb head and preposition argument (with head on left of argument). [sent-288, score-0.929]
85 Interestingly, the top such features capture exactly the intuition from Nakov and Hearst (2005b), namely that if the verb h and the preposition a occur with a pronoun in between, we have evidence that a attaches to h (it certainly can’t attach to the pronoun). [sent-293, score-0.503]
86 As another example of known useful features being learned automatically, Table 8 shows the previous-context-word paraphrase features for a noun head and preposition argument (N → IN). [sent-295, score-0.851]
87 Nakov and Hearst (2005b) suggested that the attestation of be N IN is a good indicator of attachment to the noun (the IN cannot generally attach to forms of auxiliaries). [sent-296, score-0.611]
88 We also find their surface marker / punc- … [columns: bfr-word, POS head, POS arg, Example (head, arg)] … 400) of the before-word schema for a noun head and preposition argument (with head on left of argument). [sent-298, score-0.694]
89 However, we additionally find other cues, most notably that if the N IN sequence occurs following a capitalized determiner, it tends to indicate a nominal attachment (in the n-gram, the preposition cannot attach leftward to anything else because of the beginning of the sentence). [sent-300, score-0.521]
90 In Table 9, we see the top-weight paraphrase features that had a conjunction as a middle-word cue. [sent-301, score-0.265]
91 These features essentially say that if two heads w1 and w2 occur in the direct coordination n-gram w1 and w2, then they are good heads to coordinate (coordination unfortunately looks the same as complementation or modification to a basic dependency model). [sent-302, score-0.463]
92 These features are relevant to a range of coordination ambiguities. [sent-303, score-0.259]
93 Finally, Table 10 depicts the high-weight, high-count general paraphrase-cue features for arbitrary head and argument categories, with those shown in previous tables suppressed. [sent-304, score-0.372]
94 The second entry (NN - NN) shows that one noun is a good modifier of another if they frequently appear together hyphenated (another punctuation-based cue mentioned in previous work on noun bracketing, see Nakov and Hearst (2005a)). [sent-309, score-0.24]
95 While they were motivated on separate grounds, these features can also compensate for inapplicability of the affinity features. [sent-310, score-0.291]
96 6 Conclusion Web features are a way to bring evidence from a large unlabeled corpus to bear on hard disambiguation decisions that are not easily resolvable based on limited parser training data. [sent-312, score-0.253]
97 Our approach allows revealing features to be mined for the entire range of attachment types and then aggregated and balanced in a full parsing setting. [sent-313, score-0.572]
98 Our results show that these web features resolve ambiguities not correctly handled by current state-of-the-art systems. [sent-314, score-0.379]
99 This research is sup- … [columns: POS h, POS a, mid/bfr-word, Example (h, a)] … 2000) general features of the mid and before paraphrase schema (examples show head and arg in linear order with arrow from head to arg). [sent-316, score-0.735]
100 Exploiting the WWW as a corpus to resolve PP attachment ambiguities. [sent-437, score-0.368]
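The query batching and count binning described in the extracted sentences above (nested hashmap keyed on the first and final query word, one sweep over the n-grams, then binned and cumulative indicator features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy n-gram counts, the feature-string format, and the log base r = 2 are all hypothetical, since the source gives only the form b = floor(log_r(count)/5) * 5.

```python
import math
from collections import defaultdict


def build_query_map(pairs):
    """Outer key: first query word q1; inner key: final query word q2."""
    query_map = defaultdict(dict)
    for q1, q2 in pairs:
        query_map[q1][q2] = {"adjacent": 0, "mid": defaultdict(int)}
    return query_map


def sweep(ngrams, query_map):
    """One pass over (tokens, count) records, updating only matching queries."""
    for tokens, count in ngrams:
        inner = query_map.get(tokens[0])   # does w1 match some q1?
        if inner is None:
            continue
        entry = inner.get(tokens[-1])      # does the last word match some q2?
        if entry is None:
            continue
        if len(tokens) == 2:               # contiguous "h a" bigram
            entry["adjacent"] += count
        elif len(tokens) == 3:             # "h c a": store mid word c and its count
            entry["mid"][tokens[1]] += count


def count_bin(count, r=2.0, width=5):
    """b = floor(log_r(count) / width) * width; None for zero counts (r is assumed)."""
    if count <= 0:
        return None
    return math.floor(math.log(count, r) / width) * width


def affinity_features(pos_h, pos_a, count, r=2.0, width=5):
    """Indicator for the count's own bin plus cumulative indicators for bins 0..b."""
    b = count_bin(count, r, width)
    if b is None:
        return []
    feats = [f"ADJ^{pos_h}^{pos_a}^bin={b}"]
    feats += [f"ADJ^{pos_h}^{pos_a}^cum={k}" for k in range(0, b + 1, width)]
    return feats


# Toy counts (hypothetical, not real web counts).
ngrams = [
    (("raising", "from"), 1500),
    (("raising", "it", "from"), 300),
    (("raising", "money", "from"), 40),
    (("billion", "from"), 20),
    (("raising", "it", "to"), 99),
]
qm = build_query_map([("raising", "from"), ("billion", "from")])
sweep(ngrams, qm)
print(qm["raising"]["from"]["adjacent"], dict(qm["raising"]["from"]["mid"]))
print(affinity_features("VBG", "IN", qm["raising"]["from"]["adjacent"]))
```

The point of the nested map is that the n-gram collection is scanned once for all queries of a batch, rather than issuing one lookup per candidate attachment.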
wordName wordTfidf (topN-words)
[('attachment', 0.313), ('nakov', 0.279), ('attachments', 0.259), ('raising', 0.182), ('hearst', 0.171), ('affinity', 0.163), ('hashmap', 0.151), ('counts', 0.147), ('paraphrase', 0.137), ('head', 0.134), ('features', 0.128), ('parser', 0.125), ('constituent', 0.122), ('preposition', 0.114), ('dependency', 0.113), ('web', 0.113), ('argument', 0.11), ('berkeley', 0.103), ('mcdonald', 0.101), ('query', 0.101), ('noun', 0.1), ('wsj', 0.099), ('queries', 0.096), ('arg', 0.096), ('attach', 0.094), ('pereira', 0.087), ('isolated', 0.087), ('coordination', 0.084), ('parsing', 0.084), ('ambiguities', 0.083), ('koo', 0.081), ('count', 0.079), ('configurational', 0.075), ('debt', 0.075), ('hydrogen', 0.075), ('iters', 0.075), ('reranking', 0.072), ('pp', 0.072), ('rerank', 0.07), ('lowering', 0.07), ('billion', 0.067), ('attaches', 0.067), ('volk', 0.066), ('prep', 0.065), ('schema', 0.065), ('edges', 0.064), ('pitler', 0.063), ('outer', 0.061), ('bracketing', 0.061), ('inner', 0.061), ('verb', 0.06), ('keller', 0.059), ('guess', 0.058), ('pos', 0.058), ('wildcards', 0.057), ('error', 0.057), ('frequent', 0.057), ('lapata', 0.056), ('adj', 0.056), ('collins', 0.056), ('resolve', 0.055), ('preslav', 0.055), ('indicator', 0.054), ('errors', 0.053), ('petrov', 0.052), ('gold', 0.05), ('ash', 0.05), ('atterer', 0.05), ('attestation', 0.05), ('hca', 0.05), ('nn', 0.05), ('heads', 0.049), ('range', 0.047), ('vbd', 0.047), ('reduction', 0.047), ('ambiguity', 0.047), ('dev', 0.046), ('marti', 0.046), ('indicators', 0.045), ('adverb', 0.044), ('jackknifing', 0.044), ('cha', 0.044), ('spaghetti', 0.044), ('penn', 0.044), ('discriminative', 0.043), ('lists', 0.043), ('tde', 0.041), ('mid', 0.041), ('vadas', 0.041), ('lauer', 0.041), ('sweep', 0.041), ('vb', 0.04), ('occur', 0.04), ('entry', 0.04), ('terry', 0.04), ('compound', 0.039), ('para', 0.038), ('dozen', 0.038), ('reductions', 0.037), ('surface', 0.037)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 333 acl-2011-Web-Scale Features for Full-Scale Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
2 0.26637268 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai
Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to word-to-word selectional preferences by using web-scale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.
3 0.21162733 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
Author: Shane Bergsma ; David Yarowsky ; Kenneth Church
Abstract: Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don’t do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl), and (3) unannotated monolingual (e.g. Google N-grams). Size matters: (1) is a million words, (2) is potentially billions of words and (3) is potentially trillions of words. The unannotated monolingual data is helpful when the ambiguity can be resolved through associations among the lexical items. The bilingual data is helpful when the ambiguity can be resolved by the order of words in the translation. We train separate classifiers with monolingual and bilingual features and iteratively improve them via co-training. The co-trained classifier achieves close to 96% accuracy on Treebank data and makes 20% fewer errors than a supervised system trained with Treebank annotations.
4 0.20992114 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
Author: Yue Zhang ; Joakim Nivre
Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.
5 0.19600762 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
Author: Gholamreza Haffari ; Marzieh Razavi ; Anoop Sarkar
Abstract: We combine multiple word representations based on semantic clusters extracted from the (Brown et al., 1992) algorithm and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006) in order to improve discriminative dependency parsing in the MSTParser framework (McDonald et al., 2005). We also provide an ensemble method for combining diverse cluster-based models. The two contributions together significantly improve unlabeled dependency accuracy from 90.82% to 92.13%.
6 0.17410769 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
7 0.15779623 167 acl-2011-Improving Dependency Parsing with Semantic Classes
8 0.14335512 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation
9 0.12867969 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
10 0.12350806 225 acl-2011-Monolingual Alignment by Edit Rate Computation on Sentential Paraphrase Pairs
11 0.12310551 182 acl-2011-Joint Annotation of Search Queries
12 0.1193443 132 acl-2011-Extracting Paraphrases from Definition Sentences on the Web
13 0.11244203 282 acl-2011-Shift-Reduce CCG Parsing
14 0.10659286 224 acl-2011-Models and Training for Unsupervised Preposition Sense Disambiguation
15 0.10497263 258 acl-2011-Ranking Class Labels Using Query Sessions
16 0.10439503 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
17 0.1035469 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
18 0.10332365 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing
19 0.10119464 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
20 0.10006338 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
topicId topicWeight
[(0, 0.27), (1, -0.036), (2, -0.12), (3, -0.214), (4, -0.073), (5, -0.125), (6, 0.111), (7, -0.047), (8, 0.126), (9, -0.125), (10, 0.057), (11, 0.061), (12, -0.001), (13, -0.087), (14, 0.032), (15, 0.103), (16, -0.058), (17, 0.087), (18, -0.065), (19, 0.041), (20, -0.13), (21, -0.025), (22, -0.007), (23, 0.019), (24, 0.093), (25, -0.037), (26, 0.037), (27, 0.015), (28, -0.027), (29, -0.024), (30, 0.038), (31, 0.01), (32, 0.044), (33, -0.016), (34, 0.036), (35, 0.011), (36, 0.042), (37, -0.077), (38, -0.003), (39, 0.006), (40, 0.011), (41, 0.094), (42, 0.054), (43, -0.062), (44, 0.026), (45, 0.057), (46, 0.084), (47, -0.032), (48, -0.016), (49, 0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.95509189 333 acl-2011-Web-Scale Features for Full-Scale Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
2 0.84618568 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai
Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to word-to-word selectional preferences by using web-scale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.
Author: Roy Schwartz ; Omri Abend ; Roi Reichart ; Ari Rappoport
Abstract: Dependency parsing is a central NLP task. In this paper we show that the common evaluation for unsupervised dependency parsing is highly sensitive to problematic annotations. We show that for three leading unsupervised parsers (Klein and Manning, 2004; Cohen and Smith, 2009; Spitkovsky et al., 2010a), a small set of parameters can be found whose modification yields a significant improvement in standard evaluation measures. These parameters correspond to local cases where no linguistic consensus exists as to the proper gold annotation. Therefore, the standard evaluation does not provide a true indication of algorithm quality. We present a new measure, Neutral Edge Direction (NED), and show that it greatly reduces this undesired phenomenon.
4 0.74173468 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing
Author: Gholamreza Haffari ; Marzieh Razavi ; Anoop Sarkar
Abstract: We combine multiple word representations based on semantic clusters extracted from the (Brown et al., 1992) algorithm and syntactic clusters obtained from the Berkeley parser (Petrov et al., 2006) in order to improve discriminative dependency parsing in the MSTParser framework (McDonald et al., 2005). We also provide an ensemble method for combining diverse cluster-based models. The two contributions together significantly improve unlabeled dependency accuracy from 90.82% to 92.13%.
5 0.72618282 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation
Author: Nathan Green
Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43%, 0.23 Bleu score, improvement for English to Czech machine translation.
6 0.72158331 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features
7 0.70931447 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks
8 0.66642141 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach
9 0.6334998 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
10 0.62790877 243 acl-2011-Partial Parsing from Bitext Projections
11 0.62456071 167 acl-2011-Improving Dependency Parsing with Semantic Classes
12 0.62437439 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing
13 0.58503819 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models
14 0.56986195 199 acl-2011-Learning Condensed Feature Representations from Large Unsupervised Data Sets for Supervised Learning
15 0.54403156 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
16 0.53618044 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers
17 0.53379476 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing
18 0.52142161 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
19 0.51552892 267 acl-2011-Reversible Stochastic Attribute-Value Grammars
20 0.51235491 282 acl-2011-Shift-Reduce CCG Parsing
topicId topicWeight
[(5, 0.019), (17, 0.064), (26, 0.336), (37, 0.163), (39, 0.06), (41, 0.051), (53, 0.017), (55, 0.026), (59, 0.03), (72, 0.032), (91, 0.04), (96, 0.087), (97, 0.013)]
simIndex simValue paperId paperTitle
1 0.92597562 105 acl-2011-Dr Sentiment Knows Everything!
Author: Amitava Das ; Sivaji Bandyopadhyay
Abstract: Sentiment analysis has been one of the most in-demand research areas of the last few decades. Although a formidable amount of research has been done, the existing reported solutions or available systems are still far from perfect or do not meet the satisfaction level of end users. The main issue is the various conceptual rules that govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. Human psychology directly relates to the unrevealed clues and governs our sentiment realization. Human psychology relates to many things, like social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. In the present paper we propose a template-based online interactive gaming technology, called Dr Sentiment, to automatically create the PsychoSentiWordNet involving the internet population. The PsychoSentiWordNet is an extension of SentiWordNet that presently holds human psychological knowledge on a few aspects along with sentiment knowledge.
same-paper 2 0.86415112 333 acl-2011-Web-Scale Features for Full-Scale Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
3 0.85886711 115 acl-2011-Engkoo: Mining the Web for Language Learning
Author: Matthew R. Scott ; Xiaohua Liu ; Ming Zhou ; Microsoft Engkoo Team
Abstract: This paper presents Engkoo, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages, using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world’s largest lexicon linking both Chinese and English together, at the same time covering the most up-to-date terms as captured by the net.
4 0.85029697 253 acl-2011-PsychoSentiWordNet
Author: Amitava Das
Abstract: Sentiment analysis has been one of the most in-demand research areas of the last few decades. Although a formidable amount of research has been done, the existing reported solutions or available systems are still far from perfect or do not meet the satisfaction level of end users. The main issue may be that there are many conceptual rules that govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. Human psychology directly relates to the unrevealed clues and governs our sentiment realization. Human psychology relates to many things, like social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. PsychoSentiWordNet is an extension of SentiWordNet that holds human psychological knowledge and sentiment knowledge simultaneously.
5 0.84044194 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents
Author: Emmanuel Prochasson ; Pascale Fung
Abstract: We present a first known result of high-precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents, in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain a very high F-Measure, between 80% and 98%, for recognizing and extracting correct translations for rare terms (from 1 to 5 occurrences). Moreover, we show that our system can be trained on one pair of languages and tested on a different pair of languages, obtaining an F-Measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore even potentially applicable to low-resource languages without training data.
6 0.72161883 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
7 0.66415215 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction
8 0.65689421 256 acl-2011-Query Weighting for Ranking Model Adaptation
9 0.65014458 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
10 0.64505541 258 acl-2011-Ranking Class Labels Using Query Sessions
11 0.62420279 182 acl-2011-Joint Annotation of Search Queries
12 0.62189394 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
13 0.59856433 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
14 0.59706092 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines
15 0.59406245 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
16 0.59230006 292 acl-2011-Target-dependent Twitter Sentiment Classification
17 0.59205765 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
18 0.5857572 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
19 0.58252585 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition
20 0.57751381 311 acl-2011-Translationese and Its Dialects