acl acl2010 acl2010-211 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. [sent-3, score-0.687]
2 We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. [sent-4, score-0.804]
3 We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. [sent-5, score-0.721]
4 Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. [sent-6, score-0.218]
5 Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding. [sent-8, score-0.388]
6 In this paper, a fragment means an elementary tree in a tree-substitution grammar, while a subtree means a fragment that bottoms out in terminals. [sent-12, score-0.449]
7 In this paper, we present a simplified parser which combines the two basic ideas, using both large fragments and symbol refinement, to provide non-local and local context respectively. [sent-22, score-0.677]
8 The two approaches turn out to be highly complementary; even the simplest (deterministic) symbol refinement and a basic use of an all-fragments grammar combine to give accuracies substantially above recent work on tree-substitution-grammar-based parsers and approaching top refinement-based parsers. [sent-23, score-0.938]
9 This reduction is a flexible, implicit representation of the fragments that, rather than extracting an intractably large grammar over fragment types, indexes all nodes in the training treebank and uses a compact grammar over indexed node tokens. [sent-34, score-1.079]
10 This indexed grammar, when appropriately marginalized, is equivalent to one in which all fragments are explicitly extracted. [sent-35, score-0.678]
11 In this direction, we present a coarse-to-fine inference scheme and a compact graph encoding of the training set, which, together, make parsing manageable. [sent-37, score-0.439]
12 Of course, having a grammar that includes all training substructures is only desirable to the extent that those structures can be appropriately weighted. [sent-39, score-0.276]
13 However, we use a simple weighting scheme which does decompose appropriately over the implicit encoding, and which is flexible enough to allow weights to depend not only on frequency but also on fragment size, node patterns, and certain lexical properties. [sent-41, score-0.499]
14 The all-fragments approach has the advantage that parsing down to the character level requires no special treatment; we show that an explicit lexicon is not needed when sentences are considered as strings of characters rather than words. [sent-45, score-0.264]
15 In the parsing case, the central result is that accuracies in the range of state-of-the-art parsers (i. [sent-48, score-0.286]
16 1 All-Fragments Grammars We consider an all-fragments grammar G (see Figure 1(a)) derived from a binarized treebank B. [sent-53, score-0.261]
17 G is formally a tree-substitution grammar (Resnik, 1992; Bod, 1993) wherein each subgraph of each training tree in B is an elementary tree, or fragment f, in G. [sent-54, score-0.421]
18 In G, each derivation d is a tree (multiset) of fragments (Figure 1(c)), and the weight of the derivation is the product of the weights of the fragments: ω(d) = ∏_{f∈d} ω(f). [sent-55, score-0.664]
19 In models like G, many derivations will generally correspond to the same unsegmented tree, and the parsing task is to find the tree whose sum of derivation weights is highest: tmax = arg max_t Σ_{d∈t} ω(d). [sent-57, score-0.468]
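As an illustration of this objective (a toy sketch, not the paper's implementation), the following Python snippet scores each candidate tree by summing the weights of its derivations, each derivation weight being the product of its fragment weights; the fragment and tree representations are hypothetical.

```python
from collections import defaultdict
from math import prod

# Toy scorer: a derivation is (yielded_tree, [fragment weights]); its weight is
# the product of the fragment weights, and a tree's score sums over its derivations.
def best_tree(derivations):
    tree_score = defaultdict(float)
    for tree, frag_weights in derivations:
        tree_score[tree] += prod(frag_weights)
    return max(tree_score, key=tree_score.get)

# Two derivations yield tree "A", one yields tree "B".
print(best_tree([("A", [0.5, 0.4]), ("A", [0.3]), ("B", [0.45])]))  # -> "A"
```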
20 As a tractable alternative, we consider an implicit grammar GI (see Figure 1(b)) that has the same posterior probabilities as G. [sent-61, score-0.308]
21 4 GI has base symbols, which are the symbol types from the original treebank, as well as indexed symbols, which are obtained by assigning a unique index to each node token in the training treebank. [sent-63, score-0.644]
22 The vast majority of symbols in GI are therefore indexed symbols. [sent-64, score-0.439]
23 In particular, we found that GI was smaller than explicit extraction of all depth-1 and depth-2 unbinarized fragments for our treebanks. 4The difference is that Goodman (1996a) collapses our BEGIN and END rules into the binary productions, giving a larger grammar which is less convenient for weighting. [sent-66, score-0.651]
24 In practice, even just the raw treebank grammar grows almost linearly in the size of B. [sent-70, score-0.751]
25 The BEGIN rules transition from a base symbol to an indexed symbol and represent the beginning of a fragment from G. [sent-72, score-1.092]
26 The CONTINUE rules use only indexed symbols and correspond to specific depth-1 binary fragment tokens from training trees, representing the internal continuation of a fragment in G. [sent-73, score-0.877]
27 Finally, END rules transition from an indexed symbol to a base symbol, representing the frontier of a fragment. [sent-74, score-0.65]
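A minimal sketch of how these three rule families could be enumerated from an indexed, binarized treebank; the node encoding (index, label, children) and the naming scheme are assumptions made for illustration, not the paper's data structures.

```python
# Hypothetical node encoding: (index, label, children), children a (possibly empty)
# list of nodes. BEGIN: base -> indexed; CONTINUE: one rule per internal binary
# node token; END: indexed -> base.
def implicit_rules(root):
    begin, cont, end = [], [], []
    stack = [root]
    while stack:
        idx, label, children = stack.pop()
        begin.append((label, f"{label}-{idx}"))
        end.append((f"{label}-{idx}", label))
        if len(children) == 2:
            (li, ll, _), (ri, rl, _) = children
            cont.append((f"{label}-{idx}", f"{ll}-{li}", f"{rl}-{ri}"))
        stack.extend(children)
    return begin, cont, end

tree = (0, "S", [(1, "NP", []), (2, "VP", [(3, "V", []), (4, "NP", [])])])
b, c, e = implicit_rules(tree)
print(len(b), len(c), len(e))  # 5 BEGIN, 2 CONTINUE, 5 END rules
```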
28 By construction, all derivations in GI will segment, as shown in Figure 1(d), into regions corresponding to tokens of fragments from the training treebank B. [sent-75, score-0.51]
29 Let π be the map which takes appropriate fragments in GI (those that begin and end with base symbols and otherwise contain only indexed symbols), and maps them to the corresponding f in G. [sent-76, score-0.892]
30 We can consider any derivation dI in GI to be a tree of fragments fI, each fragment a token of a fragment type f = π(fI) in the original grammar G. [sent-77, score-0.984]
31 9 million indexed symbols in GI (after graph packing). [sent-81, score-0.496]
32 Even extracting binarized fragments (depth 1 and 2, with one order of parent annotation) gives us 0.75 million rules, and, practically, we would need fragments of greater depth. [sent-82, score-0.499] [sent-83, score-0.357]
34 In particular, each derivation d in G has a nonempty set of corresponding derivations {dI} = π−1(d) in GI, because fragments f in d correspond to multiple fragments fI in GI that differ only in their indexed symbols (one fI per occurrence of f in B). [sent-85, score-1.263]
35 3 Equivalence for Weighted Grammars In general, arbitrary weight functions ω on fragments in G do not decompose along the increased locality of GI. [sent-89, score-0.414]
36 Therefore, any fragment fI will have a weight in GI of the form ωI(fI) = ωBEGIN(b) · ∏_{r∈C} ωCONT(r) · ∏_{e∈E} ωEND(e), where b is the BEGIN rule, C are the CONTINUE rules, and E are the END rules in the fragment fI (see Figure 1(d)). [sent-92, score-0.454]
37 Here s(X) denotes the total number of fragments rooted at base symbol X. [sent-101, score-0.734]
38 for fragments f in G by ωG(f) = Σ_{fI∈π−1(f)} ωI(fI) = n(f) · ωI(f) = n(f) · ωBEGIN(b′) · ∏_{r′∈C} ωCONT(r′) · ∏_{e′∈E} ωEND(e′), where now b′, r′ and e′ are the non-indexed type abstractions of f’s member productions in GI and n(f) = |π−1(f)| is the number of tokens of f in B. [sent-102, score-0.357]
39 Under the weight function ωG(f), any derivation d in G will have weight which obeys ωG(d) = ∏_{f∈d} ωG(f) = ∏_{f∈d} n(f) · ωI(f) = Σ_{dI∈d} ωI(dI), and so the posterior P(d|s) of a derivation d for a sentence s will be the same whether computed in G or GI. [sent-103, score-0.325]
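A tiny numeric sanity check of this identity (a toy example with made-up counts and weights, not drawn from the paper): summing ωI over all choices of indexed fragment copies reproduces the product of the n(f)·ωI(f) terms.

```python
from itertools import product
from math import isclose

# A derivation with two fragment types, occurring n1 and n2 times in the treebank;
# all indexed copies of a type share the same weight in GI.
n1, n2, w1, w2 = 3, 2, 0.4, 0.25
lhs = (n1 * w1) * (n2 * w2)                                 # ∏_f n(f)·ωI(f)
rhs = sum(w1 * w2 for _ in product(range(n1), range(n2)))   # Σ over indexed derivations
assert isclose(lhs, rhs)
print(lhs, rhs)  # both ≈ 0.6
```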
40 Therefore, provided our weighting function on fragments f in G decomposes over the derivational representation of f in GI, we can equivalently compute the quantities we need for inference (see Section 4) using GI instead. [sent-104, score-0.464]
41 1 Classical DOP1 The original data-oriented parsing model ‘DOP1 ’ (Bod, 1993) is a particular instance of the general weighting scheme which decomposes appropriately over the implicit encoding, described in Section 2. [sent-106, score-0.399]
42 The END rule weight is 0 or 1 depending on whether A is an intermediate symbol or not. [sent-109, score-0.434]
43 6 The local fragments in DOP1 were flat (non-binary) so this weight choice simulates that property by not allowing switching between fragments at intermediate symbols. [sent-110, score-0.869]
44 , the frequency of fragment f divided by the number of fragments rooted at base symbol X. [sent-113, score-0.909]
45 This is simulated by our weight choices (Figure 2) where each fragment fI in GI has weight ωI(fI) = 1/s(X) and therefore, ωG(f) = n(f) · ωI(fI) = n(f)/s(X). [sent-114, score-0.289]
46 The number of fragments rooted at base symbol X is then s(X) = Σ_{Xi} s(Xi). [sent-116, score-0.323]
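For concreteness, the standard DOP-style recursion for the number of fragments rooted at a node (ignoring the intermediate-symbol restriction discussed above) can be sketched as follows; the (label, children) tree encoding is an assumption for illustration only.

```python
# s(node): each fragment rooted here either stops at a child (END) or continues
# into one of that child's fragments, giving a (1 + s(child)) factor per child.
def s(node):
    _, children = node
    count = 1
    for child in children:
        count *= 1 + s(child)
    return count

tree = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
print(s(tree))  # (1 + s(NP)) * (1 + s(VP)) = (1 + 4) * (1 + 2) = 15
```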
47 Implicitly parsing with the full DOP1 model (no sampling of fragments) using the weights in Figure 2 gives a 68% parsing accuracy on the WSJ dev-set. [sent-117, score-0.393]
48 7 This result indicates that the weight of a fragment should depend on more than just its frequency. [sent-118, score-0.267]
49 2 Better Parameterization As has been pointed out in the literature, large-fragment grammars can benefit from weights of fragments depending not only on their frequency but also on other properties. [sent-120, score-0.455]
50 For example, Bod (2001) restricts the size and number of words in the frontier of the fragments, and Collins and Duffy (2002) and Goodman (2003) both give larger fragments smaller weights. [sent-121, score-0.357]
51 In particular, we set ωCONT(r) for each binary CONTINUE rule r to a learned constant ωBODY, and we set the weight for each rule with a POS parent to a separate learned constant ωLEX. 6Intermediate symbols are those created during binarization. [sent-123, score-0.415]
52 We annotate with full left binarization history to imitate the flat nature of fragments in DOP1 . [sent-125, score-0.357]
53 A, B, C are base symbols, Ax, By, Cz are indexed symbols and i,j,k are between-word indices. [sent-130, score-0.495]
54 Fractional values of these parameters (ωBODY, ωLEX) allow the weight of a fragment to depend on its size and lexical properties. [sent-137, score-0.267]
55 The DOP1 model uses binary values for the END rule weight (0 if the symbol is intermediate, 1 otherwise), which is equivalent to prohibiting fragment switching at intermediate symbols. [sent-139, score-0.493]
56 We learn a fractional constant asp that allows (but penalizes) switching between fragments at intermediate symbols, through the formulation csp(Xintermediate) = asp and csp(Xnon−intermediate) = 1 − asp. [sent-140, score-0.803]
57 This feature allows fragments to be assigned weights based on the binarization status of their nodes. [sent-141, score-0.4]
58 With the above weights, the recursive formula for s(Xi), the total weighted number of fragments rooted at indexed symbol Xi, is different from DOP1 (Equation 1). [sent-142, score-0.958]
59 The resulting grammar is primarily parameterized by the training treebank B. [sent-146, score-0.257]
60 4 Efficient Inference The previously described implicit grammar GI defines a posterior distribution over derivations of a sentence s via a large, indexed PCFG. [sent-163, score-0.588]
61 As shown in Table 1, our model (an allfragments grammar with the weighting scheme tent by Johnson (2002). [sent-173, score-0.308]
62 Their pruning grammars were coarse versions of the raw treebank grammar. [sent-185, score-0.331]
63 Our grammar GI has a very large number of indexed symbols, so we use a coarse pass to prune away their unindexed abstractions. [sent-189, score-0.492]
64 The simple, intuitive, and effective choice for such a coarse grammar GC is a minimal PCFG grammar composed of the base treebank symbols X and the minimal depth-1 binary rules X → Y Z (and with the same level of annotation as in the full grammar). [sent-190, score-0.69]
65 If a particular base symbol X is pruned by the coarse pass for a particular span (i, j) (i.e., the posterior marginal P(X, i, j|s) is less than a certain threshold), then in the full grammar GI we do not allow building any indexed symbol Xl of type X for that span. [sent-193, score-0.756]
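A sketch of this pruning step under assumed data structures (a dictionary of coarse-pass log posterior marginals keyed by (symbol, i, j), and a map from indexed symbols to their base symbols); it is illustrative only, not the authors' implementation.

```python
import math

def pruning_mask(coarse_log_post, log_threshold=-7.0):
    """Spans/symbols that survive the coarse pass."""
    return {key for key, lp in coarse_log_post.items() if lp >= log_threshold}

def allowed(indexed_symbol, i, j, base_of, keep):
    """An indexed symbol may be built over (i, j) only if its base symbol survived there."""
    return (base_of[indexed_symbol], i, j) in keep

keep = pruning_mask({("NP", 0, 2): math.log(0.3), ("NP", 1, 3): math.log(1e-6)})
print(allowed("NP-17", 0, 2, {"NP-17": "NP"}, keep))  # True
print(allowed("NP-17", 1, 3, {"NP-17": "NP"}, keep))  # False (pruned by the coarse pass)
```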
67 Figure 4: Effect of coarse-pass pruning on parsing accuracy (WSJ dev-set, ≤ 40 words); x-axis: coarse-pass log posterior threshold (PT). [sent-201, score-0.329]
68 Figure 5: Effect of coarse-pass pruning on parsing accuracy (WSJ, training ≤ 20 words, tested on dev-set ≤ 20 words); x-axis: coarse-pass log posterior threshold (PT). [sent-208, score-0.37]
69 pruning is very small and that the peak accuracy is almost equal to the accuracy without pruning (the dotted line). [sent-210, score-0.352]
70 variation in parsing accuracies in response to the amount of pruning done by the coarse-pass. [sent-214, score-0.362]
71 (2008)), that a certain amount of pruning helps accuracy, perhaps by promoting agreement between the coarse and full grammars (model intersection). [sent-218, score-0.257]
72 However, these ‘fortuitous’ search errors give only a small improvement and the peak accuracy is almost equal to the parsing accuracy without any pruning (as seen in the figure). This outcome suggests that the coarse-pass pruning is critical for tractability but not for performance. [sent-219, score-0.593]
73 Figure 6 (tree-to-graph encoding): Collapsing the duplicate training subtrees converts them to a graph and reduces the number of indexed symbols significantly. [sent-223, score-0.604]
74 However, the number of indexed symbols in our implicit grammar GI is still large, because every node in each training tree (i. [sent-227, score-0.784]
75 9 million indexed symbol tokens in the word-level parsing model (this number increases further to almost 12. [sent-231, score-0.753]
76 This large symbol space makes parsing slow and memory-intensive. [sent-234, score-0.42]
77 We reduce the number of symbols in our im- plicit grammar GI by applying a compact, packed graph encoding to the treebank training trees. [sent-235, score-0.602]
78 This technique reduces the number of indexed symbols significantly as shown in Table 2 (1. [sent-240, score-0.439]
79 This reduction increases parsing speed by a factor of 1. [sent-244, score-0.241]
80 We store the duplicate-subtree counts for each indexed symbol of the collapsed graph (using a hashmap). [sent-247, score-0.604]
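A sketch of the packed graph encoding idea: hashing each subtree by a canonical signature so that identical training subtrees collapse onto one shared node, with a hashmap recording how many tree positions each shared node stands for. The tree encoding and naming are assumptions made for illustration.

```python
def pack(trees):
    table = {}    # canonical subtree signature -> shared node id
    counts = {}   # shared node id -> number of duplicate occurrences
    def visit(node):
        label, children = node
        sig = (label, tuple(visit(c) for c in children))
        nid = table.setdefault(sig, len(table))
        counts[nid] = counts.get(nid, 0) + 1
        return nid
    roots = [visit(t) for t in trees]
    return roots, counts

# Two identical NP subtrees collapse onto the same shared nodes, each with count 2.
_, counts = pack([("NP", [("D", []), ("N", [])]), ("NP", [("D", []), ("N", [])])])
print(counts)  # {0: 2, 1: 2, 2: 2}
```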
81 ments s(Xi) parented by an indexed symbol Xi (see Section 3. [sent-251, score-0.583]
82 1 Character-Level Parsing The all-fragments approach to parsing has the added advantage that parsing below the word level requires no special treatment, i. [sent-255, score-0.306]
83 With our implicit approach, we can avoid training a lexicon by building up the parse tree from characters instead of words. [sent-260, score-0.249]
84 A test sentence’s words are split up similarly and the test-parse is built from training fragments using the same model and inference procedure as defined for word-level parsing (see Sections 2, 3 and 4). [sent-262, score-0.591]
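A trivial sketch of feeding the parser character streams rather than words; the exact encoding of word boundaries and character-level nonterminals is not specified in this excerpt, so only the splitting step is shown.

```python
# Split each word of a test sentence into single-character terminals; the parser
# then builds the tree over these characters with the same machinery as for words.
def to_char_stream(sentence):
    return [list(word) for word in sentence.split()]

print(to_char_stream("The cat sat"))
# [['T', 'h', 'e'], ['c', 'a', 't'], ['s', 'a', 't']]
```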
85 289 Table 3: All-fragments WSJ results for the character-level parsing model, using parent annotation and one level of markovization. [sent-271, score-0.289]
86 12 Character-level parsing expands the training trees (see Figure 7) and the already large indexed symbol space size explodes (1. [sent-279, score-0.777]
87 The packing shrinks the symbol space size from 12. [sent-286, score-0.267]
88 This reduction increases parsing speed by almost a factor of 20 and brings down memory-usage to under 8GB. [sent-289, score-0.241]
89 A standard solution is symbol refinement; Johnson (1998) presents the particularly simple case of parent annotation, in which each node is annotated with the label of its parent. 12Note that the word-level model yields a higher accuracy of 88. [sent-293, score-0.408]
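A minimal sketch of parent annotation on (label, children) trees (an assumed encoding); each nonterminal is refined with its parent's label, e.g. NP under S becomes NP^S.

```python
def parent_annotate(node, parent="ROOT"):
    label, children = node
    if not children:              # leave terminal leaves unchanged
        return (label, [])
    return (f"{label}^{parent}", [parent_annotate(c, label) for c in children])

tree = ("S", [("NP", [("dog", [])]), ("VP", [("barks", [])])])
print(parent_annotate(tree))
# ('S^ROOT', [('NP^S', [('dog', [])]), ('VP^S', [('barks', [])])])
```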
90 It is reasonable to hope that the gains from using large fragments and the gains from symbol refinement will be complementary. [sent-311, score-0.777]
91 Zuidema (2007) showed a slight improvement in parsing accuracy when enough fragments were added to learn enrichments beyond manual refinements. [sent-315, score-0.554]
92 Table 4 shows results for a basic PCFG, and its augmentation with either basic refinement (parent annotation and one level of markovization), with all-fragments rules (as in previous sections), or both. [sent-317, score-0.345]
93 The basic incorporation of large fragments alone does not yield particularly strong perfor- mance, nor does basic symbol refinement. [sent-318, score-0.73]
94 3 Additional Deterministic Refinement Basic symbol refinement (parent annotation), in combination with all-fragments, gives test-set accuracies of 88. [sent-321, score-0.497]
95 Klein and Manning (2003) describe a broad set of simple, deterministic symbol refinements beyond parent annotation. [sent-326, score-0.409]
96 This novel tractability of an all-fragments grammar is achieved using both coarse-pass pruning and packed graph encoding. [sent-336, score-0.541]
97 2 Training Size Variation Figure 8 shows how WSJ parsing accuracy increases with increasing amount of training data (i. [sent-339, score-0.238]
98 Even if we train on only 10% of the WSJ training data (3983 sentences), we still achieve a reasonable parsing accuracy of nearly 84% (on the development set, ≤ 40 words), which is comparable to the full-system results obtained by Zuidema (2007), Cohn et al. [sent-342, score-0.238]
99 Basic Refinement is our all-fragments grammar with parent annotation. [sent-363, score-0.239]
100 14 7 Conclusion Our approach of using all fragments, in combination with basic symbol refinement, and even without an explicit lexicon, achieves results in the range of state-of-the-art parsers on full scale treebanks, across multiple languages. [sent-376, score-0.441]
wordName wordTfidf (topN-words)
[('fragments', 0.357), ('indexed', 0.28), ('symbol', 0.267), ('gi', 0.258), ('fragment', 0.175), ('symbols', 0.159), ('petrov', 0.157), ('refinement', 0.153), ('parsing', 0.153), ('grammar', 0.142), ('wsj', 0.137), ('pruning', 0.132), ('zk', 0.105), ('bod', 0.104), ('dop', 0.104), ('sima', 0.1), ('csp', 0.099), ('tmax', 0.099), ('implicit', 0.099), ('parent', 0.097), ('goodman', 0.093), ('lex', 0.091), ('klein', 0.088), ('yj', 0.084), ('ax', 0.077), ('accuracies', 0.077), ('asp', 0.075), ('treebank', 0.074), ('derivation', 0.072), ('coarse', 0.07), ('xi', 0.07), ('pcfg', 0.068), ('posterior', 0.067), ('encoding', 0.067), ('weighting', 0.067), ('fi', 0.066), ('explicit', 0.065), ('charniak', 0.064), ('zuidema', 0.064), ('tree', 0.063), ('packed', 0.062), ('allfragments', 0.06), ('markovization', 0.06), ('intermediate', 0.059), ('continue', 0.058), ('graph', 0.057), ('di', 0.057), ('weight', 0.057), ('base', 0.056), ('parsers', 0.056), ('post', 0.055), ('grammars', 0.055), ('collins', 0.054), ('rooted', 0.054), ('million', 0.053), ('basic', 0.053), ('johnson', 0.053), ('tq', 0.052), ('alel', 0.052), ('substructures', 0.052), ('rule', 0.051), ('reduction', 0.048), ('treesubstitution', 0.048), ('cz', 0.048), ('tractability', 0.048), ('rules', 0.047), ('characters', 0.046), ('deterministic', 0.045), ('cohn', 0.045), ('binarized', 0.045), ('tsg', 0.045), ('accuracy', 0.044), ('weights', 0.043), ('slav', 0.043), ('gildea', 0.043), ('cont', 0.042), ('khalil', 0.042), ('duffy', 0.042), ('compact', 0.042), ('training', 0.041), ('appropriately', 0.041), ('estimator', 0.04), ('objectives', 0.04), ('begin', 0.04), ('speed', 0.04), ('body', 0.04), ('unknown', 0.04), ('coarsepass', 0.04), ('hashmap', 0.04), ('unbinarized', 0.04), ('yf', 0.04), ('inference', 0.04), ('scheme', 0.039), ('switching', 0.039), ('annotation', 0.039), ('derivations', 0.038), ('ments', 0.036), ('treebanks', 0.036), ('trees', 0.036), ('depend', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.17750706 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.12955655 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.
4 0.12700973 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
Author: Timothy A. D. Fowler ; Gerald Penn
Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.
5 0.12171952 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
6 0.11794844 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
7 0.11725321 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.10377447 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
9 0.1031573 99 acl-2010-Efficient Third-Order Dependency Parsers
10 0.099846005 162 acl-2010-Learning Common Grammar from Multilingual Corpus
11 0.096615754 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
12 0.096560605 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
13 0.096547015 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
14 0.095976606 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
15 0.092314981 133 acl-2010-Hierarchical Search for Word Alignment
16 0.091549754 236 acl-2010-Top-Down K-Best A* Parsing
17 0.091431871 131 acl-2010-Hierarchical A* Parsing with Bridge Outside Scores
18 0.089899614 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
19 0.089701883 243 acl-2010-Tree-Based and Forest-Based Translation
20 0.087654978 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
topicId topicWeight
[(0, -0.241), (1, -0.085), (2, 0.094), (3, -0.007), (4, -0.13), (5, -0.095), (6, 0.181), (7, 0.023), (8, 0.036), (9, -0.057), (10, -0.058), (11, -0.071), (12, 0.033), (13, -0.012), (14, -0.043), (15, -0.012), (16, -0.026), (17, 0.019), (18, -0.0), (19, 0.011), (20, -0.021), (21, 0.001), (22, 0.021), (23, -0.023), (24, -0.068), (25, 0.057), (26, 0.003), (27, 0.104), (28, -0.006), (29, -0.012), (30, 0.07), (31, -0.008), (32, 0.001), (33, 0.09), (34, 0.021), (35, -0.017), (36, -0.016), (37, -0.051), (38, -0.022), (39, -0.041), (40, 0.001), (41, 0.063), (42, 0.015), (43, -0.123), (44, -0.052), (45, 0.034), (46, -0.083), (47, 0.113), (48, -0.002), (49, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.95650572 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.75696087 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.64695537 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
4 0.62795812 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
Author: Shay Cohen ; Noah A Smith
Abstract: We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by “Viterbi training.” We show that solving and even approximating Viterbi training for PCFGs is NP-hard. We motivate the use of uniformat-random initialization for Viterbi EM as an optimal initializer in absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
5 0.59514362 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
Author: Sindhu Raghavan ; Adriana Kovashka ; Raymond Mooney
Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.
6 0.59126735 162 acl-2010-Learning Common Grammar from Multilingual Corpus
8 0.56368011 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
9 0.55997002 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
10 0.55212623 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
11 0.54872429 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
12 0.53875768 99 acl-2010-Efficient Third-Order Dependency Parsers
13 0.52421504 67 acl-2010-Computing Weakest Readings
14 0.49879703 114 acl-2010-Faster Parsing by Supertagger Adaptation
15 0.49067745 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
16 0.48840374 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
17 0.48245165 169 acl-2010-Learning to Translate with Source and Target Syntax
18 0.48183146 130 acl-2010-Hard Constraints for Grammatical Function Labelling
19 0.48027015 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
20 0.47047386 69 acl-2010-Constituency to Dependency Translation with Forests
topicId topicWeight
[(0, 0.055), (7, 0.017), (14, 0.034), (25, 0.116), (33, 0.022), (39, 0.019), (42, 0.028), (44, 0.012), (59, 0.101), (66, 0.012), (69, 0.01), (71, 0.016), (72, 0.013), (73, 0.045), (76, 0.116), (78, 0.036), (80, 0.018), (83, 0.082), (84, 0.036), (98, 0.113)]
simIndex simValue paperId paperTitle
same-paper 1 0.91733223 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.91332108 139 acl-2010-Identifying Generic Noun Phrases
Author: Nils Reiter ; Anette Frank
Abstract: This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rulelike knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level information. Generic expressions have been studied extensively in formal semantics. Building on this work, we explore a corpus-based learning approach for identifying generic NPs, using selections of linguistically motivated features. Our results perform well above the baseline and existing prior work.
3 0.87218142 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
4 0.85681641 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
Author: Peng Li ; Jing Jiang ; Yinglin Wang
Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.
5 0.82403278 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
Author: Emily M. Bender ; Scott Drellishak ; Antske Fokkens ; Michael Wayne Goodman ; Daniel P. Mills ; Laurie Poulson ; Safiyyah Saleem
Abstract: This demonstration presents the LinGO Grammar Matrix grammar customization system: a repository of distilled linguistic knowledge and a web-based service which elicits a typological description of a language from the user and yields a customized grammar fragment ready for sustained development into a broad-coverage grammar. We describe the implementation of this repository with an emphasis on how the information is made available to users, including in-browser testing capabilities.
6 0.82154512 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
7 0.81926215 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.81916249 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
9 0.81854123 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
10 0.81806636 71 acl-2010-Convolution Kernel over Packed Parse Forest
11 0.81609368 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.8151769 162 acl-2010-Learning Common Grammar from Multilingual Corpus
13 0.81457341 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
14 0.81048965 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
15 0.81031704 197 acl-2010-Practical Very Large Scale CRFs
16 0.80457389 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
17 0.80418837 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
18 0.80396706 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
19 0.80291075 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
20 0.80231202 248 acl-2010-Unsupervised Ontology Induction from Text