acl acl2010 acl2010-211 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
Reference: text
sentIndex sentText sentNum sentScore
1 edu Abstract We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. [sent-3, score-0.687]
2 We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. [sent-4, score-0.804]
3 We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. [sent-5, score-0.721]
4 Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. [sent-6, score-0.218]
5 Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding. [sent-8, score-0.388]
6 In this paper, a fragment means an elementary tree in a tree-substitution grammar, while a subtree means a fragment that bottoms out in terminals. [sent-12, score-0.449]
7 In this paper, we present a simplified parser which combines the two basic ideas, using both large fragments and symbol refinement, to provide non-local and local context respectively. [sent-22, score-0.677]
8 The two approaches turn out to be highly complementary; even the simplest (deterministic) symbol refinement and a basic use of an all-fragments grammar combine to give accuracies substantially above recent work on tree-substitution-grammar-based parsers and approaching top refinement-based parsers. [sent-23, score-0.938]
9 This reduction is a flexible, implicit representation of the fragments that, rather than extracting an intractably large grammar over fragment types, indexes all nodes in the training treebank and uses a compact grammar over indexed node tokens. [sent-34, score-1.079]
10 This indexed grammar, when appropriately marginalized, is equivalent to one in which all fragments are explicitly extracted. [sent-35, score-0.678]
11 In this direction, we present a coarse-to-fine inference scheme and a compact graph encoding of the training set, which, together, make parsing manageable. [sent-37, score-0.439]
12 Of course, having a grammar that includes all training substructures is only desirable to the extent that those structures can be appropriately weighted. [sent-39, score-0.276]
13 However, we use a simple weighting scheme which does decompose appropriately over the implicit encoding, and which is flexible enough to allow weights to depend not only on frequency but also on fragment size, node patterns, and certain lexical properties. [sent-41, score-0.499]
14 The all-fragments approach has the advantage that parsing down to the character level requires no special treatment; we show that an explicit lexicon is not needed when sentences are considered as strings of characters rather than words. [sent-45, score-0.264]
15 In the parsing case, the central result is that accuracies in the range of state-of-the-art parsers (i. [sent-48, score-0.286]
16 1 All-Fragments Grammars We consider an all-fragments grammar G (see Figure 1(a)) derived from a binarized treebank B. [sent-53, score-0.261]
17 G is formally a tree-substitution grammar (Resnik, 1992; Bod, 1993) wherein each subgraph of each training tree in B is an elementary tree, or fragment f, in G. [sent-54, score-0.421]
18 In G, each derivation d is a tree (multiset) of fragments (Figure 1(c)), and the weight of the derivation is the product of the weights of the fragments: ω(d) = ∏_{f∈d} ω(f). [sent-55, score-0.664]
19 In models like G, many derivations will generally correspond to the same unsegmented tree, and the parsing task is to find the tree whose sum of derivation weights is highest: tmax = arg max_t Σ_{d∈t} ω(d). [sent-57, score-0.468]
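As an illustration of this objective (a toy sketch, not the paper's implementation), the following Python snippet scores each candidate tree by summing the weights of its derivations, each derivation weight being the product of its fragment weights; the fragment and tree representations are hypothetical.

```python
from collections import defaultdict
from math import prod

# Toy scorer: a derivation is (yielded_tree, [fragment weights]); its weight is
# the product of the fragment weights, and a tree's score sums over its derivations.
def best_tree(derivations):
    tree_score = defaultdict(float)
    for tree, frag_weights in derivations:
        tree_score[tree] += prod(frag_weights)
    return max(tree_score, key=tree_score.get)

# Two derivations yield tree "A", one yields tree "B".
print(best_tree([("A", [0.5, 0.4]), ("A", [0.3]), ("B", [0.45])]))  # -> "A"
```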
20 As a tractable alternative, we consider an implicit grammar GI (see Figure 1(b)) that has the same posterior probabilities as G. [sent-61, score-0.308]
21 4 GI has base symbols, which are the symbol types from the original treebank, as well as indexed symbols, which are obtained by assigning a unique index to each node token in the training treebank. [sent-63, score-0.644]
22 The vast majority of symbols in GI are therefore indexed symbols. [sent-64, score-0.439]
23 In particular, we found that GI was smaller than explicit extraction of all depth-1 and depth-2 unbinarized fragments for our treebanks. 4The difference is that Goodman (1996a) collapses our BEGIN and END rules into the binary productions, giving a larger grammar which is less convenient for weighting. [sent-66, score-0.651]
24 In practice, even just the raw treebank grammar grows almost linearly in the size of B. [sent-70, score-0.751]
25 The BEGIN rules transition from a base symbol to an indexed symbol and represent the beginning of a fragment from G. [sent-72, score-1.092]
26 The CONTINUE rules use only indexed symbols and correspond to specific depth-1 binary fragment tokens from training trees, representing the internal continuation of a fragment in G. [sent-73, score-0.877]
27 Finally, END rules transition from an indexed symbol to a base symbol, representing the frontier of a fragment. [sent-74, score-0.65]
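A minimal sketch of how these three rule families could be enumerated from an indexed, binarized treebank; the node encoding (index, label, children) and the naming scheme are assumptions made for illustration, not the paper's data structures.

```python
# Hypothetical node encoding: (index, label, children), children a (possibly empty)
# list of nodes. BEGIN: base -> indexed; CONTINUE: one rule per internal binary
# node token; END: indexed -> base.
def implicit_rules(root):
    begin, cont, end = [], [], []
    stack = [root]
    while stack:
        idx, label, children = stack.pop()
        begin.append((label, f"{label}-{idx}"))
        end.append((f"{label}-{idx}", label))
        if len(children) == 2:
            (li, ll, _), (ri, rl, _) = children
            cont.append((f"{label}-{idx}", f"{ll}-{li}", f"{rl}-{ri}"))
        stack.extend(children)
    return begin, cont, end

tree = (0, "S", [(1, "NP", []), (2, "VP", [(3, "V", []), (4, "NP", [])])])
b, c, e = implicit_rules(tree)
print(len(b), len(c), len(e))  # 5 BEGIN, 2 CONTINUE, 5 END rules
```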
28 By construction, all derivations in GI will segment, as shown in Figure 1(d), into regions corresponding to tokens of fragments from the training treebank B. [sent-75, score-0.51]
29 Let π be the map which takes appropriate fragments in GI (those that begin and end with base symbols and otherwise contain only indexed symbols), and maps them to the corresponding f in G. [sent-76, score-0.892]
30 We can consider any derivation dI in GI to be a tree of fragments fI, each fragment a token of a fragment type f = π(fI) in the original grammar G. [sent-77, score-0.984]
31 9 million indexed symbols in GI (after graph packing). [sent-81, score-0.496]
32 Even extracting binarized fragments (depth 1 and 2, with one order of parent annotation) gives us 0.75 million rules, and, practically, we would need fragments of greater depth. [sent-82, score-0.499] [sent-83, score-0.357]
34 In particular, each derivation d in G has a nonempty set of corresponding derivations {dI} = π−1(d) in GI, because fragments f in d correspond to multiple fragments fI in GI that differ only in their indexed symbols (one fI per occurrence of f in B). [sent-85, score-1.263]
35 3 Equivalence for Weighted Grammars In general, arbitrary weight functions ω on fragments in G do not decompose along the increased locality of GI. [sent-89, score-0.414]
36 Therefore, any fragment fI will have a weight in GI of the form ωI(fI) = ωBEGIN(b) · ∏_{r∈C} ωCONT(r) · ∏_{e∈E} ωEND(e), where b is the BEGIN rule, C are the CONTINUE rules, and E are the END rules in the fragment fI (see Figure 1(d)). [sent-92, score-0.454]
37 Here s(X) denotes the total number of fragments rooted at base symbol X. [sent-101, score-0.734]
38 for fragments f in G by ωG(f) = Σ_{fI∈π−1(f)} ωI(fI) = n(f) · ωI(f) = n(f) · ωBEGIN(b′) · ∏_{r′∈C} ωCONT(r′) · ∏_{e′∈E} ωEND(e′), where now b′, r′ and e′ are the non-indexed type abstractions of f’s member productions in GI and n(f) = |π−1(f)| is the number of tokens of f in B. [sent-102, score-0.357]
39 Under the weight function ωG(f), any derivation d in G will have weight which obeys ωG(d) = ∏_{f∈d} ωG(f) = ∏_{f∈d} n(f) · ωI(f) = Σ_{dI∈d} ωI(dI), and so the posterior P(d|s) of a derivation d for a sentence s will be the same whether computed in G or GI. [sent-103, score-0.325]
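A tiny numeric sanity check of this identity (a toy example with made-up counts and weights, not drawn from the paper): summing ωI over all choices of indexed fragment copies reproduces the product of the n(f)·ωI(f) terms.

```python
from itertools import product
from math import isclose

# A derivation with two fragment types, occurring n1 and n2 times in the treebank;
# all indexed copies of a type share the same weight in GI.
n1, n2, w1, w2 = 3, 2, 0.4, 0.25
lhs = (n1 * w1) * (n2 * w2)                                 # ∏_f n(f)·ωI(f)
rhs = sum(w1 * w2 for _ in product(range(n1), range(n2)))   # Σ over indexed derivations
assert isclose(lhs, rhs)
print(lhs, rhs)  # both ≈ 0.6
```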
40 Therefore, provided our weighting function on fragments f in G decomposes over the derivational representation of f in GI, we can equivalently compute the quantities we need for inference (see Section 4) using GI instead. [sent-104, score-0.464]
41 1 Classical DOP1 The original data-oriented parsing model ‘DOP1 ’ (Bod, 1993) is a particular instance of the general weighting scheme which decomposes appropriately over the implicit encoding, described in Section 2. [sent-106, score-0.399]
42 The END rule weight is 0 or 1 depending on whether A is an intermediate symbol or not. [sent-109, score-0.434]
43 6 The local fragments in DOP1 were flat (non-binary) so this weight choice simulates that property by not allowing switching between fragments at intermediate symbols. [sent-110, score-0.869]
44 , the frequency of fragment f divided by the number of fragments rooted at base symbol X. [sent-113, score-0.909]
45 This is simulated by our weight choices (Figure 2) where each fragment fI in GI has weight ωI(fI) = 1/s(X) and therefore, ωG(f) = n(f) · ωI(fI) = n(f)/s(X). [sent-114, score-0.289]
46 The number of fragments rooted at base symbol X is then s(X) = Σ_{Xi} s(Xi). [sent-116, score-0.323]
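For concreteness, the standard DOP-style recursion for the number of fragments rooted at a node (ignoring the intermediate-symbol restriction discussed above) can be sketched as follows; the (label, children) tree encoding is an assumption for illustration only.

```python
# s(node): each fragment rooted here either stops at a child (END) or continues
# into one of that child's fragments, giving a (1 + s(child)) factor per child.
def s(node):
    _, children = node
    count = 1
    for child in children:
        count *= 1 + s(child)
    return count

tree = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
print(s(tree))  # (1 + s(NP)) * (1 + s(VP)) = (1 + 4) * (1 + 2) = 15
```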
47 Implicitly parsing with the full DOP1 model (no sampling of fragments) using the weights in Figure 2 gives a 68% parsing accuracy on the WSJ dev-set. [sent-117, score-0.393]
48 7 This result indicates that the weight of a fragment should depend on more than just its frequency. [sent-118, score-0.267]
49 2 Better Parameterization As has been pointed out in the literature, large-fragment grammars can benefit from weights of fragments depending not only on their frequency but also on other properties. [sent-120, score-0.455]
50 For example, Bod (2001) restricts the size and number of words in the frontier of the fragments, and Collins and Duffy (2002) and Goodman (2003) both give larger fragments smaller weights. [sent-121, score-0.357]
51 In particular, we set ωCONT(r) for each binary CONTINUE rule r to a learned constant ωBODY, and we set the weight for each rule with a POS parent to a separate learned constant ωLEX. 6Intermediate symbols are those created during binarization. [sent-123, score-0.415]
52 We annotate with full left binarization history to imitate the flat nature of fragments in DOP1 . [sent-125, score-0.357]
53 A, B, C are base symbols, Ax, By, Cz are indexed symbols and i,j,k are between-word indices. [sent-130, score-0.495]
54 Fractional values of these parameters (ωBODY, ωLEX) allow the weight of a fragment to depend on its size and lexical properties. [sent-137, score-0.267]
55 The DOP1 model uses binary values for the END rule weight (0 if the symbol is intermediate, 1 otherwise), which is equivalent to prohibiting fragment switching at intermediate symbols. [sent-139, score-0.493]
56 We learn a fractional constant asp that allows (but penalizes) switching between fragments at intermediate symbols, through the formulation csp(Xintermediate) = asp and csp(Xnon−intermediate) = 1 − asp. [sent-140, score-0.803]
57 This feature allows fragments to be assigned weights based on the binarization status of their nodes. [sent-141, score-0.4]
58 With the above weights, the recursive formula for s(Xi), the total weighted number of fragments rooted at indexed symbol Xi, is different from DOP1 (Equation 1). [sent-142, score-0.958]
59 The resulting grammar is primarily parameterized by the training treebank B. [sent-146, score-0.257]
60 4 Efficient Inference The previously described implicit grammar GI defines a posterior distribution over derivations of a sentence s via a large, indexed PCFG. [sent-163, score-0.588]
61 As shown in Table 1, our model (an allfragments grammar with the weighting scheme tent by Johnson (2002). [sent-173, score-0.308]
62 Their pruning grammars were coarse versions of the raw treebank grammar. [sent-185, score-0.331]
63 Our grammar GI has a very large number of indexed symbols, so we use a coarse pass to prune away their unindexed abstractions. [sent-189, score-0.492]
64 The simple, intuitive, and effective choice for such a coarse grammar GC is a minimal PCFG grammar composed of the base treebank symbols X and the minimal depth-1 binary rules X → Y Z (and with the same level of annotation as in the full grammar). [sent-190, score-0.69]
65 If a particular base symbol X is pruned by the coarse pass for a particular span (i, j) (i.e., the posterior marginal P(X, i, j|s) is less than a certain threshold), then in the full grammar GI we do not allow building any indexed symbol Xl of type X for that span. [sent-193, score-0.756]
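A sketch of this pruning step under assumed data structures (a dictionary of coarse-pass log posterior marginals keyed by (symbol, i, j), and a map from indexed symbols to their base symbols); it is illustrative only, not the authors' implementation.

```python
import math

def pruning_mask(coarse_log_post, log_threshold=-7.0):
    """Spans/symbols that survive the coarse pass."""
    return {key for key, lp in coarse_log_post.items() if lp >= log_threshold}

def allowed(indexed_symbol, i, j, base_of, keep):
    """An indexed symbol may be built over (i, j) only if its base symbol survived there."""
    return (base_of[indexed_symbol], i, j) in keep

keep = pruning_mask({("NP", 0, 2): math.log(0.3), ("NP", 1, 3): math.log(1e-6)})
print(allowed("NP-17", 0, 2, {"NP-17": "NP"}, keep))  # True
print(allowed("NP-17", 1, 3, {"NP-17": "NP"}, keep))  # False (pruned by the coarse pass)
```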
67 Figure 4: Effect of coarse-pass pruning on parsing accuracy (WSJ dev-set, ≤ 40 words); x-axis: coarse-pass log posterior threshold (PT). [sent-201, score-0.329]
68 Figure 5: Effect of coarse-pass pruning on parsing accuracy (WSJ, training ≤ 20 words, tested on dev-set ≤ 20 words); x-axis: coarse-pass log posterior threshold (PT). [sent-208, score-0.37]
69 pruning is very small and that the peak accuracy is almost equal to the accuracy without pruning (the dotted line). [sent-210, score-0.352]
70 variation in parsing accuracies in response to the amount of pruning done by the coarse-pass. [sent-214, score-0.362]
71 (2008)), that a certain amount of pruning helps accuracy, perhaps by promoting agreement between the coarse and full grammars (model intersection). [sent-218, score-0.257]
72 However, these ‘fortuitous’ search errors give only a small improvement and the peak accuracy is almost equal to the parsing accuracy without any pruning (as seen in the figure). This outcome suggests that the coarse-pass pruning is critical for tractability but not for performance. [sent-219, score-0.593]
73 Figure 6 (tree-to-graph encoding): Collapsing the duplicate training subtrees converts them to a graph and reduces the number of indexed symbols significantly. [sent-223, score-0.604]
74 However, the number of indexed symbols in our implicit grammar GI is still large, because every node in each training tree (i. [sent-227, score-0.784]
75 9 million indexed symbol tokens in the word-level parsing model (this number increases further to almost 12. [sent-231, score-0.753]
76 This large symbol space makes parsing slow and memory-intensive. [sent-234, score-0.42]
77 We reduce the number of symbols in our im- plicit grammar GI by applying a compact, packed graph encoding to the treebank training trees. [sent-235, score-0.602]
78 This technique reduces the number of indexed symbols significantly as shown in Table 2 (1. [sent-240, score-0.439]
79 This reduction increases parsing speed by a factor of 1. [sent-244, score-0.241]
80 We store the duplicate-subtree counts for each indexed symbol of the collapsed graph (using a hashmap). [sent-247, score-0.604]
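A sketch of the packed graph encoding idea: hashing each subtree by a canonical signature so that identical training subtrees collapse onto one shared node, with a hashmap recording how many tree positions each shared node stands for. The tree encoding and naming are assumptions made for illustration.

```python
def pack(trees):
    table = {}    # canonical subtree signature -> shared node id
    counts = {}   # shared node id -> number of duplicate occurrences
    def visit(node):
        label, children = node
        sig = (label, tuple(visit(c) for c in children))
        nid = table.setdefault(sig, len(table))
        counts[nid] = counts.get(nid, 0) + 1
        return nid
    roots = [visit(t) for t in trees]
    return roots, counts

# Two identical NP subtrees collapse onto the same shared nodes, each with count 2.
_, counts = pack([("NP", [("D", []), ("N", [])]), ("NP", [("D", []), ("N", [])])])
print(counts)  # {0: 2, 1: 2, 2: 2}
```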
81 ments s(Xi) parented by an indexed symbol Xi (see Section 3. [sent-251, score-0.583]
82 1 Character-Level Parsing The all-fragments approach to parsing has the added advantage that parsing below the word level requires no special treatment, i. [sent-255, score-0.306]
83 With our implicit approach, we can avoid training a lexicon by building up the parse tree from characters instead of words. [sent-260, score-0.249]
84 A test sentence’s words are split up similarly and the test-parse is built from training fragments using the same model and inference procedure as defined for word-level parsing (see Sections 2, 3 and 4). [sent-262, score-0.591]
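A trivial sketch of feeding the parser character streams rather than words; the exact encoding of word boundaries and character-level nonterminals is not specified in this excerpt, so only the splitting step is shown.

```python
# Split each word of a test sentence into single-character terminals; the parser
# then builds the tree over these characters with the same machinery as for words.
def to_char_stream(sentence):
    return [list(word) for word in sentence.split()]

print(to_char_stream("The cat sat"))
# [['T', 'h', 'e'], ['c', 'a', 't'], ['s', 'a', 't']]
```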
85 289 Table 3: All-fragments WSJ results for the character-level parsing model, using parent annotation and one level of markovization. [sent-271, score-0.289]
86 12 Character-level parsing expands the training trees (see Figure 7) and the already large indexed symbol space size explodes (1. [sent-279, score-0.777]
87 The packing shrinks the symbol space size from 12. [sent-286, score-0.267]
88 This reduction increases parsing speed by almost a factor of 20 and brings down memory-usage to under 8GB. [sent-289, score-0.241]
89 A standard solution is symbol refinement; Johnson (1998) presents the particularly simple case of parent annotation, in which each node is annotated with the label of its parent. 12Note that the word-level model yields a higher accuracy of 88. [sent-293, score-0.408]
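A minimal sketch of parent annotation on (label, children) trees (an assumed encoding); each nonterminal is refined with its parent's label, e.g. NP under S becomes NP^S.

```python
def parent_annotate(node, parent="ROOT"):
    label, children = node
    if not children:              # leave terminal leaves unchanged
        return (label, [])
    return (f"{label}^{parent}", [parent_annotate(c, label) for c in children])

tree = ("S", [("NP", [("dog", [])]), ("VP", [("barks", [])])])
print(parent_annotate(tree))
# ('S^ROOT', [('NP^S', [('dog', [])]), ('VP^S', [('barks', [])])])
```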
90 It is reasonable to hope that the gains from using large fragments and the gains from symbol refinement will be complementary. [sent-311, score-0.777]
91 Zuidema (2007) showed a slight improvement in parsing accuracy when enough fragments were added to learn enrichments beyond manual refinements. [sent-315, score-0.554]
92 Table 4 shows results for a basic PCFG, and its augmentation with either basic refinement (parent annotation and one level of markovization), with all-fragments rules (as in previous sections), or both. [sent-317, score-0.345]
93 The basic incorporation of large fragments alone does not yield particularly strong perfor- mance, nor does basic symbol refinement. [sent-318, score-0.73]
94 3 Additional Deterministic Refinement Basic symbol refinement (parent annotation), in combination with all-fragments, gives test-set accuracies of 88. [sent-321, score-0.497]
95 Klein and Manning (2003) describe a broad set of simple, deterministic symbol refinements beyond parent annotation. [sent-326, score-0.409]
96 This novel tractability of an all-fragments grammar is achieved using both coarse-pass pruning and packed graph encoding. [sent-336, score-0.541]
97 2 Training Size Variation Figure 8 shows how WSJ parsing accuracy increases with increasing amount of training data (i. [sent-339, score-0.238]
98 Even if we train on only 10% of the WSJ training data (3983 sentences), we still achieve a reasonable parsing accuracy of nearly 84% (on the development set, ≤ 40 words), which is comparable to the full-system results obtained by Zuidema (2007), Cohn et al. [sent-342, score-0.238]
99 Basic Refinement is our all-fragments grammar with parent annotation. [sent-363, score-0.239]
100 14 7 Conclusion Our approach of using all fragments, in combination with basic symbol refinement, and even without an explicit lexicon, achieves results in the range of state-of-the-art parsers on full scale treebanks, across multiple languages. [sent-376, score-0.441]
wordName wordTfidf (topN-words)
[('fragments', 0.357), ('indexed', 0.28), ('symbol', 0.267), ('gi', 0.258), ('fragment', 0.175), ('symbols', 0.159), ('petrov', 0.157), ('refinement', 0.153), ('parsing', 0.153), ('grammar', 0.142), ('wsj', 0.137), ('pruning', 0.132), ('zk', 0.105), ('bod', 0.104), ('dop', 0.104), ('sima', 0.1), ('csp', 0.099), ('tmax', 0.099), ('implicit', 0.099), ('parent', 0.097), ('goodman', 0.093), ('lex', 0.091), ('klein', 0.088), ('yj', 0.084), ('ax', 0.077), ('accuracies', 0.077), ('asp', 0.075), ('treebank', 0.074), ('derivation', 0.072), ('coarse', 0.07), ('xi', 0.07), ('pcfg', 0.068), ('posterior', 0.067), ('encoding', 0.067), ('weighting', 0.067), ('fi', 0.066), ('explicit', 0.065), ('charniak', 0.064), ('zuidema', 0.064), ('tree', 0.063), ('packed', 0.062), ('allfragments', 0.06), ('markovization', 0.06), ('intermediate', 0.059), ('continue', 0.058), ('graph', 0.057), ('di', 0.057), ('weight', 0.057), ('base', 0.056), ('parsers', 0.056), ('post', 0.055), ('grammars', 0.055), ('collins', 0.054), ('rooted', 0.054), ('million', 0.053), ('basic', 0.053), ('johnson', 0.053), ('tq', 0.052), ('alel', 0.052), ('substructures', 0.052), ('rule', 0.051), ('reduction', 0.048), ('treesubstitution', 0.048), ('cz', 0.048), ('tractability', 0.048), ('rules', 0.047), ('characters', 0.046), ('deterministic', 0.045), ('cohn', 0.045), ('binarized', 0.045), ('tsg', 0.045), ('accuracy', 0.044), ('weights', 0.043), ('slav', 0.043), ('gildea', 0.043), ('cont', 0.042), ('khalil', 0.042), ('duffy', 0.042), ('compact', 0.042), ('training', 0.041), ('appropriately', 0.041), ('estimator', 0.04), ('objectives', 0.04), ('begin', 0.04), ('speed', 0.04), ('body', 0.04), ('unknown', 0.04), ('coarsepass', 0.04), ('hashmap', 0.04), ('unbinarized', 0.04), ('yf', 0.04), ('inference', 0.04), ('scheme', 0.039), ('switching', 0.039), ('annotation', 0.039), ('derivations', 0.038), ('ments', 0.036), ('treebanks', 0.036), ('trees', 0.036), ('depend', 0.035)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.17750706 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.12955655 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.
4 0.12700973 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
Author: Timothy A. D. Fowler ; Gerald Penn
Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.
5 0.12171952 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
6 0.11794844 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
7 0.11725321 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.10377447 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
9 0.1031573 99 acl-2010-Efficient Third-Order Dependency Parsers
10 0.099846005 162 acl-2010-Learning Common Grammar from Multilingual Corpus
11 0.096615754 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
12 0.096560605 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
13 0.096547015 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
14 0.095976606 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
15 0.092314981 133 acl-2010-Hierarchical Search for Word Alignment
16 0.091549754 236 acl-2010-Top-Down K-Best A* Parsing
17 0.091431871 131 acl-2010-Hierarchical A* Parsing with Bridge Outside Scores
18 0.089899614 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data
19 0.089701883 243 acl-2010-Tree-Based and Forest-Based Translation
20 0.087654978 155 acl-2010-Kernel Based Discourse Relation Recognition with Temporal Ordering Information
topicId topicWeight
[(0, -0.241), (1, -0.085), (2, 0.094), (3, -0.007), (4, -0.13), (5, -0.095), (6, 0.181), (7, 0.023), (8, 0.036), (9, -0.057), (10, -0.058), (11, -0.071), (12, 0.033), (13, -0.012), (14, -0.043), (15, -0.012), (16, -0.026), (17, 0.019), (18, -0.0), (19, 0.011), (20, -0.021), (21, 0.001), (22, 0.021), (23, -0.023), (24, -0.068), (25, 0.057), (26, 0.003), (27, 0.104), (28, -0.006), (29, -0.012), (30, 0.07), (31, -0.008), (32, 0.001), (33, 0.09), (34, 0.021), (35, -0.017), (36, -0.016), (37, -0.051), (38, -0.022), (39, -0.041), (40, 0.001), (41, 0.063), (42, 0.015), (43, -0.123), (44, -0.052), (45, 0.034), (46, -0.083), (47, 0.113), (48, -0.002), (49, 0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.95650572 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.75696087 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently converges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.64695537 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
4 0.62795812 255 acl-2010-Viterbi Training for PCFGs: Hardness Results and Competitiveness of Uniform Initialization
Author: Shay Cohen ; Noah A Smith
Abstract: We consider the search for a maximum likelihood assignment of hidden derivations and grammar weights for a probabilistic context-free grammar, the problem approximately solved by “Viterbi training.” We show that solving and even approximating Viterbi training for PCFGs is NP-hard. We motivate the use of uniformat-random initialization for Viterbi EM as an optimal initializer in absence of further information about the correct model parameters, providing an approximate bound on the log-likelihood.
5 0.59514362 34 acl-2010-Authorship Attribution Using Probabilistic Context-Free Grammars
Author: Sindhu Raghavan ; Adriana Kovashka ; Raymond Mooney
Abstract: In this paper, we present a novel approach for authorship attribution, the task of identifying the author of a document, using probabilistic context-free grammars. Our approach involves building a probabilistic context-free grammar for each author and using this grammar as a language model for classification. We evaluate the performance of our method on a wide range of datasets to demonstrate its efficacy.
6 0.59126735 162 acl-2010-Learning Common Grammar from Multilingual Corpus
8 0.56368011 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two
9 0.55997002 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
10 0.55212623 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
11 0.54872429 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
12 0.53875768 99 acl-2010-Efficient Third-Order Dependency Parsers
13 0.52421504 67 acl-2010-Computing Weakest Readings
14 0.49879703 114 acl-2010-Faster Parsing by Supertagger Adaptation
15 0.49067745 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
16 0.48840374 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices
17 0.48245165 169 acl-2010-Learning to Translate with Source and Target Syntax
18 0.48183146 130 acl-2010-Hard Constraints for Grammatical Function Labelling
19 0.48027015 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
20 0.47047386 69 acl-2010-Constituency to Dependency Translation with Forests
topicId topicWeight
[(0, 0.055), (7, 0.017), (14, 0.034), (25, 0.116), (33, 0.022), (39, 0.019), (42, 0.028), (44, 0.012), (59, 0.101), (66, 0.012), (69, 0.01), (71, 0.016), (72, 0.013), (73, 0.045), (76, 0.116), (78, 0.036), (80, 0.018), (83, 0.082), (84, 0.036), (98, 0.113)]
simIndex simValue paperId paperTitle
same-paper 1 0.91733223 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
2 0.91332108 139 acl-2010-Identifying Generic Noun Phrases
Author: Nils Reiter ; Anette Frank
Abstract: This paper presents a supervised approach for identifying generic noun phrases in context. Generic statements express rulelike knowledge about kinds or events. Therefore, their identification is important for the automatic construction of knowledge bases. In particular, the distinction between generic and non-generic statements is crucial for the correct encoding of generic and instance-level information. Generic expressions have been studied extensively in formal semantics. Building on this work, we explore a corpus-based learning approach for identifying generic NPs, using selections of linguistically motivated features. Our results perform well above the baseline and existing prior work.
3 0.87218142 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
4 0.85681641 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining
Author: Peng Li ; Jing Jiang ; Yinglin Wang
Abstract: In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. This kind of summary templates can be useful in various applications. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. We apply our method on five Wikipedia entity categories and compare our method with two baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method.
5 0.82403278 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
Author: Emily M. Bender ; Scott Drellishak ; Antske Fokkens ; Michael Wayne Goodman ; Daniel P. Mills ; Laurie Poulson ; Safiyyah Saleem
Abstract: This demonstration presents the LinGO Grammar Matrix grammar customization system: a repository of distilled linguistic knowledge and a web-based service which elicits a typological description of a language from the user and yields a customized grammar fragment ready for sustained development into a broad-coverage grammar. We describe the implementation of this repository with an emphasis on how the information is made available to users, including in-browser testing capabilities.
6 0.82154512 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
7 0.81926215 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.81916249 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
9 0.81854123 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
10 0.81806636 71 acl-2010-Convolution Kernel over Packed Parse Forest
11 0.81609368 130 acl-2010-Hard Constraints for Grammatical Function Labelling
12 0.8151769 162 acl-2010-Learning Common Grammar from Multilingual Corpus
13 0.81457341 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
14 0.81048965 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
15 0.81031704 197 acl-2010-Practical Very Large Scale CRFs
16 0.80457389 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
17 0.80418837 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses
18 0.80396706 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names
19 0.80291075 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
20 0.80231202 248 acl-2010-Unsupervised Ontology Induction from Text