acl acl2010 acl2010-23 knowledge-graph by maker-knowledge-mining

23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar


Source: pdf

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. [sent-5, score-0.307]

2 We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). [sent-7, score-0.189]

3 In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy. [sent-8, score-0.225]

4 Combinatory categorial grammar (CCG) is a variant of categorial grammar which has attracted interest for both theoretical and practical reasons. [sent-9, score-0.458]

5 On the practical side, we have corpora with CCG derivations for each sentence (Hockenmaier and Steedman, 2007), a wide-coverage parser trained on that corpus (Clark and Curran, 2007) and a system for converting CCG derivations into semantic representations (Bos et al., 2004). [sent-11, score-0.421]

6 However, despite being treated as a single unified grammar formalism, each of these authors uses a variation of CCG, and the variations differ primarily in which combinators are included in the grammar and in the restrictions that are put on them. [sent-13, score-0.162]

7 Included in this class of strongly context-free CCGs are a grammar including all the derivations in CCGbank and the grammar used in the Clark and Curran parser. [sent-17, score-0.414]

8 The Petrov parser (Petrov and Klein, 2007) uses latent variables to refine the grammar extracted from a corpus to improve accuracy, originally used to improve parsing results on the Penn treebank (PTB). [sent-19, score-0.276]

9 We train the Petrov parser on CCGbank and achieve the best results to date on sentences from section 23 in terms of supertagging accuracy, PARSEVAL measures and dependency accuracy. [sent-20, score-0.225]

10 Bos’s system for building semantic representations from CCG derivations is only possible due to the categorial nature of CCG. [sent-22, score-0.283]

11 The set of categories is constructed from a finite set of atoms A (e.g., S, N, NP and PP). [sent-25, score-0.309]

12 According to the literature, combinatory categorial grammar has been defined to have a variety of rule systems. [sent-40, score-0.399]

13 These rule systems vary from a small rule set, motivated theoretically (Vijay-Shanker and Weir, 1994), to a larger rule set, motivated linguistically (Steedman, 2000), to a very large rule set, motivated by practical coverage (Hockenmaier and Steedman, 2007; Clark and Curran, 2007). [sent-41, score-0.396]

14 A combinatory categorial grammar (CCG) is a categorial grammar whose rule system consists of rule schemata in which the left side is a sequence of categories and the right side is a single category, where the categories may include variables over both categories and connectives. [sent-43, score-1.228]

15 In addition, rule schemata may specify a sequence of categories and connectives using the “…” convention. [sent-44, score-0.342]

16 When “…” appears in a rule, it matches any sequence of categories and connectives according to the connectives adjacent to the “…”. [sent-51, score-0.198]

17 For example, the rule schema for forward composition is X/Y, Y/Z → X/Z, and the rule schema for generalized crossed composition is X/Y, Y|1Z1|2…|nZn → X|1Z1|2…|nZn. [sent-55, score-1.059]
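
To make the schema notation concrete, here is a minimal Python sketch of forward composition over a toy category encoding. The `Complex` class, the string-atom convention and the `forward_compose` function are illustrative assumptions, not the paper's implementation, and generalized composition is omitted.

```python
from dataclasses import dataclass
from typing import Optional, Union

@dataclass(frozen=True)
class Complex:
    """Assumed encoding: a complex category pairs a result with an argument
    under a slash; atomic categories are plain strings such as "S" or "NP"."""
    result: "Cat"
    slash: str  # "/" (forward) or "\\" (backward)
    arg: "Cat"

Cat = Union[str, Complex]

def forward_compose(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward composition: X/Y, Y/Z -> X/Z."""
    if (isinstance(left, Complex) and left.slash == "/"
            and isinstance(right, Complex) and right.slash == "/"
            and left.arg == right.result):
        return Complex(left.result, "/", right.arg)
    return None

# Example: (S\NP)/NP composed with NP/N yields (S\NP)/N.
vp = Complex(Complex("S", "\\", "NP"), "/", "NP")
det = Complex("NP", "/", "N")
print(forward_compose(vp, det))
```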

18 A well-known categorial grammar which is not a CCG is Lambek categorial grammar (Lambek, 1958), whose introduction rules cannot be characterized as combinatory rules (Zielonka, 1981). [sent-63, score-0.629]

19 We define a number of schema classes general enough that the important variants of CCG can be defined by selecting some subset of the classes. [sent-65, score-0.37]

20 In addition to the schema classes, we also define two restriction classes, which specify ways in which the rule schemata from the schema classes can be restricted. [sent-66, score-0.97]

21 We define the following restriction classes: (A) Rule Restriction to a Finite Set, under which the rule schemata in the schema classes of a CCG are limited to a finite number of instantiations. [sent-98, score-0.753]

22 (B) Rule Restrictions to Certain Categories, under which the rule schemata in the schema classes of a CCG are limited to a finite number of instantiations, although variables are allowed in the instantiations. [sent-99, score-0.733]
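
As a rough illustration of restriction class (A), one can picture the grammar carrying an explicit finite table of permitted rule instantiations, e.g. those harvested from a training corpus. The whitelist representation and all names below are assumptions for exposition only.

```python
# Hypothetical finite whitelist of fully instantiated rules, written as
# (left-hand category sequence, right-hand category) pairs; categories are
# plain strings here, standing in for rules harvested from a corpus.
ALLOWED_RULES = {
    (("NP/N", "N"), "NP"),                  # a forward application instance
    (("S/(S\\NP)", "(S\\NP)/NP"), "S/NP"),  # a forward composition instance
}

def rule_permitted(left: tuple, right: str) -> bool:
    """Under restriction (A), a rule may fire only if this exact
    instantiation occurs in the finite whitelist."""
    return (left, right) in ALLOWED_RULES

print(rule_permitted(("NP/N", "N"), "NP"))    # True
print(rule_permitted(("NP/PP", "PP"), "NP"))  # False: an unseen instantiation
```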

23 Vijay-Shanker and Weir (1994) define CCG to be schema class (4) with restriction class (B). [sent-100, score-0.448]

24 Steedman (2000) defines CCG to be schema classes (1-5), (6), (10) with restriction class (B). [sent-101, score-0.471]

25 The set of atoms in any derivation of any CCG consisting of a subset of the schema classes (1-8) and (10-11) is finite. [sent-104, score-0.545]

26 A finite lexicon can introduce only a finite number of atoms in lexical categories. [sent-106, score-0.377]

27 Any rule corresponding to a schema in the schema classes (1-8) has only those atoms on the right that occur somewhere on the left. [sent-107, score-0.837]

28 Our proofs about restriction class (B) are essentially identical to proofs regarding the multi-modal variant. [sent-109, score-0.205]

29 such rules, limiting the new atoms to a finite number. [sent-110, score-0.225]

30 The subcategories of a category c are c1 and c2 if c = c1 • c2 for some • ∈ B, and c itself if c is atomic. [sent-112, score-0.202]

31 Its second subcategories are the subcategories of its subcategories. [sent-113, score-0.141]
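
The subcategory definition translates directly into code; the nested-tuple encoding below is an assumed illustration, not the paper's notation.

```python
# Assumed encoding: an atomic category is a string; a complex category is a
# tuple (c1, connective, c2), e.g. (("S", "\\", "NP"), "/", "NP") for (S\NP)/NP.

def subcategories(c) -> set:
    """Subcategories of c: c1 and c2 if c = c1 • c2; c itself if c is atomic."""
    if isinstance(c, tuple):
        c1, _connective, c2 = c
        return {c1, c2}
    return {c}

def second_subcategories(c) -> set:
    """Second subcategories: the subcategories of the subcategories of c."""
    return {s for sub in subcategories(c) for s in subcategories(sub)}

cat = (("S", "\\", "NP"), "/", "NP")   # (S\NP)/NP
print(subcategories(cat))              # {('S', '\\', 'NP'), 'NP'}
print(second_subcategories(cat))       # {'S', 'NP'}
```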

32 Any CCG consisting of a subset of the rule schemata (1-3), (6-8) and (10-11) has derivations consisting of only a finite number of categories. [sent-115, score-0.564]

33 We first prove the proposition excluding schema class (8). [sent-117, score-0.434]

34 We will use structural induction on the derivations to prove that there is a bound on the size of the subcategories of any category in the derivation. [sent-118, score-0.431]

35 The base case is the assignment of a lexical category to a word and the inductive step is the use of a rule from schema classes (1-4), (6-7) and (10-11). [sent-119, score-0.504]

36 Given that the lexicon is finite, there is a bound k on the size of the subcategories of lexical categories. [sent-120, score-0.215]

37 Furthermore, there is a bound l on the size of the subcategories of categories on the right side of any rule in (10) and (11). [sent-121, score-0.421]

38 For rules from schema class (1), the category on the right is a subcategory of the first category on the left, so the subcategories on the right are bounded by m. [sent-123, score-0.799]

39 For rules from schema classes (2-3), the category on the right has subcategories X and Z, each of which is bounded in size by m since they occur as subcategories of categories on the left. [sent-124, score-0.914]

40 For rules from schema class (6), since reducing generalized composition is a special case of reducing generalized crossed composition, we need only consider the latter. [sent-125, score-0.642]

41 The category on the right has subcategories X|1Z1|2…|n−1Zn−1 and Zn. [sent-126, score-0.23]

42 For rules from schema class (7), the category on the right has subcategories X and Z. [sent-140, score-0.61]

43 The size of Z is bounded by m because it is a subcategory of a category on the left. [sent-141, score-0.181]

44 The size of X is bounded by m because it is a second subcategory of a category on the left. [sent-142, score-0.181]

45 Finally, rules in schema classes (10-11) have categories on the right that are bounded by l, which is, in turn, bounded by m. [sent-143, score-0.515]

46 Then, by proposition 1, there must only be a finite number of categories in any derivation in a CCG consisting of a subset of rule schemata (1-3), (6-7) and (10-11). [sent-144, score-0.601]

47 The proof including schema class (8) is essentially identical except that k must be defined in terms of the size of the second subcategories. [sent-145, score-0.414]

48 A grammar is strongly context-free if there exists a CFG such that the derivations of the two grammars are identical. [sent-147, score-0.302]

49 Any CCG consisting of a subset of the schema classes (1-3), (6-8) and (10-11) is strongly context-free. [sent-149, score-0.455]

50 Since the CCG generates derivations whose categories are finite in number, let C be that set of categories. [sent-151, score-0.364]

51 Then, for each rule schema C1, C2 → C3 in (1-3) and (6-8), we construct a context-free rule C3′ → C1′, C2′ for each Ci′ in S(C, Ci) for 1 ≤ i ≤ 3. [sent-153, score-0.388]

52 Similarly, for each rule schema C1 → C2 in (10), we construct a context-free rule C2′ → C1′, which results in a finite number of such rules. [sent-154, score-0.614]

53 Finally, for each rule schema X~ → Z in (11), we construct a context-free rule Z → X~. Then, for each entry in the lexicon w → C, we construct a context-free rule C → w. [sent-155, score-0.501]
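
The shape of this construction can be sketched as follows, restricted to forward application for brevity; the string-based categories, the naive slash splitting and all names are assumptions, not the paper's procedure.

```python
# Minimal sketch of the CFG construction: instantiate each binary CCG schema
# over the finite category set C and emit the corresponding context-free rule
# with left and right sides reversed; lexical entries w -> C become CFG rules
# C -> w.

def forward_application(x: str, y: str):
    """X/Y, Y -> X. Naive last-slash splitting; adequate only for the
    unnested toy categories used below."""
    if "/" in x:
        result, arg = x.rsplit("/", 1)
        if arg == y:
            return result
    return None

def ccg_to_cfg(C: set, lexicon: dict) -> list:
    rules = []
    for c1 in C:
        for c2 in C:
            c3 = forward_application(c1, c2)
            if c3 is not None and c3 in C:
                rules.append((c3, (c1, c2)))  # CFG rule C3 -> C1 C2
    for word, cat in lexicon.items():
        rules.append((cat, (word,)))          # CFG rule C -> w
    return rules

C = {"S", "NP", "S/NP"}
lexicon = {"w1": "S/NP", "w2": "NP"}  # hypothetical lexical entries
print(ccg_to_cfg(C, lexicon))
# e.g. [('S', ('S/NP', 'NP')), ('S/NP', ('w1',)), ('NP', ('w2',))]
```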

54 The constructed CFG has precisely the same rules as the CCG restricted to the categories in C except that the left and right sides have been reversed. [sent-156, score-0.164]

55 Thus, by proposition 2, the CFG has exactly the same derivations as the CCG. [sent-157, score-0.243]

56 Any CCG consisting of a subset of the schema classes (1-3), (6-8) and (10-11) along with restriction class (B) is strongly context-free. [sent-159, score-0.575]

57 If a CCG is allowed to restrict the use of its rules to certain categories, as in restriction class (B), then we construct the context-free rules by enumerating only those categories in the set C allowed by the restriction. [sent-162, score-0.621]

58 Any CCG that includes restriction class (A) is strongly context-free. [sent-164, score-0.173]

59 We construct a context-free grammar with exactly those rules in the finite set of instantiations of the CCG rule schemata, along with context-free rules corresponding to the lexicon. [sent-166, score-0.577]

60 This CFG generates exactly the same derivations as the CCG. [sent-167, score-0.164]

61 We have thus proved that a wide range of the rule schemata used to define CCGs are context-free. [sent-168, score-0.201]

62 CCGbank (Hockenmaier and Steedman, 2007) is a corpus of CCG derivations that was semi-automatically converted from the Wall Street Journal section of the Penn treebank. [sent-170, score-0.146]

63 Figure 2 shows a categorization of the rules used in CCGbank according to the schema classes defined in the preceding section, where a rule is placed into the least general class to which it belongs. [sent-171, score-0.548]

64 In addition to having no generalized composition other than the reducing variant, it should also be noted that in all generalized composition rules X = Y, implying that the reducing class of generalized composition is a very natural schema class for CCGbank. [sent-172, score-0.937]

65 If we assume that type-raising is restricted to those instances occurring in CCGbank, then a CCG consisting of schema classes (1-3), (6-7) and (10-11) can generate all the derivations in CCGbank. [sent-173, score-0.529]

66 One could also observe that since CCGbank is finite, its grammar is not only a context-free grammar but can produce only a finite number of derivations. [sent-175, score-0.296]

67 However, our statement is much stronger because this CCG can generate all of the derivations in CCGbank given only the lexicon, the finite set of unrestricted rules and the finite number of type-raising rules. [sent-176, score-0.493]

68 Despite the fact that there is a strongly context-free CCG which generates all of the derivations in CCGbank, it is still possible that the grammar learned by the Clark and Curran parser is not a context-free grammar. [sent-179, score-0.389]

69 However, in addition to rule schemata (1-6) and (10-11), they also include restriction class (A) by restricting rules to only those found in the training data. [sent-180, score-0.39]

70 Thus, by proposition 5, the Clark and Curran parser is a context-free parser. [sent-181, score-0.188]

71 Unlike the context-free grammars extracted from the Penn treebank, these allow for the categorial semantics that accompanies any categorial parse and for a more elegant analysis of linguistic structures such as extraction and coordination. [sent-183, score-0.296]

72 The Petrov parser uses latent variables to refine a coarse-grained grammar extracted from a training corpus to a grammar which makes much more fine-grained syntactic distinctions. [sent-186, score-0.3]

73 The Clark and Curran parser has an option, which is disabled by default, for not restricting the rules to those that appear in the training data. [sent-187, score-0.178]

74 However, they find that this restriction is “detrimental to neither parser accuracy or coverage” (Clark and Curran, 2007). [sent-188, score-0.21]

75 The Petrov parser was chosen for our experi- ments because it refines the grammar in a mathematically principled way without altering the nature of the derivations that are output. [sent-194, score-0.336]

76 This is important because both the semantic backend and the system that converts CCG derivations to dependencies require as input CCG derivations as they appear in CCGbank. [sent-195, score-0.334]

77 CCGbank, in addition to the basic atoms S, N, NP and PP, also differentiates both the S and NP atoms with features allowing more subtle distinctions. [sent-198, score-0.182]

78 These features allow finer control of the use of combinatory rules in the resulting grammars. [sent-200, score-0.141]

79 Percentage of sentences in section 00 that receive derivations from the four parsers shown. [sent-207, score-0.191]

80 Percentage of sentences in section 23 that receive derivations from the three parsers shown. [sent-210, score-0.191]

81 In the supertagging literature, POS tagging and supertagging are distinguished: POS tags are the traditional Penn treebank tags (e.g., NN or VBZ). [sent-215, score-0.15]

82 However, because the Petrov parser trained on CCGbank has no notion of Penn treebank POS tags, we can only evaluate the accuracy of the supertags. [sent-218, score-0.203]

83 The results are shown in figures 3 and 4, where the “Accuracy” column shows accuracy of the supertags against the CCGbank categories and the “No feats” column shows accuracy when features are ignored. [sent-219, score-0.198]

84 The difference in accuracy is only statistically significant between Clark and Curran’s Normal Form model ignoring features and the Petrov parser trained on CCGbank without features (p-value = 0. [sent-221, score-0.163]

85 Figure 5 gives the PARSEVAL measures on section 00 for Clark and Curran’s two best models and the Petrov parser trained on the original CCGbank and the version without features after various numbers of training iterations. [sent-232, score-0.153]

86 In the case of Clark and Curran’s hybrid model, the poor accuracy relative to the Petrov parsers can be attributed to the fact that this model chooses derivations based on the associated dependencies at the expense of constituent accuracy (see section 3. [sent-234, score-0.301]

87 Due to the similarity of the accuracies and the difference in the coverage between I-5 of the Petrov parser on CCGbank and I-6 of the Petrov parser on CCGbank without features, we reevaluate their results on only those sentences for which they both return derivations in figures 6 and 8. [sent-238, score-0.456]

88 Figure 9 gives a comparison between the Petrov parser trained on the Penn treebank and on CCGbank. [sent-240, score-0.169]

89 For this reason, the word-to-word dependencies of categorial grammar parsers are often evaluated. [sent-317, score-0.305]

90 We used the CCG derivation-to-dependency converter included in the C&C tools package to convert the output of the Petrov parser to dependencies. [sent-329, score-0.25]

91 A labeled dependency is correct if the ordered pair of words is correct, the head word has the correct category, and the position of the category that is the source of that edge is correct. [sent-333, score-0.175]
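
This criterion can be stated as a simple predicate; the `Dep` fields below are an assumed encoding of such labeled dependencies, not the actual data format of the C&C tools.

```python
from typing import NamedTuple

class Dep(NamedTuple):
    """Assumed encoding of a labeled CCG dependency."""
    head: int       # index of the head word in the sentence
    dependent: int  # index of the dependent word
    category: str   # lexical category of the head word
    slot: int       # argument position in that category (source of the edge)

def labeled_correct(predicted: Dep, gold: set) -> bool:
    """A labeled dependency is correct iff the word pair, the head's
    category and the argument slot all match a gold dependency."""
    return predicted in gold

gold = {Dep(2, 1, "(S[dcl]\\NP)/NP", 1)}  # hypothetical gold dependency
print(labeled_correct(Dep(2, 1, "(S[dcl]\\NP)/NP", 1), gold))  # True
print(labeled_correct(Dep(2, 1, "(S[dcl]\\NP)/NP", 2), gold))  # False: wrong slot
```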

92 Figure 12 shows accuracies from the Petrov parser trained on CCGbank along with accuracies for the Clark and Curran parser. [sent-334, score-0.211]

93 We only show accuracies for the Petrov parser trained on the original version of CCGbank because the dependency converter cannot currently generate dependencies for featureless derivations. [sent-335, score-0.301]

94 The relatively poor coverage of the Petrov parser is due to the failure of the dependency converter to output dependencies from valid CCG derivations. [sent-336, score-0.268]

95 However, the coverage of the dependency converter is actually lower when run on the gold standard derivations, indicating that this coverage problem is not indicative of inaccuracies in the Petrov parser. [sent-337, score-0.263]

96 The Petrov parser has better results by a statistically significant margin for both labeled and unlabeled recall and unlabeled F-score. [sent-339, score-0.191]

97 In contrast, the Clark and Curran parser is significantly faster than the Petrov parsers, which we hypothesize is due to the degree to which Clark and Curran have optimized their code, their use of C++ as opposed to Java, and their use of a supertagger to prune the lexicon. [sent-347, score-0.135]

98 Based on these results, we trained the Petrov parser on CCGbank and achieved state of the art results in terms of supertagging accuracy, PARSEVAL measures and dependency accuracy. [sent-349, score-0.245]

99 First, the ability to extract semantic representations from CCG derivations is not dependent on the language class of a CCG. [sent-351, score-0.199]

100 CCGbank: a corpus of CCG derivations and dependency structures extracted from the Penn Treebank. [sent-390, score-0.241]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ccg', 0.526), ('ccgbank', 0.377), ('petrov', 0.316), ('schema', 0.275), ('curran', 0.198), ('clark', 0.17), ('derivations', 0.146), ('subcategories', 0.141), ('categorial', 0.137), ('finite', 0.134), ('parseval', 0.129), ('parser', 0.109), ('schemata', 0.109), ('composition', 0.1), ('nzn', 0.096), ('rule', 0.092), ('atoms', 0.091), ('combinatory', 0.089), ('ccgs', 0.084), ('categories', 0.084), ('grammar', 0.081), ('proposition', 0.079), ('classes', 0.076), ('steedman', 0.074), ('generalized', 0.068), ('restriction', 0.067), ('subcategory', 0.064), ('cfg', 0.064), ('category', 0.061), ('zn', 0.06), ('penn', 0.058), ('connectives', 0.057), ('crossed', 0.057), ('dcl', 0.057), ('hockenmaier', 0.057), ('supertagging', 0.055), ('np', 0.054), ('class', 0.053), ('strongly', 0.053), ('derivation', 0.052), ('rules', 0.052), ('converter', 0.052), ('weir', 0.049), ('parsers', 0.045), ('dependencies', 0.042), ('accuracies', 0.041), ('treebank', 0.04), ('dependency', 0.037), ('bound', 0.036), ('accuracy', 0.034), ('unlabeled', 0.033), ('consisting', 0.032), ('combinator', 0.032), ('cpus', 0.032), ('feats', 0.032), ('nxite', 0.032), ('vinken', 0.032), ('variables', 0.029), ('nb', 0.029), ('coverage', 0.028), ('lambek', 0.028), ('right', 0.028), ('prove', 0.027), ('bos', 0.027), ('unrestricted', 0.027), ('reducing', 0.026), ('chairman', 0.026), ('supertagger', 0.026), ('proof', 0.025), ('elsevier', 0.024), ('fowler', 0.024), ('measures', 0.024), ('essentially', 0.023), ('figures', 0.023), ('supertags', 0.023), ('grammars', 0.022), ('proofs', 0.022), ('variant', 0.022), ('construct', 0.021), ('denoted', 0.021), ('convention', 0.02), ('side', 0.02), ('size', 0.02), ('trained', 0.02), ('subset', 0.019), ('exactly', 0.018), ('toronto', 0.018), ('publishing', 0.018), ('instructions', 0.018), ('page', 0.018), ('splits', 0.018), ('instantiations', 0.018), ('identical', 0.018), ('klein', 0.018), ('lexicon', 0.018), ('spent', 0.017), ('parsing', 0.017), ('specified', 0.017), ('restricting', 0.017), ('labeled', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

2 0.46147788 228 acl-2010-The Importance of Rule Restrictions in CCG

Author: Marco Kuhlmann ; Alexander Koller ; Giorgio Satta

Abstract: Combinatory Categorial Grammar (CCG) is generally construed as a fully lexicalized formalism, where all grammars use one and the same universal set of rules, and crosslinguistic variation is isolated in the lexicon. In this paper, we show that the weak generative capacity of this ‘pure’ form of CCG is strictly smaller than that of CCG with grammar-specific rules, and of other mildly context-sensitive grammar formalisms, including Tree Adjoining Grammar (TAG). Our result also carries over to a multi-modal extension of CCG.

3 0.37350318 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

4 0.29050893 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

Author: Sujith Ravi ; Jason Baldridge ; Kevin Knight

Abstract: We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.

5 0.24968281 114 acl-2010-Faster Parsing by Supertagger Adaptation

Author: Jonathan K. Kummerfeld ; Jessika Roesner ; Tim Dawborn ; James Haggerty ; James R. Curran ; Stephen Clark

Abstract: We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.

6 0.1846631 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars

7 0.12700973 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

8 0.09117306 169 acl-2010-Learning to Translate with Source and Target Syntax

9 0.074675404 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

10 0.070480354 236 acl-2010-Top-Down K-Best A* Parsing

11 0.069841266 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

12 0.069590881 85 acl-2010-Detecting Experiences from Weblogs

13 0.069243178 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

14 0.069012351 130 acl-2010-Hard Constraints for Grammatical Function Labelling

15 0.068346485 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

16 0.067639947 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

17 0.064151086 99 acl-2010-Efficient Third-Order Dependency Parsers

18 0.063314132 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

19 0.061710458 217 acl-2010-String Extension Learning

20 0.06000733 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.164), (1, -0.022), (2, 0.141), (3, -0.03), (4, -0.156), (5, -0.161), (6, 0.293), (7, 0.071), (8, 0.268), (9, -0.019), (10, 0.326), (11, 0.128), (12, -0.325), (13, 0.125), (14, -0.04), (15, 0.031), (16, 0.029), (17, -0.05), (18, -0.131), (19, -0.06), (20, 0.119), (21, -0.102), (22, -0.066), (23, 0.063), (24, -0.042), (25, -0.003), (26, -0.025), (27, -0.064), (28, -0.005), (29, 0.008), (30, 0.01), (31, 0.093), (32, 0.078), (33, 0.018), (34, -0.051), (35, 0.078), (36, 0.032), (37, 0.02), (38, 0.027), (39, -0.018), (40, 0.05), (41, -0.002), (42, -0.063), (43, -0.007), (44, 0.022), (45, 0.033), (46, 0.047), (47, -0.018), (48, -0.064), (49, -0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96605712 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

2 0.942963 228 acl-2010-The Importance of Rule Restrictions in CCG

Author: Marco Kuhlmann ; Alexander Koller ; Giorgio Satta

Abstract: Combinatory Categorial Grammar (CCG) is generally construed as a fully lexicalized formalism, where all grammars use one and the same universal set of rules, and crosslinguistic variation is isolated in the lexicon. In this paper, we show that the weak generative capacity of this ‘pure’ form of CCG is strictly smaller than that of CCG with grammar-specific rules, and of other mildly context-sensitive grammar formalisms, including Tree Adjoining Grammar (TAG). Our result also carries over to a multi-modal extension of CCG.

3 0.73501825 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

4 0.67802495 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

Author: Sujith Ravi ; Jason Baldridge ; Kevin Knight

Abstract: We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.

5 0.67792535 114 acl-2010-Faster Parsing by Supertagger Adaptation

Author: Jonathan K. Kummerfeld ; Jessika Roesner ; Tim Dawborn ; James Haggerty ; James R. Curran ; Stephen Clark

Abstract: We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.

6 0.6134336 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars

7 0.40700281 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

8 0.30348048 182 acl-2010-On the Computational Complexity of Dominance Links in Grammatical Formalisms

9 0.28716099 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

10 0.25943112 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

11 0.24395496 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

12 0.23710103 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

13 0.23409934 169 acl-2010-Learning to Translate with Source and Target Syntax

14 0.2330481 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

15 0.23278002 130 acl-2010-Hard Constraints for Grammatical Function Labelling

16 0.22432856 67 acl-2010-Computing Weakest Readings

17 0.22201352 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

18 0.20147698 222 acl-2010-SystemT: An Algebraic Approach to Declarative Information Extraction

19 0.20000258 186 acl-2010-Optimal Rank Reduction for Linear Context-Free Rewriting Systems with Fan-Out Two

20 0.19913673 162 acl-2010-Learning Common Grammar from Multilingual Corpus


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.013), (14, 0.016), (25, 0.21), (33, 0.022), (42, 0.012), (44, 0.013), (55, 0.167), (59, 0.074), (73, 0.036), (78, 0.157), (83, 0.062), (84, 0.016), (98, 0.082)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87135684 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

2 0.79643476 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

3 0.76094866 224 acl-2010-Talking NPCs in a Virtual Game World

Author: Tina Kluwer ; Peter Adolphs ; Feiyu Xu ; Hans Uszkoreit ; Xiwen Cheng

Abstract: This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between nonplayer characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domainspecific question answering, task-specific and domain-specific dialog with robust chatbot-like chitchat.

4 0.75629878 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.

5 0.75321889 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.

6 0.72374398 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

7 0.69208169 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

8 0.69045526 228 acl-2010-The Importance of Rule Restrictions in CCG

9 0.68985146 10 acl-2010-A Latent Dirichlet Allocation Method for Selectional Preferences

10 0.68920988 71 acl-2010-Convolution Kernel over Packed Parse Forest

11 0.6832149 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

12 0.68261999 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

13 0.67725247 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

14 0.6729542 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar

15 0.67040664 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

16 0.66648984 169 acl-2010-Learning to Translate with Source and Target Syntax

17 0.65857744 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

18 0.65749019 229 acl-2010-The Influence of Discourse on Syntax: A Psycholinguistic Model of Sentence Processing

19 0.64794523 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

20 0.64608383 130 acl-2010-Hard Constraints for Grammatical Function Labelling