acl acl2010 acl2010-203 knowledge-graph by maker-knowledge-mining

203 acl-2010-Rebanking CCGbank for Improved NP Interpretation


Source: pdf

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The difficulty of the task means that we ought to view treebanking as an ongoing process akin to grammar development, such as the many years of work on the ERG (Flickinger, 2000). [sent-14, score-0.127]

2 This paper demonstrates how a treebank can be rebanked to incorporate novel analyses and infor- [sent-15, score-0.305]

3 Our first changes integrate four previously suggested improvements to CCGbank. [sent-23, score-0.09]

4 Our analysis allows the distinction between core and peripheral arguments to be represented for predicate nouns. [sent-26, score-0.497]

5 Our analysis also recovers non-local dependencies mediated by nominal predicates; for instance, Google is the agent of acquire in Google 's decision to acquire YouTube. [sent-28, score-0.141]

6 Together, these changes modify 30% of the labelled dependencies in CCGbank, demonstrating how multiple resources can be brought together in a single, richly annotated corpus. [sent-37, score-0.274]

7 Statistical parsers induce their grammars from corpora, and the corpora for linguistically motivated formalisms currently do not contain high quality predicate-argument annotation, because they were derived from the Penn Treebank (PTB Marcus et al. [sent-45, score-0.093]

8 Manually written grammars for these formalisms, such as the ERG HPSG grammar (Flickinger, 2000) and the XLE LFG grammar (Butt et al. [sent-47, score-0.176]

9 , 2006) produce far more detailed and linguistically correct analyses than any English statistical parser, due to the comparatively coarse-grained annotation schemes of the corpora statistical parsers are trained on. [sent-48, score-0.126]

10 1 Combinatory Categorial Grammar Combinatory Categorial Grammar (CCG; Steedman, 2000) is a lexicalised grammar, which means that all grammatical dependencies are specified in the lexical entries and that the production of derivations is governed by a small set of rules. [sent-60, score-0.096]

11 A category can have a functor as its result, so that a word can have a complex valency structure. [sent-63, score-0.095]

12 For instance, a verb phrase is represented by the category S\NP: it is a function from a leftward NP (a subject) to a sentence. [sent-64, score-0.095]

13 A transitive verb requires an object to become a verb phrase, producing the category (S\NP)/NP. [sent-65, score-0.095]
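The category notation above can be made concrete with a minimal sketch of categories and the two function application rules. The class and function names here (Atom, Functor, forward_apply, backward_apply) are illustrative assumptions, not part of CCGbank or any parser, and composition and type raising are left out.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Atom:
    """An atomic category such as S, NP, N or PP."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result, slash direction, argument."""
    result: "Category"
    slash: str  # "/" seeks its argument to the right, "\\" to the left
    arg: "Category"

    def __str__(self):
        def wrap(cat):
            return f"({cat})" if isinstance(cat, Functor) else str(cat)
        return f"{wrap(self.result)}{self.slash}{wrap(self.arg)}"


Category = Union[Atom, Functor]


def forward_apply(left, right):
    """Forward application: X/Y followed by Y gives X (else None)."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left, right):
    """Backward application: Y followed by X\\Y gives X (else None)."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


S, NP = Atom("S"), Atom("NP")
VP = Functor(S, "\\", NP)   # S\NP: a verb phrase
TV = Functor(VP, "/", NP)   # (S\NP)/NP: a transitive verb

vp = forward_apply(TV, NP)         # verb + object gives a verb phrase
sentence = backward_apply(NP, vp)  # subject + verb phrase gives a sentence
```

Representing the result of a functor as another category is what lets a single lexical entry carry a complex valency structure: the transitive verb category is simply a verb phrase category that still needs an object.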

14 CCG extends the basic application rules of pure categorial grammar with (generalised) composition rules and type raising. [sent-68, score-0.147]

15 The index of the word the category is assigned to is left implicit. [sent-80, score-0.095]

16 3 Combining CCGbank corrections There have been a few papers describing corrections to CCGbank. [sent-82, score-0.176]

17 Vadas and Curran (2007) addressed this by manually annotating all of the ambiguous noun phrases in the PTB, and went on to use this information to correct 20,409 dependencies (1. [sent-92, score-0.218]

18 2 Punctuation corrections The syntactic analysis of punctuation is notoriously difficult, and punctuation is not always treated consistently in the Penn Treebank (Bies et al. [sent-96, score-0.21]

19 This allows a grammar rule to be removed, preventing a great deal of spurious ambiguity and improving the speed of the C&C parser (Clark and Curran, 2007) by 37%. [sent-100, score-0.146]

20 3 Verb predicate-argument corrections Semantic role descriptions generally recognise a distinction between core arguments, whose role comes from a set specific to the predicate, and peripheral arguments, which have a role drawn from a small, generic set. [sent-102, score-0.452]

21 This distinction is represented in the surface syntax in CCG, because the category of a verb must specify its argument structure. [sent-103, score-0.237]

22 , 2005) to convert 1,543 complements to adjuncts and 13,256 adjuncts to complements (Honnibal and Curran, 2007). [sent-106, score-0.276]

23 If a constituent such as as a director received an adjunct category, but was labelled as a core argument in Propbank, we changed it to a complement, using its head’s part-of-speech tag to infer its constituent type. [sent-107, score-0.377]

24 We performed the equivalent transformation to ensure all peripheral arguments of verbs were analysed as adjuncts. [sent-108, score-0.426]

25 4 Verb-particle constructions Propbank also offers reliable annotation of verb-particle constructions. [sent-110, score-0.138]

26 Figure 1: Deverbal noun predicate with agent, patient and beneficiary arguments, shown as a CCG derivation of Rome 's gift of peace to Europe. Lexical categories: Rome := NP; 's := (NP/(N/PP))\NP; gift := ((N/PP)/PP)/PP; of := PP/NP; peace := NP; to := PP/NP; Europe := NP. [sent-112, score-0.176]

27 4 Noun predicate-argument structure Many common nouns in English can receive optional complements and adjuncts, realised by prepositional phrases, genitive determiners, compound nouns, relative clauses, and for some nouns, complementised clauses. [sent-113, score-0.452]

28 of peace_p to Europe_b In (9), the genitive introduces the patient, but when the patient is supplied by the PP, it instead introduces the agent. [sent-119, score-0.246]

29 The mapping differs for gift, where the genitive introduces the agent. [sent-120, score-0.147]

30 The ambiguity can be seen in an NP such as The nobleman ’s portrait, where the genitive could mark possession (peripheral), or it could introduce the patient (core). [sent-122, score-0.193]

31 The distinction between core and peripheral arguments is particularly difficult for compound nouns, as pre-modification is very productive in English. [sent-123, score-0.506]

32 1 CCG analysis We designed our analysis for transparency between the syntax and the predicate-argument structure, by stipulating that all and only the core arguments should be syntactic arguments of the predicate’s category. [sent-125, score-0.329]

33 This is fairly straightforward for arguments introduced by prepositions, as in destruction of Carthage, where destruction := N/PPy, of := PPy/NPy and Carthage := NP; of Carthage combines into PPCarthage, and the whole phrase into Ndestruction. In our analysis, the head of of Carthage is Carthage, as of is assumed to be a semantically transparent case-marker. [sent-126, score-0.245]

34 We apply this analysis to prepositional phrases that provide arguments to verbs as well, a departure from CCGbank. [sent-127, score-0.24]

35 Prepositional phrases that introduce peripheral arguments are analysed as syntactic adjuncts: The war in 149 B. [sent-128, score-0.463]

36 [Derivation of the example: The := NPy/Ny; war := N; in := (Ny\Ny)/NPz; its object := NP. The prepositional phrase combines into (Ny\Ny)in, then with war into Nwar, and finally into NPwar.] Adjunct prepositional phrases remain headed by the preposition, as it is the preposition's semantics that determines whether they function as temporal, causal, spatial etc. [sent-130, score-0.169]

37 [Derivation of Carthage 's destruction: Carthage := NP; 's := (NPy/(Ny/PPz)y)\NPz; destruction := N/PPy. Carthage 's combines into (NPy/(Ny/PPCarthage)y)'s, and the whole phrase into NPdestruction.] In this analysis, we regard the genitive clitic as a case-marker that performs a movement operation roughly analogous to WH-extraction. [sent-135, score-0.206]

38 Its category is therefore similar to the one used in object extraction, (N\N)/(S/NP). [sent-136, score-0.095]

39 This analysis allows recovery of verbal arguments of nominalised raising and control verbs, a construction which both Gildea and Hockenmaier (2003) and Boxwell and White (2008) identify as a problem case when aligning Propbank and CCGbank. [sent-138, score-0.132]

40 The category assigned to decision can coindex the missing NP argument of buy with its own PP argument. [sent-140, score-0.24]

41 When that argument is supplied by the genitive, it is also supplied to the verb, buy, filling its dependency with its agent, Google. [sent-141, score-0.198]

42 This argument would be quite difficult to recover using a shallow syntactic analysis, as the path would be quite long. [sent-142, score-0.092]

43 There are 494 such verb arguments mediated by nominal predicates in Sections 02-21. [sent-143, score-0.177]

44 These analyses allow us to draw complement/adjunct distinctions for nominal predicates, so that the surface syntax takes us very close to a full predicate-argument analysis. [sent-144, score-0.19]

45 The only local core arguments that we do not annotate as syntactic complements are compound nouns, such as decision makers. [sent-148, score-0.324]

46 We avoided these arguments because of the productivity of noun-noun compounding in English, which makes these argument structures very difficult to recover. [sent-149, score-0.224]

47 2 Implementation and statistics Our analysis requires semantic role labels for each argument of the nominal predicates in the Penn Treebank, precisely what NomBank (Meyers et al. [sent-153, score-0.137]

48 We then assume that any prepositional phrase or genitive determiner annotated as a core argument in NomBank should be analysed as a complement, while peripheral arguments and adnominals that receive no semantic role label at all are analysed as adjuncts. [sent-158, score-1.027]
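The decision rule just described can be sketched as a small function. This is a minimal illustration, assuming role labels in the Propbank/NomBank style where numbered roles (ARG0, ARG1, ...) are core and ARGM-* roles are peripheral; the function name and the use of None for an adnominal with no semantic role label are hypothetical choices, not the authors' implementation.

```python
import re

# Assumption: core roles are the numbered ones (ARG0, ARG1, ...),
# while peripheral roles carry an ARGM- prefix (e.g. ARGM-TMP).
CORE_ROLE = re.compile(r"ARG\d+$")


def adnominal_status(role):
    """Classify an adnominal PP or genitive by its semantic role label.

    role is a label string, or None when the adnominal received
    no semantic role label at all.
    """
    if role is not None and CORE_ROLE.match(role):
        return "complement"
    return "adjunct"


# Hypothetical labels for the kinds of examples discussed in the text.
examples = {
    "of Carthage (destruction of Carthage)": "ARG1",
    "in 149 B. (the war in ...)": "ARGM-TMP",
    "an unlabelled adnominal": None,
}
for phrase, role in examples.items():
    print(phrase, "->", adnominal_status(role))
```

Treating unlabelled adnominals as adjuncts makes the adjunct analysis the default, so a phrase only becomes a syntactic argument when the annotation positively marks it as core.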

49 We converted 34,345 adnominal prepositional phrases to complements, leaving 18,919 as adjuncts. [sent-159, score-0.147]

50 The most common preposition converted was of, which was labelled as a core argument 99. [sent-160, score-0.245]

51 The most common adjunct preposition was in, which realised a peripheral argument in 59. [sent-162, score-0.436]

52 73% of the occurrences of the 5 most frequent prepositions (of, in, for, on and to) realised peripheral arguments, compared with 53% for other prepositions. [sent-165, score-0.268]

53 Core arguments were also more common than peripheral arguments for possessives. [sent-166, score-0.469]

54 The percentage was similar for both personal pronouns (such as his) and genitive phrases (such as the boy’s). [sent-168, score-0.184]

55 5 Adding restrictivity distinctions Adnominals can have either a restrictive or a non-restrictive (appositional) interpretation, determining the potential reference of the noun phrase it modifies. [sent-169, score-0.262]

56 This ambiguity manifests itself in whether prepositional phrases, relative clauses and other adnominals are analysed as modifiers of either N or NP, yielding a restrictive or nonrestrictive interpretation respectively. [sent-170, score-0.41]

57 In CCGbank, all adnominals attach to NPs, producing non-restrictive interpretations. [sent-171, score-0.137]

58 We therefore move restrictive adnominals to N nodes, as in All staff on casual contracts, where All := NP/N, staff := N, on := (N\N)/NP, casual := N/N and contracts := N. Here casual contracts forms an N, which is type-changed (TC) to NP; on casual contracts forms N\N, which modifies staff to give N before All applies to yield NP. This corrects the previous interpretation, which stated that there were no permanent staff. [sent-172, score-0.201]

59 All NP\NP modifiers that are not preceded by punctuation were moved to the lowest N node possible and relabelled N\N. [sent-177, score-0.11]

60 Some adnominals in CCGbank are created by the S\NP → NP\NP unary type-changing rule, which transforms reduced relative clauses. [sent-182, score-0.137]

61 The rebanked corpus contains 34,134 N\N restrictive modifiers, and 9,784 non-restrictive modifiers. [sent-184, score-0.156]
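The relabelling half of this rule can be illustrated with a short sketch. It is only an approximation under stated assumptions: categories are plain strings, the punctuation set is a hypothetical choice, and the tree surgery that re-attaches the modifier at an N node is not shown.

```python
# Assumption: this token set stands in for whatever counts as
# punctuation in the corpus; it is illustrative only.
PUNCTUATION = {",", ";", ":", "--"}


def relabel_modifier(category, preceding_token):
    """Relabel a restrictive NP modifier as an N modifier.

    A modifier of category NP\\NP that is not preceded by punctuation
    is treated as restrictive and relabelled N\\N; everything else is
    returned unchanged.
    """
    if category == "NP\\NP" and preceding_token not in PUNCTUATION:
        return "N\\N"
    return category


# "staff on casual contracts": no punctuation before the modifier.
print(relabel_modifier("NP\\NP", "staff"))  # restrictive reading
# A modifier after a comma keeps its non-restrictive NP\NP category.
print(relabel_modifier("NP\\NP", ","))
```

Using preceding punctuation as the cue is a surface heuristic, which is why the rule needs no additional annotation beyond what the corpus already contains.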

62 6 Reanalysing partitive constructions True partitive constructions consist of a quantifier (16), a cardinal (17) or demonstrative (18) applied to an NP via of. [sent-186, score-0.38]

63 There are similar constructions headed by common nouns, as in (19): (16) Some of us (17) Four of our members (18) Those of us who smoke (19) A glass of wine We regard the common noun partitives as headed by the initial noun, such as glass, because this noun usually controls the number agreement. [sent-187, score-0.43]

64 We therefore analyse these cases as nouns with prepositional arguments. [sent-188, score-0.115]

65 In (19), glass would be assigned the category N/PP. [sent-189, score-0.146]

66 True partitive constructions are different, however: they are always headed by the head of the NP supplied by of. [sent-190, score-0.358]

67 Percentage of dependencies and categories left unchanged in Section 00. [sent-201, score-0.175]

68 We identified and reanalysed 3,010 partitive genitives in CCGbank. [sent-206, score-0.142]

69 7 Similarity to CCGbank Table 1 shows the percentage of labelled dependencies (L. [sent-207, score-0.184]

70 A labelled dependency is a 4-tuple consisting of the head, the argument, the lexical category of the head, and the argument slot that the dependency fills. [sent-210, score-0.275]

71 For instance, the subject fills slot 1 and the object fills slot 2 on the transitive verb category (S\NP)/NP. [sent-211, score-0.095]

72 ... than lexical categories, because one lexical category change alters all of the dependencies headed by a predicate, as they all depend on its lexical category. [sent-213, score-0.331]

73 Unlabelled dependencies consist of only the head and argument. [sent-214, score-0.15]
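The difference between the labelled and unlabelled views can be sketched as follows. The tuple layout follows the 4-tuple definition in the text; the function names, the token_index naming of words, and the way the percentage is computed are assumptions made for illustration, and the rebanked category in the example is hypothetical.

```python
from collections import namedtuple

# Head, argument, lexical category of the head, and argument slot.
LabelledDep = namedtuple("LabelledDep", ["head", "argument", "category", "slot"])


def unlabelled(dep):
    """Reduce a labelled dependency to its (head, argument) pair."""
    return (dep.head, dep.argument)


def percent_unchanged(original, rebanked, key=lambda dep: dep):
    """Percentage of original dependencies that survive in the rebanked set."""
    original_keys = {key(dep) for dep in original}
    rebanked_keys = {key(dep) for dep in rebanked}
    return 100.0 * len(original_keys & rebanked_keys) / len(original_keys)


# Subject fills slot 1 and object fills slot 2 of a transitive verb.
original = [
    LabelledDep("bought_2", "Google_1", "(S\\NP)/NP", 1),
    LabelledDep("bought_2", "YouTube_3", "(S\\NP)/NP", 2),
]
# Hypothetical rebanking that changes only the verb's lexical category.
rebanked = [
    LabelledDep("bought_2", "Google_1", "((S\\NP)/PP)/NP", 1),
    LabelledDep("bought_2", "YouTube_3", "((S\\NP)/PP)/NP", 2),
]

labelled_pct = percent_unchanged(original, rebanked)
unlabelled_pct = percent_unchanged(original, rebanked, key=unlabelled)
```

In this toy case a single category change leaves every unlabelled dependency intact but invalidates every labelled one headed by the verb, which is the sensitivity the text attributes to the labelled measure.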

74 The biggest changes were those described in Sections 4 and 5. [sent-215, score-0.09]

75 After the addition of nominal predicate-argument structure, over 50% of the labelled dependencies were changed. [sent-216, score-0.229]

76 Many of these changes involved changing an adjunct to a complement, which affects the unlabelled dependencies because the head and argument are inverted. [sent-217, score-0.457]

77 8 Lexicon statistics Our changes make the grammar sensitive to new distinctions, which increases the number of lexical categories required. [sent-218, score-0.257]

78 1 Table 2: Effect of the changes on the size of the lexicon. [sent-229, score-0.09]

79 of lexical categories (Cats), the number of lexical categories that occur at least 10 times in Sections 02-21 (Cats ≥ 10), and the average number of categories available for assignment to each token in Section 00 (Cats/Word). [sent-230, score-0.237]

80 The addition of quotes only added two categories (LQU U and RQU U), and the addition of the quote tokens slightly decreased the average categories per word. [sent-233, score-0.158]
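The three lexicon statistics named here can be computed along the following lines. This is a rough sketch under assumptions: the training data is a list of (word, category) pairs, the function name is hypothetical, and a word unseen in training is counted as having zero available categories, which may differ from how the parser's lexicon handles unknown words.

```python
from collections import Counter, defaultdict


def lexicon_stats(train_pairs, eval_words, min_count=10):
    """Return (Cats, Cats >= min_count, Cats/Word).

    train_pairs: (word, category) pairs from the training sections.
    eval_words:  the tokens of the evaluation section.
    """
    category_counts = Counter(category for _, category in train_pairs)
    lexicon = defaultdict(set)
    for word, category in train_pairs:
        lexicon[word].add(category)

    cats = len(category_counts)
    frequent_cats = sum(1 for n in category_counts.values() if n >= min_count)
    # Average number of categories the lexicon offers for each token.
    cats_per_word = sum(len(lexicon.get(w, ())) for w in eval_words) / len(eval_words)
    return cats, frequent_cats, cats_per_word


# Toy data: "the" has one category, "dog" has two, "cat" is unseen.
train = [("the", "NP/N")] * 10 + [("dog", "N")] * 3 + [("dog", "N/PP")]
stats = lexicon_stats(train, ["the", "dog", "cat"])
```

The toy data shows why new categories for frequent tokens matter most for Cats/Word: each additional category for a common word is counted once for every occurrence of that word in the evaluation section.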

81 The Propbank and verb-particle changes both introduced rare categories for complicated, infrequent argument structures. [sent-234, score-0.261]

82 Head nouns were previously guaranteed the category N in CCGbank; possessive clitics always received the category (NP/N)\NP; and possessive personal pronouns were always NP/N. [sent-236, score-0.336]

83 Our changes introduce new categories for these frequent tokens, which meant a substantial increase in the number of possible categories per word. [sent-237, score-0.248]

84 9 Parsing Evaluation Some of the changes we have made correct problems that have caused the performance of a statistical CCG parser to be over-estimated. [sent-238, score-0.148]

85 Other changes introduce new distinctions, which a parser may or may not find difficult to reproduce. [sent-239, score-0.148]

86 To investigate these issues, we trained and evaluated the C&C CCG parser on our rebanked corpora. [sent-240, score-0.214]

87 Table 4: Comparison of parsers trained on CCGbank and the rebanked corpora, using dependencies that occur in both. [sent-267, score-0.307]

88 The parser scored slightly lower as the NP brackets, Quotes, Propbank and Particles corrections were added. [sent-272, score-0.146]

89 CCGbank contains some dependencies that are trivial to recover, because Hockenmaier and Steedman (2007) was forced to adopt a strictly right-branching analysis for NP brackets. [sent-274, score-0.096]

90 There was a larger drop in accuracy on the fully rebanked corpus, which included our analyses of restrictivity, partitive constructions and noun predicate-argument structure. [sent-275, score-0.502]

91 The labelled dependencies evaluation is particularly sensitive to this, as a single category change affects multiple dependencies. [sent-277, score-0.279]

92 This can be seen in the smaller gap in category accuracy. [sent-278, score-0.095]

93 We investigated whether the differences in performance were due to the different evaluation data by comparing the parsers’ performance against the original parser on the dependencies they agreed upon, to allow direct comparison. [sent-279, score-0.154]

94 Table 4 compares the labelled and unlabelled recall of the rebanked parsers we trained against the CCGbank parser on these intersections. [sent-281, score-0.406]

95 The parser’s performance remained fairly stable on the dependencies left unchanged. [sent-284, score-0.096]

96 8% worse than the CCGbank parser on the intersection dependencies, suggesting that the fine-grained distinctions we introduced did cause some sparse data problems. [sent-286, score-0.132]

97 This is another example of a phenomenon that could be analysed much better in CCGbank using an existing resource, the BBN named entity corpus. [sent-300, score-0.089]

98 The process we have demonstrated can be used to train a parser that returns dependencies that abstract away as much surface syntactic variation as possible, including, now, even whether the predicate and arguments are expressed in a noun phrase or a full clause. [sent-303, score-0.416]

99 The parsing evaluation for this paper would have been much more difficult without the assistance of Stephen Boxwell, who helped generate the gold-standard dependencies with his software. [sent-309, score-0.096]

100 Corpus-oriented grammar development for acquiring a head-driven phrase structure grammar from the Penn Treebank. [sent-400, score-0.176]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ccgbank', 0.375), ('np', 0.284), ('peripheral', 0.205), ('npy', 0.195), ('ccg', 0.173), ('carthage', 0.156), ('rebanked', 0.156), ('genitive', 0.147), ('adnominals', 0.137), ('arguments', 0.132), ('curran', 0.125), ('propbank', 0.117), ('rome', 0.117), ('hockenmaier', 0.115), ('partitive', 0.103), ('steedman', 0.1), ('ptb', 0.098), ('npz', 0.098), ('nombank', 0.097), ('dependencies', 0.096), ('category', 0.095), ('argument', 0.092), ('changes', 0.09), ('analysed', 0.089), ('labelled', 0.088), ('corrections', 0.088), ('grammar', 0.088), ('constructions', 0.087), ('noun', 0.085), ('categories', 0.079), ('boxwell', 0.078), ('honnibal', 0.078), ('treebank', 0.078), ('adjunct', 0.076), ('distinctions', 0.074), ('complements', 0.073), ('vadas', 0.073), ('prepositional', 0.071), ('analyses', 0.071), ('combinatory', 0.065), ('adjuncts', 0.065), ('core', 0.065), ('penn', 0.064), ('restrictive', 0.064), ('realised', 0.063), ('headed', 0.061), ('punctuation', 0.061), ('brackets', 0.06), ('categorial', 0.059), ('destruction', 0.059), ('npmembers', 0.059), ('rebanking', 0.059), ('parser', 0.058), ('director', 0.056), ('parsers', 0.055), ('compound', 0.054), ('pp', 0.054), ('head', 0.054), ('supplied', 0.053), ('buy', 0.053), ('james', 0.052), ('cats', 0.051), ('verbparticle', 0.051), ('youtube', 0.051), ('portrait', 0.051), ('glass', 0.051), ('possessive', 0.051), ('distinction', 0.05), ('unlabelled', 0.049), ('modifiers', 0.049), ('oil', 0.047), ('particles', 0.047), ('meyers', 0.046), ('patient', 0.046), ('nominal', 0.045), ('predicate', 0.045), ('recognise', 0.044), ('nouns', 0.044), ('julia', 0.043), ('bracketing', 0.042), ('bies', 0.042), ('adnominal', 0.039), ('comparatives', 0.039), ('constable', 0.039), ('crafting', 0.039), ('deps', 0.039), ('deverbal', 0.039), ('fidditch', 0.039), ('genitives', 0.039), ('louvre', 0.039), ('restrictivity', 0.039), ('subcategorise', 0.039), ('treebanking', 0.039), ('woke', 0.039), ('xtag', 0.039), ('formalisms', 0.038), ('csli', 
0.038), ('crude', 0.038), ('complement', 0.038), ('phrases', 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

2 0.37350318 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

3 0.23868561 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

Author: Sujith Ravi ; Jason Baldridge ; Kevin Knight

Abstract: We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.

4 0.2077097 228 acl-2010-The Importance of Rule Restrictions in CCG

Author: Marco Kuhlmann ; Alexander Koller ; Giorgio Satta

Abstract: Combinatory Categorial Grammar (CCG) is generally construed as a fully lexicalized formalism, where all grammars use one and the same universal set of rules, and crosslinguistic variation is isolated in the lexicon. In this paper, we show that the weak generative capacity of this ‘pure’ form of CCG is strictly smaller than that of CCG with grammar-specific rules, and of other mildly context-sensitive grammar formalisms, including Tree Adjoining Grammar (TAG). Our result also carries over to a multi-modal extension of CCG.

5 0.19551539 114 acl-2010-Faster Parsing by Supertagger Adaptation

Author: Jonathan K. Kummerfeld ; Jessika Roesner ; Tim Dawborn ; James Haggerty ; James R. Curran ; Stephen Clark

Abstract: We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.

6 0.18667176 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

7 0.17933327 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars

8 0.16773887 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

9 0.13440129 130 acl-2010-Hard Constraints for Grammatical Function Labelling

10 0.12508321 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

11 0.12247853 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

12 0.12027799 169 acl-2010-Learning to Translate with Source and Target Syntax

13 0.11602695 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

14 0.10582872 216 acl-2010-Starting from Scratch in Semantic Role Labeling

15 0.10574024 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

16 0.10264599 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

17 0.10069577 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses

18 0.095413469 233 acl-2010-The Same-Head Heuristic for Coreference

19 0.090706997 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

20 0.088547371 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.235), (1, 0.051), (2, 0.241), (3, -0.009), (4, -0.114), (5, -0.077), (6, 0.153), (7, 0.062), (8, 0.209), (9, -0.026), (10, 0.322), (11, 0.092), (12, -0.269), (13, 0.106), (14, 0.025), (15, 0.041), (16, 0.052), (17, -0.017), (18, -0.031), (19, -0.034), (20, 0.035), (21, -0.062), (22, 0.043), (23, 0.042), (24, 0.036), (25, -0.044), (26, 0.039), (27, -0.061), (28, -0.019), (29, -0.006), (30, -0.011), (31, 0.037), (32, 0.047), (33, -0.05), (34, -0.012), (35, 0.013), (36, -0.04), (37, 0.007), (38, -0.089), (39, -0.019), (40, 0.044), (41, 0.049), (42, -0.019), (43, 0.008), (44, -0.05), (45, -0.005), (46, 0.001), (47, 0.066), (48, 0.051), (49, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95669484 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

2 0.85536474 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

3 0.78467655 228 acl-2010-The Importance of Rule Restrictions in CCG

Author: Marco Kuhlmann ; Alexander Koller ; Giorgio Satta

Abstract: Combinatory Categorial Grammar (CCG) is generally construed as a fully lexicalized formalism, where all grammars use one and the same universal set of rules, and crosslinguistic variation is isolated in the lexicon. In this paper, we show that the weak generative capacity of this ‘pure’ form of CCG is strictly smaller than that of CCG with grammar-specific rules, and of other mildly context-sensitive grammar formalisms, including Tree Adjoining Grammar (TAG). Our result also carries over to a multi-modal extension of CCG.

4 0.72283602 114 acl-2010-Faster Parsing by Supertagger Adaptation

Author: Jonathan K. Kummerfeld ; Jessika Roesner ; Tim Dawborn ; James Haggerty ; James R. Curran ; Stephen Clark

Abstract: We propose a novel self-training method for a parser which uses a lexicalised grammar and supertagger, focusing on increasing the speed of the parser rather than its accuracy. The idea is to train the supertagger on large amounts of parser output, so that the supertagger can learn to supply the supertags that the parser will eventually choose as part of the highest-scoring derivation. Since the supertagger supplies fewer supertags overall, the parsing speed is increased. We demonstrate the effectiveness of the method using a CCG supertagger and parser, obtaining significant speed increases on newspaper text with no loss in accuracy. We also show that the method can be used to adapt the CCG parser to new domains, obtaining accuracy and speed improvements for Wikipedia and biomedical text.

5 0.67450362 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons

Author: Sujith Ravi ; Jason Baldridge ; Kevin Knight

Abstract: We combine two complementary ideas for learning supertaggers from highly ambiguous lexicons: grammar-informed tag transitions and models minimized via integer programming. Each strategy on its own greatly improves performance over basic expectation-maximization training with a bitag Hidden Markov Model, which we show on the CCGbank and CCG-TUT corpora. The strategies provide further error reductions when combined. We describe a new two-stage integer programming strategy that efficiently deals with the high degree of ambiguity on these datasets while obtaining the full effect of model minimization.

6 0.57175809 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars

7 0.52036417 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection

8 0.51734596 130 acl-2010-Hard Constraints for Grammatical Function Labelling

9 0.47301361 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

10 0.47152308 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

11 0.45417213 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

12 0.43875512 19 acl-2010-A Taxonomy, Dataset, and Classifier for Automatic Noun Compound Interpretation

13 0.38695589 49 acl-2010-Beyond NomBank: A Study of Implicit Arguments for Nominal Predicates

14 0.37570605 216 acl-2010-Starting from Scratch in Semantic Role Labeling

15 0.37329721 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

16 0.35817504 139 acl-2010-Identifying Generic Noun Phrases

17 0.34642607 198 acl-2010-Predicate Argument Structure Analysis Using Transformation Based Learning

18 0.34575498 101 acl-2010-Entity-Based Local Coherence Modelling Using Topological Fields

19 0.34476125 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

20 0.33911398 41 acl-2010-Automatic Selectional Preference Acquisition for Latin Verbs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.015), (25, 0.512), (39, 0.011), (42, 0.011), (47, 0.012), (59, 0.058), (73, 0.03), (78, 0.079), (80, 0.013), (83, 0.053), (84, 0.023), (98, 0.075)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.95438004 224 acl-2010-Talking NPCs in a Virtual Game World

Author: Tina Kluwer ; Peter Adolphs ; Feiyu Xu ; Hans Uszkoreit ; Xiwen Cheng

Abstract: This paper describes the KomParse system, a natural-language dialog system in the three-dimensional virtual world Twinity. In order to fulfill the various communication demands between non-player characters (NPCs) and users in such an online virtual world, the system realizes a flexible and hybrid approach combining knowledge-intensive domain-specific question answering, task-specific and domain-specific dialog with robust chatbot-like chitchat.

same-paper 2 0.94452322 203 acl-2010-Rebanking CCGbank for Improved NP Interpretation

Author: Matthew Honnibal ; James R. Curran ; Johan Bos

Abstract: Once released, treebanks tend to remain unchanged despite any shortcomings in their depth of linguistic analysis or coverage of specific phenomena. Instead, separate resources are created to address such problems. In this paper we show how to improve the quality of a treebank, by integrating resources and implementing improved analyses for specific constructions. We demonstrate this rebanking process by creating an updated version of CCGbank that includes the predicate-argument structure of both verbs and nouns, baseNP brackets, verb-particle constructions, and restrictive and non-restrictive nominal modifiers; and evaluate the impact of these changes on a statistical parser.

3 0.92045349 28 acl-2010-An Entity-Level Approach to Information Extraction

Author: Aria Haghighi ; Dan Klein

Abstract: We present a generative model of template-filling in which coreference resolution and role assignment are jointly determined. Underlying template roles first generate abstract entities, which in turn generate concrete textual mentions. On the standard corporate acquisitions dataset, joint resolution in our entity-level model reduces error over a mention-level discriminative approach by up to 20%.

4 0.84826559 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.

5 0.79421777 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar

Author: Timothy A. D. Fowler ; Gerald Penn

Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.

6 0.78101361 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

7 0.77089167 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

8 0.59953082 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars

9 0.59841269 71 acl-2010-Convolution Kernel over Packed Parse Forest

10 0.59096634 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar

11 0.58427262 169 acl-2010-Learning to Translate with Source and Target Syntax

12 0.58312726 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System

13 0.57139403 257 acl-2010-WSD as a Distributed Constraint Optimization Problem

14 0.56930542 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

15 0.56263268 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

16 0.55612493 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression

17 0.55034679 121 acl-2010-Generating Entailment Rules from FrameNet

18 0.54781443 191 acl-2010-PCFGs, Topic Models, Adaptor Grammars and Learning Topical Collocations and the Structure of Proper Names

19 0.54384422 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

20 0.53602546 17 acl-2010-A Structured Model for Joint Learning of Argument Roles and Predicate Senses