acl acl2013 acl2013-57 knowledge-graph by maker-knowledge-mining

57 acl-2013-Arguments and Modifiers from the Learner's Perspective


Source: pdf

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

Abstract: We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. Our results provide evidence for both hypotheses.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. [sent-5, score-0.413]

2 We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. [sent-6, score-0.699]

3 [1 Introduction] A fundamental challenge facing the language learner is to determine the content and structure of the stored units in the lexicon. [sent-8, score-0.152]

4 This problem is made more difficult by the fact that many lexical units have argument structure. [sent-9, score-0.227]

5 The sentence John put the socks is incomplete; when hearing such an utterance, a speaker of English will expect a location to also be specified: John put the socks in the drawer. [sent-11, score-0.706]

6 Facts such as these can be captured if the lexical entry for put also specifies that the verb has three required arguments: (i) who is doing the putting, (ii) what is being put, and (iii) the destination of the putting. [sent-12, score-0.249]

7 The problem of acquiring argument structure is further complicated by the fact that not all phrases in a sentence fill an argument role. [sent-13, score-0.506]

8 Consider the sentence John put the socks in the drawer at 5 o'clock. [sent-15, score-0.5]

9 The phrase at 5 o'clock occurs here with the verb put, but it is not an argument. [sent-16, score-0.036]

10 Removing this phrase does not change the core structure of the PUTTING event, nor is the sentence incomplete without this phrase. [sent-17, score-0.132]

11 The distinction between arguments and modifiers has a long history in traditional grammar and is leveraged in many modern theories of syntax (Haegeman, 1994; Steedman, 2001; Sag et al. [sent-18, score-0.469]

12 [Figure 1: VP trees for "John put the socks in the drawer" and "John put the socks in the drawer at 5 o'clock". Caption: The VPs in these sentences only share structure if we separate arguments from modifiers.] [sent-20, score-1.17]

13 Despite the ubiquity of the distinction in syntax, however, there is a lack of consensus on the necessary and sufficient conditions for argumenthood (Schütze, 1995; Schütze and Gibson, 1999). [sent-21, score-0.065]

14 It remains unclear whether the argument/modifier distinction is purely semantic or is also represented in syntax, whether it is binary or graded, and what effects argument/modifierhood has on the distribution of linguistic forms. [sent-22, score-0.277]

15 We propose that the argument/modifier distinction is inferred on a phrase–by–phrase basis using probabilistic inference. [sent-24, score-0.165]

16 Crucially, allowing the learner to separate the core argument structure of phrases from peripheral modifier content increases the generalizability of argument constructions. [sent-25, score-0.76]

17 For example, the two sentences in Figure 1 intuitively share the same argument structures, but this overlap can only be identified if the prepositional phrase, “at 5 o’clock,” is treated as a modifier. [sent-26, score-0.227]

18 Thus representing the argument/modifier distinction can help the learner find useful argument structures which generalize robustly. [sent-27, score-0.496]

19 Although, like the majority of theorists, we agree that the argument/adjunct distinction is fundamentally semantic, in this work we focus on its distributional correlates. [sent-28, score-0.21]

20 Does the optionality of modifier phrases help the learner acquire lexical items with the right argument structure? [sent-29, score-0.431]

21 [2 Approach] We adopt an approach where the lexicon consists of an inventory of stored tree fragments. [sent-30, score-0.239]

22 Tree fragments encode the necessary phrase types (i. [sent-33, score-0.18]

23 In this system, sentences are generated by recursive substitution of tree fragments at the frontier argument nodes of other tree fragments. [sent-36, score-0.734]
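
The sketch below is a minimal illustration (not the authors' implementation) of generation by recursive substitution: elementary trees are drawn from a toy lexicon and plugged into frontier argument nodes until no open slots remain. The `ELEMENTARY_TREES` lexicon, the `@`-prefix convention for argument slots, and the helper names are all invented for this example.

```python
import random

# Hypothetical toy lexicon: an elementary tree is a nested tuple (label, children...).
# A string starting with "@" marks a frontier argument node of that category,
# i.e. a slot that must be filled by substituting another elementary tree.
ELEMENTARY_TREES = {
    "S":  [("S", "@NP", ("VP", ("V", "put"), "@NP", "@PP"))],
    "NP": [("NP", "John"), ("NP", ("DET", "the"), ("N", "socks"))],
    "PP": [("PP", ("P", "in"), "@NP")],
}

def generate(category):
    """Build a full tree by recursive substitution at frontier argument nodes."""
    fragment = random.choice(ELEMENTARY_TREES[category])
    return fill(fragment)

def fill(node):
    if isinstance(node, str):
        if node.startswith("@"):       # frontier argument node: substitute into it
            return generate(node[1:])
        return node                    # terminal word
    label, *children = node
    return (label,) + tuple(fill(child) for child in children)

print(generate("S"))
```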

24 To model modification, we introduce a second structure–building operation, adjunction. [sent-40, score-0.031]

25 While substitution must be licensed by the existence of an argument node, adjunction can insert constituents into well–formed trees. [sent-41, score-0.625]

26 Many syntactic theories have made use of an adjunction operation to model modification. [sent-42, score-0.312]

27 , 1995; Chiang and Bikel, 2002) which can insert a constituent as the sister to any node in an existing tree. [sent-44, score-0.496]

28 In order to derive the complete tree for a sentence, starting from an S root node, we recursively sample arguments and modifiers as follows. [sent-45, score-0.514]

29 For every nonterminal node on the frontier of our derivation, we sample an elementary tree from our lexicon to substitute into this node. [sent-46, score-0.917]

30 As already noted, these elementary trees represent the argument structure of our tree. [sent-47, score-0.764]

31 Then, for each argument nonterminal on the tree's interior, we sister–adjoin one or more modifier nodes, which themselves are built by the same recursive process. [sent-48, score-0.562]
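
Continuing the toy sketch above, sister-adjunction can be added as a second pass that inserts modifier subtrees as extra children of interior nodes, after the argument backbone has been built by substitution. The adjoining probability `P_ADJOIN`, the fixed `MODIFIERS` list, and the geometric stopping rule are illustrative assumptions; in the paper the stopping decision and modifier category are context-dependent and modifiers are themselves built recursively.

```python
import random

P_ADJOIN = 0.3                      # made-up probability of adjoining at each site
MODIFIERS = [("PP", ("P", "at"), ("NP", "5 o'clock")),
             ("ADVP", "yesterday")]

def sister_adjoin(tree):
    """Recursively insert zero or more modifier subtrees next to argument children."""
    if isinstance(tree, str):
        return tree
    label, *children = tree
    adjoined = []
    for child in children:
        adjoined.append(sister_adjoin(child))
        while random.random() < P_ADJOIN:      # geometric number of modifiers per site
            adjoined.append(random.choice(MODIFIERS))
    return (label,) + tuple(adjoined)

# Usage with the substitution sketch above:
# backbone = generate("S"); full_tree = sister_adjoin(backbone)
```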

32 In the TSG derivation, at top, an elementary tree with four arguments including the intuitively optional temporal PP is used as the backbone for the derivation. [sent-50, score-0.77]

33 The four phrases filling these arguments are then substituted into the elementary tree, as indicated by arrows. [sent-51, score-0.505]

34 In the bottom derivation, which uses sister–adjunction, an elementary tree with only three arguments is used as the backbone. [sent-52, score-0.649]

35 While the right-most temporal PP needed to be an argument of the elementary tree in the TSG derivation, the bottom derivation uses sister– adjunction to insert this PP as a child of the VP. [sent-53, score-1.188]

36 Sister–adjunction therefore allows us to use an argument structure that matches the true argument structure of the verb "put." (Footnote 1: Note that we depart from many discussions of argument structure in that we do not require that every stored fragment has a head word.) [sent-54, score-0.315] [sent-57, score-0.331]

37 In effect, we allow completely abstract phrasal constructions to also have argument structures. [sent-55, score-0.227]

39 This figure illustrates how derivations in our model can have a greater degree of generalizability than those in a standard TSG. [sent-58, score-0.081]

40 Sister–adjunction will be used to derive children which are not part of the core argument structure, meaning that a greater variety of structures can be derived by a combination of common argument structures and sister-adjoined modifiers. [sent-59, score-0.629]

41 Importantly, this makes the learning problem for our model less sparse than for TSGs; our model can derive the trees in a corpus using fewer types of elementary trees than a TSG. [sent-60, score-0.706]

42 As a result, the distribution over these elementary trees is easier to estimate. [sent-61, score-0.531]

43 To understand what role modifiers play during learning, we will develop a learning model that can induce the lexicon and modifier contexts used by our generative model. [sent-62, score-0.385]

44 [3 Model] Our model extends earlier work on induction of Bayesian TSGs (Post and Gildea, 2009; O'Donnell, 2011; Cohn et al. [sent-63, score-0.031]

45 The model uses a Bayesian non-parametric distribution, the Pitman-Yor process, to place a prior over the lexicon of elementary trees. [sent-65, score-0.477]

46 This distribution allows the complexity of the lexicon to grow to arbitrary size with the input, while still enforcing a bias for more compact lexicons. [sent-66, score-0.105]

47 For each nonterminal c, we define: Gc | ac, bc, PE ∼ PYP(ac, bc, PE(· | c)) (1) and e | c, Gc ∼ Gc (2), where PE(· | c) is a context-free distribution over elementary trees rooted at c, and e is an elementary tree. [sent-67, score-0.929]
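
A draw from Gc ∼ PYP(ac, bc, PE(· | c)) can be simulated with a Chinese-restaurant-style scheme: reuse a previously generated elementary tree with probability roughly proportional to its count, or fall back to the base distribution PE(· | c). The sketch below collapses tables per unique tree, which simplifies the exact Pitman-Yor seating arrangement; the class name and base sampler are hypothetical.

```python
import random
from collections import Counter

class PitmanYorCRP:
    """Simplified Chinese-restaurant view of a Pitman-Yor process over elementary trees."""

    def __init__(self, discount, concentration, base_sampler):
        self.a = discount            # a_c in the notation above
        self.b = concentration       # b_c
        self.base = base_sampler     # draws a fresh elementary tree from PE(.|c)
        self.counts = Counter()      # usage counts, one "table" per distinct tree

    def sample(self):
        n = sum(self.counts.values())
        k = len(self.counts)
        # Probability of generating a new tree from the base distribution.
        p_new = (self.b + self.a * k) / (self.b + n) if n > 0 else 1.0
        if random.random() < p_new:
            tree = self.base()
        else:
            # Reuse an existing tree with probability proportional to (count - discount).
            trees, weights = zip(*[(t, c - self.a) for t, c in self.counts.items()])
            tree = random.choices(trees, weights=weights)[0]
        self.counts[tree] += 1
        return tree

# Usage with a made-up base sampler over two VP expansions:
# G_vp = PitmanYorCRP(0.5, 1.0, lambda: random.choice(["VP -> V NP", "VP -> V NP PP"]))
# samples = [G_vp.sample() for _ in range(10)]
```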

48 In addition to defining a distribution over elementary trees, we also define a distribution which governs modification via sister–adjunction. [sent-71, score-0.538]

49 To sample a modifier, we first decide whether or not to sister–adjoin into location l in a tree. [sent-72, score-0.101]

50 Following this step, we sample a modifier category (e.g., a PP) conditioned on the location l's context: its parent and left siblings. [sent-73, score-0.176] [sent-75, score-0.032]

52 Because contexts are sparse, we use a backoff scheme based on hierarchical Dirichlet processes similar to the ngram backoff schemes defined in (Teh, 2006; Goldwater et al. [sent-76, score-0.102]

53 Let c be a nonterminal node in a tree derived by substitution into argument positions. [sent-78, score-0.687]

54 The node c will have n ≥ 1 children derived by argument substitution: d0, . [sent-79, score-0.372]

55 In order to sister–adjoin between two of these children di, di+1, we recursively sample nonterminals si,1 , . [sent-83, score-0.125]

56 , si,j−1 , c is the context for the j’th modifier between these children. [sent-95, score-0.14]

57 The distribution over sister–adjoined nonterminals is defined using a hierarchical Dirichlet process to implement backoff in a prefix tree over contexts. [sent-96, score-0.296]

58 , q1) over sister–adjoined nonterminals si,j given the context ql , . [sent-100, score-0.143]

59 The distribution G at the root of the hierarchy is not conditioned on any prior context. [sent-110, score-0.046]

60 We define G by: G ∼ DP(α, Multinomial(m)) (6) where m is a vector with entries for each nonterminal, and where we sample m ∼ Dir(1,. [sent-111, score-0.036]
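
A rough sketch of the backoff idea: the probability of a modifier category given a context (parent plus left siblings) interpolates the counts observed at the full context with the same quantity computed at a shortened context, bottoming out in a uniform root distribution. This is a simple Dirichlet-style smoother standing in for the paper's hierarchical Dirichlet process; the class, contexts, and parameters are invented for illustration.

```python
from collections import defaultdict

class HierarchicalBackoff:
    """Backoff over a prefix tree of modifier contexts (sketch, not the paper's HDP)."""

    def __init__(self, categories, alpha=1.0):
        self.categories = categories
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> category -> count

    def observe(self, context, category):
        # Record the observation at the full context and at every shortened context,
        # a crude stand-in for propagating table counts up the HDP hierarchy.
        ctx = tuple(context)
        while True:
            self.counts[ctx][category] += 1
            if not ctx:
                break
            ctx = ctx[1:]

    def prob(self, context, category):
        ctx = tuple(context)
        base = (1.0 / len(self.categories)) if not ctx else self.prob(ctx[1:], category)
        n_cat = self.counts[ctx][category]
        n_all = sum(self.counts[ctx].values())
        return (n_cat + self.alpha * base) / (n_all + self.alpha)

# Usage with made-up contexts and categories (including a STOP symbol):
# model = HierarchicalBackoff(categories=["PP", "ADVP", "SBAR", "STOP"])
# model.observe(("VP", "NP"), "PP")
# model.prob(("VP", "NP"), "PP")
```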

61 To perform inference, we developed a local Gibbs sampler which generalizes the one proposed by (Cohn et al. [sent-115, score-0.038]
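
The shape of such a local sweep might look as follows: each node's binary modifier/argument indicator is resampled from its conditional distribution given all other choices. The `log_score` callable is a placeholder for the model's actual posterior over derivations, and the constraint that every nonterminal keeps at least one argument child is omitted here for brevity.

```python
import math
import random

def gibbs_sweep(nodes, is_modifier, log_score):
    """One local Gibbs sweep over binary argument/modifier indicators (sketch).

    nodes:       iterable of node identifiers in the derivation.
    is_modifier: dict node -> bool, the current assignment (mutated in place).
    log_score:   callable(assignment) -> unnormalised log posterior of the derivation.
    """
    for node in nodes:
        log_ps = []
        for value in (False, True):
            is_modifier[node] = value
            log_ps.append(log_score(is_modifier))
        # Normalise and sample the indicator from its conditional distribution.
        m = max(log_ps)
        ps = [math.exp(lp - m) for lp in log_ps]
        is_modifier[node] = random.random() < ps[1] / (ps[0] + ps[1])
```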

62 First, we examine whether representing the argument/modifier distinction increases the ability of the model to learn highly generalizable elementary trees that can be used as argument structures across a variety of sentences. [sent-118, score-0.981]

63 Second, we ask whether our model is able to induce the correct argument/modifier distinction according to a linguistic gold–standard. [sent-119, score-0.229]

64 We trained our model on sections 2–21 of the WSJ part of the Penn Treebank (Marcus et al. [sent-120, score-0.031]

65 The model was trained on the trees in this corpus, without any further annotations for substitution or modification. [sent-122, score-0.225]

66 To address the first question, we compared the structure of the grammar learned by our model to a grammar learned by a version of our model without sister–adjunction (i. [sent-123, score-0.114]

67 Our model should find more common structure among the trees in the input corpus, and therefore it should learn a set of elementary trees which are more complex and more widely shared across sentences. [sent-127, score-0.666]

68 We evaluated this hypothesis by analyzing the average complexity of the most probable elementary trees learned by these models. [sent-128, score-0.485]
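
The Table 1 statistics could be computed along these lines: take the top-k most frequent elementary trees from the learned lexicon and average their depth and node count. The tuple tree encoding and the frequency dictionary are placeholders, not the authors' data structures.

```python
from collections import Counter

def depth(tree):
    """Depth of a (label, children...) tuple tree; leaves are strings."""
    if isinstance(tree, str):
        return 0
    return 1 + max((depth(child) for child in tree[1:]), default=0)

def node_count(tree):
    if isinstance(tree, str):
        return 1
    return 1 + sum(node_count(child) for child in tree[1:])

def complexity_stats(tree_counts, top_k):
    """Average depth and node count over the top_k most frequent elementary trees."""
    top = [tree for tree, _ in Counter(tree_counts).most_common(top_k)]
    avg_depth = sum(depth(t) for t in top) / len(top)
    avg_nodes = sum(node_count(t) for t in top) / len(top)
    return avg_depth, avg_nodes

# Usage with made-up frequencies:
# complexity_stats({("S", "@NP", ("VP", ("V", "put"))): 12, ("NP", "John"): 40}, top_k=2)
```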

69 As Table 1 shows, our model discovers elementary trees that have greater depth and more nodes than those found by the TSG. [sent-129, score-0.598]

70 In addition, our model accounts for a larger portion of the corpus with fewer rules: the top 50, 100, and 200 most common elementary trees in our model's lexicon account for a greater portion of the corpus than the corresponding sets in the TSG. [sent-130, score-0.575]

71 By using sister-adjunction to separate the ADVP node from the rest of the sentence's derivation, our model was able to use a common depth-3 elementary tree to derive the backbone of the sentence. [sent-132, score-0.784]

72 In contrast, the TSG cannot give the same derivation, as it needs to include the ADVP node in the elementary tree. [Figure 3: Part of a derivation found by our model.] [sent-133, score-0.182]

73 Table 1: This table shows the average depth and node count for elementary trees in our model and the TSG. [sent-136, score-0.657]

74 The results are shown for the 50, 100, and 200 most frequent types of elementary trees. [sent-137, score-0.387]

75 This wider elementary tree is much less common in the corpus. [sent-138, score-1.029]

76 We next examined whether our model learned to correctly identify modifiers in the corpus. [sent-139, score-0.219]

77 This corpus adds annotations indicating, for each node in the Penn Treebank, whether that node is a modifier. [sent-144, score-0.255]

78 , 2005) with a set of heuristics, as well as the NP-branching structures proposed in (Vadas and Curran, 2007). [sent-146, score-0.04]

79 Our model was trained on this corpus, after it had been stripped of argument/modifier annotations. [sent-148, score-0.031]

80 Our model constrains every nonterminal to have at least one argument child, and our Gibbs sampler initializes argument/modifier choices randomly subject to this constraint. [sent-150, score-0.296]

81 Table 2: This table shows precision and recall in identifying modifier nodes in the corpus. [sent-153, score-0.192]
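
The Table 2 evaluation reduces to set comparisons between predicted and gold modifier nodes; the node-identifier dictionaries below are placeholders for whatever node indexing the corpus provides.

```python
def modifier_precision_recall(predicted, gold):
    """Precision and recall for modifier identification.

    predicted, gold: dicts mapping a node id to True if that node is a modifier.
    """
    pred_mods = {n for n, is_mod in predicted.items() if is_mod}
    gold_mods = {n for n, is_mod in gold.items() if is_mod}
    correct = pred_mods & gold_mods
    precision = len(correct) / len(pred_mods) if pred_mods else 0.0
    recall = len(correct) / len(gold_mods) if gold_mods else 0.0
    return precision, recall

# The chance baseline is the gold modifier rate among randomly initialised
# "modifier" guesses, against which the trained model's precision is compared.
```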

82 therefore calculated the probability that a node that was randomly initialized as a modifier was in fact a modifier, i. [sent-154, score-0.251]

83 Next, we looked at the precision of our model following training. [sent-157, score-0.077]

84 Table 2 shows that among nodes that were labeled as modifiers, 0. [sent-158, score-0.052]

85 This table also shows that the recall performance for our model decreased by 0. [sent-161, score-0.031]

86 Some of this decrease is due to limitations of the gold standard; for example, our model learns to classify infinitives and auxiliary verbs as arguments (consistent with standard linguistic analyses), whereas the gold standard classifies these as modifiers. [sent-163, score-0.149]

87 [5 Summary] We have investigated the role of the argument/modifier distinction in learning. [sent-165, score-0.165]

88 We first looked at whether introducing this distinction helps in generalizing from an input corpus. [sent-166, score-0.244]

89 Our model, which represents modification using sister–adjunction, learns a richer lexicon than a model without modification, and its lexicon provides a more compact representation of the input corpus. [sent-167, score-0.208]

90 We next looked at whether the traditional linguistic classification of arguments and modifiers can be induced from distributional information. [sent-168, score-0.397]

91 Without supervision from the correct labelings of modifiers, our model learned to identify modifiers more accurately than chance. [sent-169, score-0.186]

92 This suggests that although the argument/modifier distinction is traditionally drawn without reference to distributional properties, the distributional correlates of this distinction are sufficient to partially reconstruct it from a corpus. [sent-170, score-0.42]

93 Taken together, these results suggest that representing the difference between arguments and modifiers may make it easier to acquire a language’s argument structure. [sent-171, score-0.5]

94 German and English treebanks and lexica for tree–adjoining grammars. [sent-198, score-0.062]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('elementary', 0.387), ('sister', 0.333), ('adjunction', 0.25), ('socks', 0.228), ('argument', 0.227), ('distinction', 0.165), ('drawer', 0.163), ('modifiers', 0.155), ('tsg', 0.15), ('clock', 0.144), ('tree', 0.144), ('modifier', 0.14), ('donnell', 0.125), ('arguments', 0.118), ('node', 0.111), ('nonterminal', 0.109), ('put', 0.109), ('trees', 0.098), ('substitution', 0.096), ('derivation', 0.094), ('pp', 0.094), ('ql', 0.088), ('adjoin', 0.086), ('tsgs', 0.075), ('cohn', 0.075), ('pe', 0.075), ('frontier', 0.071), ('mit', 0.069), ('gc', 0.066), ('argumenthood', 0.065), ('carson', 0.065), ('demberg', 0.065), ('dpr', 0.065), ('kaeshammer', 0.065), ('wasow', 0.065), ('learner', 0.064), ('adjoining', 0.062), ('derive', 0.061), ('brain', 0.06), ('lexicon', 0.059), ('modification', 0.059), ('stop', 0.058), ('spo', 0.058), ('nonterminals', 0.055), ('adjoined', 0.053), ('bergen', 0.053), ('gibson', 0.053), ('insert', 0.052), ('tze', 0.052), ('nodes', 0.052), ('structure', 0.052), ('backoff', 0.051), ('generalizability', 0.05), ('advp', 0.05), ('backbone', 0.05), ('vadas', 0.05), ('vera', 0.05), ('productivity', 0.048), ('sch', 0.046), ('distribution', 0.046), ('vp', 0.046), ('looked', 0.046), ('sag', 0.046), ('distributional', 0.045), ('incomplete', 0.044), ('funds', 0.044), ('cognitive', 0.041), ('timothy', 0.04), ('structures', 0.04), ('pc', 0.039), ('penn', 0.039), ('sampler', 0.038), ('optional', 0.037), ('bayesian', 0.037), ('post', 0.037), ('stored', 0.036), ('sample', 0.036), ('treebank', 0.036), ('phrase', 0.036), ('children', 0.034), ('temporal', 0.034), ('ep', 0.034), ('cj', 0.034), ('bc', 0.034), ('rambow', 0.034), ('whether', 0.033), ('goldwater', 0.032), ('reuse', 0.032), ('location', 0.032), ('di', 0.032), ('non', 0.032), ('chiang', 0.032), ('putting', 0.031), ('theories', 0.031), ('dp', 0.031), ('sharon', 0.031), ('edward', 0.031), ('sc', 0.031), ('model', 0.031), ('depth', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

Abstract: We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. Our results provide evidence for both hypotheses.

2 0.3011601 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

Abstract: We propose a new variant of TreeAdjoining Grammar that allows adjunction of full wrapping trees but still bears only context-free expressivity. We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. This collapsed context-free form is used to implement efficient gram- mar estimation and parsing algorithms. We perform parsing experiments the Penn Treebank and draw comparisons to TreeSubstitution Grammars and between different variations in probabilistic model design. Examination of the most probable derivations reveals examples of the linguistically relevant structure that our variant makes possible.

3 0.2942777 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Author: Elif Yamangil ; Stuart M. Shieber

Abstract: In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.

4 0.11410574 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

Author: Matt Post ; Shane Bergsma

Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly avail- able tools) , we suggest they should always be included as baseline comparisons in tree kernel method evaluations.

5 0.11376939 314 acl-2013-Semantic Roles for String to Tree Machine Translation

Author: Marzieh Bazrafshan ; Daniel Gildea

Abstract: We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument structure. Our results demonstrate that the second approach is effective in increasing the quality of translations.

6 0.10830528 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data

7 0.10817141 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction

8 0.10023125 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features

9 0.091287501 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

10 0.089778371 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

11 0.087643415 27 acl-2013-A Two Level Model for Context Sensitive Inference Rules

12 0.086821951 80 acl-2013-Chinese Parsing Exploiting Characters

13 0.086213753 189 acl-2013-ImpAr: A Deterministic Algorithm for Implicit Semantic Role Labelling

14 0.084088013 306 acl-2013-SPred: Large-scale Harvesting of Semantic Predicates

15 0.083807312 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models

16 0.080145724 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing

17 0.076745786 94 acl-2013-Coordination Structures in Dependency Treebanks

18 0.073872223 275 acl-2013-Parsing with Compositional Vector Grammars

19 0.071456589 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

20 0.069701448 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.18), (1, -0.049), (2, -0.08), (3, -0.063), (4, -0.165), (5, 0.09), (6, 0.054), (7, 0.046), (8, -0.011), (9, 0.017), (10, 0.053), (11, 0.064), (12, 0.14), (13, -0.055), (14, -0.074), (15, -0.157), (16, 0.181), (17, 0.13), (18, -0.025), (19, 0.077), (20, 0.099), (21, 0.03), (22, 0.181), (23, 0.077), (24, -0.065), (25, -0.11), (26, 0.043), (27, -0.062), (28, 0.117), (29, 0.028), (30, -0.096), (31, -0.063), (32, -0.012), (33, -0.065), (34, -0.017), (35, 0.005), (36, 0.048), (37, 0.077), (38, 0.025), (39, -0.114), (40, 0.06), (41, -0.157), (42, -0.02), (43, -0.014), (44, -0.099), (45, -0.037), (46, 0.109), (47, -0.066), (48, 0.021), (49, -0.006)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94549435 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

Abstract: We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. Our results provide evidence for both hypotheses.

2 0.91339326 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Author: Elif Yamangil ; Stuart M. Shieber

Abstract: In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.

3 0.89582711 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

Abstract: We propose a new variant of TreeAdjoining Grammar that allows adjunction of full wrapping trees but still bears only context-free expressivity. We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. This collapsed context-free form is used to implement efficient gram- mar estimation and parsing algorithms. We perform parsing experiments the Penn Treebank and draw comparisons to TreeSubstitution Grammars and between different variations in probabilistic model design. Examination of the most probable derivations reveals examples of the linguistically relevant structure that our variant makes possible.

4 0.45694092 165 acl-2013-General binarization for parsing and translation

Author: Matthias Buchse ; Alexander Koller ; Heiko Vogler

Abstract: Binarization ofgrammars is crucial for improving the complexity and performance of parsing and translation. We present a versatile binarization algorithm that can be tailored to a number of grammar formalisms by simply varying a formal parameter. We apply our algorithm to binarizing tree-to-string transducers used in syntax-based machine translation.

5 0.45507887 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

Author: Shay B. Cohen ; Mark Johnson

Abstract: Probabilistic context-free grammars have the unusual property of not always defining tight distributions (i.e., the sum of the “probabilities” of the trees the grammar generates can be less than one). This paper reviews how this non-tightness can arise and discusses its impact on Bayesian estimation of PCFGs. We begin by presenting the notion of “almost everywhere tight grammars” and show that linear CFGs follow it. We then propose three different ways of reinterpreting non-tight PCFGs to make them tight, show that the Bayesian estimators in Johnson et al. (2007) are correct under one of them, and provide MCMC samplers for the other two. We conclude with a discussion of the impact of tightness empirically.

6 0.41884595 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

7 0.41325015 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

8 0.37278008 357 acl-2013-Transfer Learning for Constituency-Based Grammars

9 0.36206752 280 acl-2013-Plurality, Negation, and Quantification:Towards Comprehensive Quantifier Scope Disambiguation

10 0.36127016 311 acl-2013-Semantic Neighborhoods as Hypergraphs

11 0.34456432 275 acl-2013-Parsing with Compositional Vector Grammars

12 0.34092677 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts

13 0.33950892 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

14 0.33480969 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

15 0.3325254 376 acl-2013-Using Lexical Expansion to Learn Inference Rules from Sparse Data

16 0.32982108 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation

17 0.32911864 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

18 0.32242453 349 acl-2013-The mathematics of language learning

19 0.31861231 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

20 0.31768781 270 acl-2013-ParGramBank: The ParGram Parallel Treebank


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.058), (2, 0.035), (6, 0.026), (11, 0.086), (14, 0.024), (15, 0.015), (24, 0.042), (26, 0.068), (35, 0.083), (42, 0.053), (48, 0.057), (70, 0.061), (76, 0.217), (88, 0.03), (90, 0.025), (95, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82814592 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

Abstract: We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. Our results provide evidence for both hypotheses.

2 0.77852613 147 acl-2013-Exploiting Topic based Twitter Sentiment for Stock Prediction

Author: Jianfeng Si ; Arjun Mukherjee ; Bing Liu ; Qing Li ; Huayi Li ; Xiaotie Deng

Abstract: This paper proposes a technique to leverage topic based sentiments from Twitter to help predict the stock market. We first utilize a continuous Dirichlet Process Mixture model to learn the daily topic set. Then, for each topic we derive its sentiment according to its opinion words distribution to build a sentiment time series. We then regress the stock index and the Twitter sentiment time series to predict the market. Experiments on real-life S&P100; Index show that our approach is effective and performs better than existing state-of-the-art non-topic based methods. 1

3 0.73177153 312 acl-2013-Semantic Parsing as Machine Translation

Author: Jacob Andreas ; Andreas Vlachos ; Stephen Clark

Abstract: Semantic parsing is the problem of deriving a structured meaning representation from a natural language utterance. Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser. In experiments on the multilingual GeoQuery corpus we find that our parser is competitive with the state of the art, and in some cases achieves higher accuracy than recently proposed purpose-built systems. These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation.

4 0.65086454 275 acl-2013-Parsing with Compositional Vector Grammars

Author: Richard Socher ; John Bauer ; Christopher D. Manning ; Ng Andrew Y.

Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and implemented approximately as an efficient reranker it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information such as PP attachments.

5 0.64741939 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

Abstract: We propose a new variant of TreeAdjoining Grammar that allows adjunction of full wrapping trees but still bears only context-free expressivity. We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. This collapsed context-free form is used to implement efficient gram- mar estimation and parsing algorithms. We perform parsing experiments the Penn Treebank and draw comparisons to TreeSubstitution Grammars and between different variations in probabilistic model design. Examination of the most probable derivations reveals examples of the linguistically relevant structure that our variant makes possible.

6 0.63277596 318 acl-2013-Sentiment Relevance

7 0.62699211 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

8 0.62521911 225 acl-2013-Learning to Order Natural Language Texts

9 0.62329012 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

10 0.62219429 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

11 0.62112707 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

12 0.62035716 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

13 0.61945426 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing

14 0.6192103 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

15 0.6180774 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

16 0.61786354 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

17 0.61740828 85 acl-2013-Combining Intra- and Multi-sentential Rhetorical Parsing for Document-level Discourse Analysis

18 0.61730409 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation

19 0.61691171 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

20 0.61666119 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging