acl acl2013 acl2013-261 knowledge-graph by maker-knowledge-mining

261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Source: pdf

Author: Elif Yamangil ; Stuart M. Shieber

Abstract: In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. Our work shows performance improvements on the Penn Treebank and finds more compact yet linguistically rich representations of the data, but more importantly provides techniques in grammar transformation and statistical inference that make practical the use of these more expressive systems, thereby enabling further experimentation along these lines.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In the line of research extending statistical parsing to more expressive grammar formalisms, we demonstrate for the first time the use of tree-adjoining grammars (TAG). [sent-4, score-0.349]

2 We present a Bayesian nonparametric model for estimating a probabilistic TAG from a parsed corpus, along with novel block sampling methods and approximation transformations for TAG that allow efficient parsing. [sent-5, score-0.35]

3 Tree-substitution grammars (TSG), by expanding the domain of locality of context-free grammars (CFG), can achieve better expressivity, and the ability to model more contextual dependencies; the payoff would be better modeling of the data or smaller (sparser) models or both. [sent-8, score-0.315]

4 Recent work that incorporated Dirichlet process (DP) nonparametric models into TSGs has provided an efficient solution to the daunting model selection problem of segmenting training data trees into appropriate elementary fragments to form the grammar (Cohn et al. [sent-10, score-0.538]

5 TSGs are a special case of the more flexible grammar formalism of tree adjoining grammar (TAG) (Joshi et al. [sent-13, score-0.295]

6 TAG augments TSG with an adjunction operator and a set of auxiliary trees in addition to the substitution operator and initial trees of TSG, allowing for “splicing in” of syntactic fragments within trees. [sent-15, score-1.059]

7 This functionality allows for better modeling of linguistic phenomena such as the distinction between modifiers and arguments (Joshi et al. [sent-16, score-0.027]

8 Unfortunately, TAG’s expressivity comes at the cost of greatly increased complexity. [sent-18, score-0.087]

9 Parsing complexity for unconstrained TAG scales as O(n^6), impractical as compared to CFG and TSG’s O(n^3). [sent-19, score-0.055]

10 In addition, the model selection problem for TAG is significantly more complicated than for TSG since one must reason about many more combinatorial options with two types of derivation operators. [sent-20, score-0.042]

11 For example, one can consider “outsourcing” the auxiliary trees (Shieber, 2007), use template rules and a very small number of grammar categories (Hwa, 1998), or rely on head-words and force lexicalization in order to constrain the problem (Xia et al. [sent-23, score-0.397]

12 However a solution has not been put forward by which a model that maximizes a principled probabilistic objective is sought after. [sent-28, score-0.03]

13 Recent work by Cohn and Blunsom (2010) argued that under highly expressive grammars such as TSGs where exponentially many derivations may be hypothesized of the data, local Gibbs sampling is insufficient for effective inference and global blocked sampling strategies will be necessary. [sent-29, score-0.397]

14 For TAG, this problem is only more severe due to its mild context-sensitivity and even richer combinatorial nature. [sent-30, score-0.071]

15 (2011) and Yamangil and Shieber (2012) used tree-insertion grammar (TIG) as a kind of expressive compromise between TSG and TAG, as a substrate on which to build nonparametric inference. [sent-32, score-0.364]

16 However TIG has the constraint of disallowing wrapping adjunction (coordination between material that falls to the left and right of the point of adjunction, such as parentheticals and quotations) as well as left adjunction along the spine of a right auxiliary tree and vice versa. [sent-33, score-1.155]

17 In this work we formulate a blocked sampling strategy for TAG that is effective and efficient, and prove its superiority against the local Gibbs sampling approach. [sent-34, score-0.174]

18 We show via nonparametric inference that TAG, which contains TSG as a subset, is a better model for treebank data than TSG and leads to improved parsing performance. [sent-35, score-0.29]

19 TAG achieves this by using more compact grammars than TSG and by providing the ability to make finer-grained linguistic distinctions. [sent-36, score-0.176]

20 We explain how our parameter refinement scheme for TAG allows for cubic-time CFG parsing, which is just as efficient as TSG parsing. [sent-37, score-0.116]

21 Our presentation assumes familiarity with prior work on block sampling of TSG and TIG (Cohn and Blunsom, 2010; Shindo et al. [sent-38, score-0.125]

22 Also assume that the node uniquely identified by α[p] has Goodman index i, which we denote as i = G(α[p]). [sent-41, score-0.116]

23 The general idea of this TAG-TSG approximation is that, for any auxiliary tree that adjoins at a node ν with Goodman index i, we create an initial tree out of it where the root and foot nodes of the auxiliary tree are both replaced by i. [sent-42, score-1.104]

24 Further, we split the subtree rooted at ν from its parent and rename the substitution site that is newly created at ν as i as well. [sent-43, score-0.433]

25 We can separate the foot subtree from the rest of the initial tree since it is completely remembered by any adjoined auxiliary trees due to the nature of our refinement scheme. [sent-45, score-0.763]

26 However this method fails for adjunctions that occur at spinal nodes of auxiliary trees that have foot nodes below them since we would not know in which order to do the initial tree creation. [sent-46, score-1.014]

27 However when the spine-adjunction relation is amenable to a topological sort (as is the case in Figure 2), we can apply the method by going in this order and doing some extra bookkeeping: updating the list of Goodman indices and redirecting adjunctions as we go along. [sent-47, score-0.602]

28 When there is no such topological sort, we can approximate the TAG by heuristically dropping low-frequency adjunctions that introduce cycles. [sent-48, score-0.513]

29 In (1) we see the original TAG grammar and its adjunctions (n, m, k are adjunction counts). [sent-50, score-0.829]

30 Note that the adjunction relation has a topological sort of β, γ. [sent-51, score-0.588]

31 We process auxiliary trees in this order and iteratively remove their adjunctions by creating specialized initial tree duplicates. [sent-52, score-0.901]

32 In (2) we first visit β, which has adjunctions into α at the node denoted α[p] where p is the unique path from the root to this node. [sent-53, score-0.537]

33 We retrieve the Goodman index of this node i = G(α[p]), split the subtree rooted at this node as a new initial tree αi, relabel its root as i, and rename the newly-created substitution site at α[p] as i. [sent-54, score-0.949]

34 Since β has only this adjunction, we replace it with initial tree version βi where root/foot labels of β are replaced with i, and update all adjunctions into β as being into βi. [sent-55, score-0.572]

35 In (3) we visit γ which now has adjunctions into α and βi. [sent-56, score-0.417]

36 For the α[p] adjunction we create γi the same way we created βi but this time we cannot remove γ as it still has an adjunction into βi. [sent-57, score-0.797]

37 We retrieve the Goodman index of the node of adjunction j = G(βi[q]), split the subtree rooted at this node as new initial tree βij, relabel its root as j, and rename the newly-created substitution site at βi[q] as j. [sent-58, score-1.33]

38 Since γ now has only this adjunction left, we remove it by also creating initial tree version γj where root/foot labels of γ are replaced with j. [sent-59, score-0.595]

39 At this point we have an adjunction-free TSG with elementary trees (and counts) α(l), αi(l), βi(n), βij(n), γi(m), γj(k) where l is the count of initial tree α. [sent-60, score-0.266]

40 These counts, when they are normalized, lead to the appropriate adjunc- Footnote 1: We found that, on average, about half of our grammars have a topological sort of their spine-adjunctions. [sent-61, score-0.351]

41 (On average fewer than 100 spine adjunctions even exist.) [sent-62, score-0.434]

42 When no such sort exists, only a few low-frequency adjunctions have to be removed to eliminate cycles. [sent-63, score-0.452]

43 Figure 3 (x-axis: sentence length in #tokens): Nonparametric TAG (blue) parsing is efficient and incurs only a small increase in parsing time compared to nonparametric TSG (red). [sent-64, score-0.324]
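
Sentences 27, 28 and 42 above describe processing auxiliary trees in a topological order of the spine-adjunction relation, and dropping low-frequency adjunctions when cycles prevent such an order. The following is only a minimal sketch of that idea, not the authors' implementation. The function names, the input format (a dict from hypothetical `(adjoiner, target)` pairs to counts), and the greedy rule "keep adjunctions from most to least frequent, dropping any that would close a cycle" are all assumptions made for illustration.

```python
from collections import defaultdict, deque


def reachable(graph, start, goal):
    """Return True if `goal` can be reached from `start` along directed edges."""
    stack, seen = [start], {start}
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def order_auxiliary_trees(spine_adjunctions):
    """Order auxiliary trees so that a tree is visited before the trees that adjoin into it.

    spine_adjunctions: dict mapping (adjoiner, target) -> adjunction count
    (hypothetical input format). Returns (order, dropped), where `dropped`
    lists the adjunctions discarded because they would have closed a cycle.
    """
    graph = defaultdict(set)  # edge target -> adjoiner: the target is processed first
    nodes, dropped = set(), []
    # Assumed heuristic: consider frequent adjunctions first so rare ones get dropped.
    by_count = sorted(spine_adjunctions.items(), key=lambda kv: (-kv[1], kv[0]))
    for (adjoiner, target), _count in by_count:
        nodes.update((adjoiner, target))
        if reachable(graph, adjoiner, target):
            dropped.append((adjoiner, target))  # adding this edge would create a cycle
        else:
            graph[target].add(adjoiner)

    # Kahn's algorithm on the now-acyclic relation.
    indegree = {n: 0 for n in nodes}
    for target in list(graph):
        for adjoiner in graph[target]:
            indegree[adjoiner] += 1
    queue = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(graph[node]):
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    return order, dropped


# gamma adjoins into beta, as in the walkthrough above: beta is visited first.
print(order_auxiliary_trees({("gamma", "beta"): 5}))
# A rare reverse adjunction closes a cycle and is dropped.
print(order_auxiliary_trees({("gamma", "beta"): 5, ("beta", "gamma"): 1}))
```

With the first input the returned order is beta, then gamma, matching the order stated in sentence 30. The bookkeeping described in sentence 27 (updating Goodman indices and redirecting adjunctions) is deliberately left out of this sketch.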

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('tsg', 0.476), ('adjunction', 0.381), ('adjunctions', 0.366), ('auxiliary', 0.188), ('nonparametric', 0.177), ('tag', 0.165), ('shieber', 0.159), ('grammars', 0.144), ('goodman', 0.136), ('yamangil', 0.135), ('tsgs', 0.127), ('rename', 0.124), ('topological', 0.121), ('tree', 0.105), ('tig', 0.101), ('trees', 0.097), ('foot', 0.091), ('expressivity', 0.087), ('sort', 0.086), ('grammar', 0.082), ('dp', 0.079), ('initial', 0.074), ('shindo', 0.073), ('expressive', 0.073), ('cfg', 0.072), ('aj', 0.072), ('node', 0.071), ('substitution', 0.069), ('refinement', 0.069), ('rooted', 0.069), ('subtree', 0.068), ('spine', 0.068), ('relabel', 0.068), ('site', 0.065), ('elementary', 0.064), ('sampling', 0.06), ('cohn', 0.057), ('blocked', 0.054), ('visit', 0.051), ('parsing', 0.05), ('root', 0.049), ('harvard', 0.047), ('efficient', 0.047), ('operator', 0.045), ('ij', 0.045), ('index', 0.045), ('blunsom', 0.044), ('combinatorial', 0.042), ('joshi', 0.041), ('split', 0.038), ('block', 0.038), ('elif', 0.037), ('spinal', 0.037), ('splicing', 0.037), ('xtag', 0.037), ('daunting', 0.037), ('bookkeeping', 0.037), ('pling', 0.037), ('remembered', 0.037), ('specialized', 0.036), ('remove', 0.035), ('treebank', 0.034), ('fragments', 0.034), ('adjoined', 0.034), ('quotations', 0.034), ('tension', 0.034), ('stepwise', 0.034), ('retrieve', 0.033), ('compromise', 0.032), ('wrapping', 0.032), ('outsourcing', 0.032), ('primitives', 0.032), ('gibbs', 0.032), ('compact', 0.032), ('doran', 0.03), ('sparser', 0.03), ('lexicalization', 0.03), ('exchangeable', 0.03), ('sought', 0.03), ('inference', 0.029), ('counts', 0.029), ('amenable', 0.029), ('mild', 0.029), ('augments', 0.029), ('approximation', 0.028), ('nodes', 0.028), ('stuart', 0.028), ('gc', 0.028), ('unconstrained', 0.028), ('replaced', 0.027), ('familiarity', 0.027), ('impractical', 0.027), ('nonzero', 0.027), ('locality', 0.027), ('phenomena', 0.027), ('chiang', 0.027), ('adjoining', 0.026), ('concentration', 0.026), ('preventing', 0.026), ('dropping', 0.026)]
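
The page does not state how the simValue scores in the lists below are derived from word weights like these. One common choice, assumed here purely for illustration, is cosine similarity between sparse tfidf vectors. The sketch below shows that measure. The function name `cosine_similarity` and the second paper's weights are hypothetical, while the first vector reuses three weights from the list above.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors given as {term: weight} dicts."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Three weights taken from the list above.
this_paper = {"tsg": 0.476, "adjunction": 0.381, "tag": 0.165}
# Hypothetical weights for some other paper, invented for this example.
other_paper = {"tag": 0.40, "adjunction": 0.30, "parsing": 0.20}

print(cosine_similarity(this_paper, this_paper))   # a vector compared with itself
print(cosine_similarity(this_paper, other_paper))  # partial term overlap
```

Under this assumed measure a paper compared with itself scores approximately 1, up to floating-point rounding, which is consistent with the same-paper entry in the list that follows, though that alone does not confirm which measure the page used.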

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Author: Elif Yamangil ; Stuart M. Shieber

2 0.53218663 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

Abstract: We propose a new variant of Tree-Adjoining Grammar that allows adjunction of full wrapping trees but still bears only context-free expressivity. We provide a transformation to context-free form, and a further reduction in probabilistic model size through factorization and pooling of parameters. This collapsed context-free form is used to implement efficient grammar estimation and parsing algorithms. We perform parsing experiments on the Penn Treebank and draw comparisons to Tree-Substitution Grammars and between different variations in probabilistic model design. Examination of the most probable derivations reveals examples of the linguistically relevant structure that our variant makes possible.

3 0.2942777 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

Abstract: We present a model for inducing sentential argument structure, which distinguishes arguments from optional modifiers. We use this model to study whether representing an argument/modifier distinction helps in learning argument structure, and whether a linguistically-natural argument/modifier distinction can be induced from distributional data alone. Our results provide evidence for both hypotheses.

4 0.15072316 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

Author: Matt Post ; Shane Bergsma

Abstract: Syntactic features are useful for many text classification tasks. Among these, tree kernels (Collins and Duffy, 2001) have been perhaps the most robust and effective syntactic tool, appealing for their empirical success, but also because they do not require an answer to the difficult question of which tree features to use for a given task. We compare tree kernels to different explicit sets of tree features on five diverse tasks, and find that explicit features often perform as well as tree kernels on accuracy and always in orders of magnitude less time, and with smaller models. Since explicit features are easy to generate and use (with publicly available tools), we suggest they should always be included as baseline comparisons in tree kernel method evaluations.

5 0.097321212 357 acl-2013-Transfer Learning for Constituency-Based Grammars

Author: Yuan Zhang ; Regina Barzilay ; Amir Globerson

Abstract: In this paper, we consider the problem of cross-formalism transfer in parsing. We are interested in parsing constituency-based grammars such as HPSG and CCG using a small amount of data specific for the target formalism, and a large quantity of coarse CFG annotations from the Penn Treebank. While all of the target formalisms share a similar basic syntactic structure with Penn Treebank CFG, they also encode additional constraints and semantic features. To handle this apparent discrepancy, we design a probabilistic model that jointly generates CFG and target formalism parses. The model includes features of both parses, allowing transfer between the formalisms, while preserving parsing efficiency. We evaluate our approach on three constituency-based grammars (CCG, HPSG, and LFG), augmented with the Penn Treebank-1. Our experiments show that across all three formalisms, the target parsers significantly benefit from the coarse annotations.

6 0.086410739 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

7 0.072522946 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy

8 0.072370067 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

9 0.069566071 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

10 0.063391864 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

11 0.054843772 80 acl-2013-Chinese Parsing Exploiting Characters

12 0.053646397 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

13 0.051486418 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

14 0.051212829 165 acl-2013-General binarization for parsing and translation

15 0.050448764 311 acl-2013-Semantic Neighborhoods as Hypergraphs

16 0.049240977 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

17 0.047454286 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

18 0.047127184 136 acl-2013-Enhanced and Portable Dependency Projection Algorithms Using Interlinear Glossed Text

19 0.046693236 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing

20 0.045625579 275 acl-2013-Parsing with Compositional Vector Grammars

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.123), (1, -0.079), (2, -0.092), (3, -0.01), (4, -0.156), (5, 0.031), (6, 0.103), (7, -0.016), (8, 0.011), (9, -0.041), (10, 0.069), (11, 0.036), (12, 0.115), (13, -0.1), (14, -0.104), (15, -0.172), (16, 0.213), (17, 0.213), (18, -0.127), (19, 0.048), (20, 0.119), (21, 0.05), (22, 0.234), (23, 0.073), (24, -0.164), (25, -0.14), (26, 0.026), (27, -0.094), (28, 0.162), (29, 0.05), (30, -0.126), (31, -0.012), (32, -0.006), (33, -0.067), (34, -0.021), (35, -0.002), (36, 0.089), (37, 0.133), (38, 0.032), (39, -0.144), (40, 0.094), (41, -0.167), (42, -0.031), (43, 0.069), (44, -0.115), (45, 0.011), (46, 0.107), (47, -0.048), (48, 0.065), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96580309 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Author: Elif Yamangil ; Stuart M. Shieber

2 0.91693461 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

3 0.81261957 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

Author: Leon Bergen ; Edward Gibson ; Timothy J. O'Donnell

4 0.4160538 165 acl-2013-General binarization for parsing and translation

Author: Matthias Buchse ; Alexander Koller ; Heiko Vogler

Abstract: Binarization of grammars is crucial for improving the complexity and performance of parsing and translation. We present a versatile binarization algorithm that can be tailored to a number of grammar formalisms by simply varying a formal parameter. We apply our algorithm to binarizing tree-to-string transducers used in syntax-based machine translation.

5 0.40615505 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

Author: Matt Post ; Shane Bergsma

6 0.40349111 348 acl-2013-The effect of non-tightness on Bayesian estimation of PCFGs

7 0.37814435 274 acl-2013-Parsing Graphs with Hyperedge Replacement Grammars

8 0.34044957 357 acl-2013-Transfer Learning for Constituency-Based Grammars

9 0.30753863 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models

10 0.27844039 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

11 0.27743739 311 acl-2013-Semantic Neighborhoods as Hypergraphs

12 0.26915526 275 acl-2013-Parsing with Compositional Vector Grammars

13 0.26639652 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation

14 0.25768566 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

15 0.24727975 349 acl-2013-The mathematics of language learning

16 0.24595408 299 acl-2013-Reconstructing an Indo-European Family Tree from Non-native English Texts

17 0.24457277 270 acl-2013-ParGramBank: The ParGram Parallel Treebank

18 0.2353531 161 acl-2013-Fluid Construction Grammar for Historical and Evolutionary Linguistics

19 0.23082019 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

20 0.23046082 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.05), (2, 0.34), (6, 0.036), (11, 0.068), (14, 0.032), (24, 0.028), (26, 0.067), (28, 0.016), (35, 0.08), (42, 0.036), (48, 0.035), (70, 0.053), (88, 0.029), (95, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83316666 261 acl-2013-Nonparametric Bayesian Inference and Efficient Parsing for Tree-adjoining Grammars

Author: Elif Yamangil ; Stuart M. Shieber

2 0.65381908 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions

Author: Xiaoming Lu ; Lei Xie ; Cheung-Chi Leung ; Bin Ma ; Haizhou Li

Abstract: We present an efficient approach for broadcast news story segmentation using a manifold learning algorithm on latent topic distributions. The latent topic distribution estimated by Latent Dirichlet Allocation (LDA) is used to represent each text block. We employ Laplacian Eigenmaps (LE) to project the latent topic distributions into low-dimensional semantic representations while preserving the intrinsic local geometric structure. We evaluate two approaches employing LDA and probabilistic latent semantic analysis (PLSA) distributions respectively. The effects of different amounts of training data and different numbers of latent topics on the two approaches are studied. Experimental results show that our proposed LDA-based approach can outperform the corresponding PLSA-based approach. The proposed approach provides the best performance with the highest F1-measure of 0.7860.

3 0.58102649 295 acl-2013-Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages

Author: Dan Garrette ; Jason Mielens ; Jason Baldridge

Abstract: Developing natural language processing tools for low-resource languages often requires creating resources from scratch. While a variety of semi-supervised methods exist for training from incomplete data, there are open questions regarding what types of training data should be used and how much is necessary. We discuss a series of experiments designed to shed light on such questions in the context of part-of-speech tagging. We obtain timed annotations from linguists for the low-resource languages Kinyarwanda and Malagasy (as well as English) and evaluate how the amounts of various kinds of data affect performance of a trained POS-tagger. Our results show that annotation of word types is the most important, provided a sufficiently capable semi-supervised learning infrastructure is in place to project type information onto a raw corpus. We also show that finite-state morphological analyzers are effective sources of type information when few labeled examples are available.

4 0.57981414 4 acl-2013-A Context Free TAG Variant

Author: Ben Swanson ; Elif Yamangil ; Eugene Charniak ; Stuart Shieber

5 0.54307324 275 acl-2013-Parsing with Compositional Vector Grammars

Author: Richard Socher ; John Bauer ; Christopher D. Manning ; Andrew Y. Ng

Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation does not capture the full syntactic nor semantic richness of linguistic phrases, and attempts to improve on this by lexicalizing phrases or splitting categories only partly address the problem at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and implemented approximately as an efficient reranker it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information such as PP attachments.

6 0.45175698 57 acl-2013-Arguments and Modifiers from the Learner's Perspective

7 0.44969621 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition

8 0.41426119 318 acl-2013-Sentiment Relevance

9 0.4116337 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction

10 0.41146499 169 acl-2013-Generating Synthetic Comparable Questions for News Articles

11 0.40638971 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages

12 0.40565625 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification

13 0.40329173 225 acl-2013-Learning to Order Natural Language Texts

14 0.40297535 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions

15 0.40289927 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations

16 0.4028599 272 acl-2013-Paraphrase-Driven Learning for Open Question Answering

17 0.40272868 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model

18 0.40119585 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search

19 0.40100914 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers

20 0.40099046 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing