emnlp emnlp2012 emnlp2012-65 knowledge-graph by maker-knowledge-mining

65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure


Source: pdf

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. [sent-5, score-0.771]

2 For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. [sent-6, score-0.629]

3 We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. [sent-7, score-0.24]

4 Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. [sent-8, score-1.3]

5 At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. [sent-9, score-0.583]

6 Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling. [sent-10, score-0.783]

7 This decoupling of the end task from its intermediate representation is sometimes known as the two-stage approach (Chang et al. [sent-13, score-0.517]

8 Most notably this decomposition prohibits the learning method from utilizing the labels from the end task when predicting the intermediate representation, a structure which must have some correlation to the end task to provide any benefit. [sent-15, score-0.646]

9 Relying on intermediate representations that are specifically syntactic in nature introduces its own unique set of problems. [sent-16, score-0.471]

10 Large amounts of syntactically annotated data are difficult to obtain, costly to produce, and often tied to a particular domain that may vary greatly from that of the desired end task. [sent-17, score-0.422]

11 For instance, performing named entity recognition (NER) jointly with constituent parsing has been shown to improve performance on both tasks, but the only aspect of the syntax which is leveraged by the NER component is the location of noun phrases (Finkel and Manning, 2009). [sent-19, score-0.343]

12 By instead discovering a latent representation jointly with the end task we address all of these concerns, alleviating the need for any syntactic annotations, while simultaneously attempting to learn a latent syntax relevant to both the particular domain and structure of the end task. [sent-20, score-1.292]

13 Many NLP tasks are inherently tied to syntax, and state-of-the-art solutions to these tasks often rely on syntactic annotations as either a source for useful features (Zhang et al. [sent-21, score-0.637]

14 , 2006, path features in relation extraction) or as a scaffolding upon which a more narrow, specialized classification can occur (as often done in semantic role labeling). [sent-22, score-0.331]

15 We phrase the joint model as a factor graph and marginalize over the hidden structure of the intermediate representation at both training and test time, to optimize performance on the end task. [sent-23, score-0.897]

16 Inference is done via loopy belief propagation, making this framework trivially extensible to most graph structures. [sent-24, score-0.51]
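To make the marginalization step concrete, here is a minimal sketch (not the authors' implementation; the variable names and potential values are invented for illustration): a single latent binary Link variable is coupled to a task label Y through a pairwise factor, and the prediction for Y is obtained by summing the latent variable out exactly. In the paper this exact enumeration is replaced by loopy belief propagation over the full graph.

```python
# Minimal sketch: marginalize a latent binary "link" variable out of a tiny
# factor graph that couples it to a task label Y. Potentials are made-up numbers.
import numpy as np

link_unary = np.array([0.4, 0.6])        # LINK-style factor: belief that Link = 0 / 1
couple = np.array([[2.0, 0.5],           # coupling factor psi(Link, Y):
                   [0.5, 2.0]])          # rewards agreement between Link and Y

joint = link_unary[:, None] * couple     # unnormalized scores over (Link, Y)
p_y = joint.sum(axis=0)                  # marginalize the hidden Link variable
p_y /= p_y.sum()
print("p(Y) after marginalizing Link:", p_y)
```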

17 Computation over latent syntactic representations ... [sent-25, score-0.359]

18 We apply this strategy to two common NLP tasks, coupling a model for the end task prediction with latent and general syntactic representations via specialized logical factors which learn associations between latent and observed structure. [sent-28, score-1.2]
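As an illustration of what such a specialized logical factor could look like (an assumption for exposition, not the paper's exact factor design), the table below encodes a soft implication Role(i, j) => Link(i, j): proposing a role between two words is penalized unless the latent parse links them.

```python
# Hypothetical soft-implication factor; rows index Link, columns index Role.
import numpy as np

alpha = 0.1  # assumed penalty for asserting a role without a supporting link
implication_factor = np.array([
    [1.0, alpha],   # Link = 0: Role = 1 is penalized
    [1.0, 1.0],     # Link = 1: either Role value is acceptable
])
```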

19 The following section serves as a preliminary, introducing an inventory of factors and variables for constructing factor graph representations of syntactically-coupled NLP tasks. [sent-30, score-0.714]

20 Section 3 explores the benefits of this method on relation extraction (RE), where we compare the use of dependency and constituency structure as latent representations. [sent-31, score-0.733]

21 We then turn to a more established semantic role labeling (SRL) task (§4) where we evaluate across a wide range of languages. [sent-32, score-0.157]

22 2 Latent Pseudo-Syntactic Structure The models presented in this paper are phrased in terms of variables in an undirected graphical model, a Markov random field. [sent-33, score-0.243]

23 More specifically, we implement the model as a factor graph, a bipartite graph composed of factors and variables in which we can efficiently compute the marginal beliefs of any variable set with the sum-product algorithm for cyclic graphs, loopy belief propagation. [sent-34, score-1.094]
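A compact sum-product (loopy belief propagation) sketch for binary variables with unary and pairwise potentials is given below. It is an illustrative stand-in for the inference described here, not the authors' code; the factor structure, synchronous update schedule, and fixed iteration count are assumptions.

```python
# Loopy BP sketch for binary variables with unary and pairwise potentials.
import numpy as np

def loopy_bp(n_vars, unary, pairwise, iters=50):
    """unary: {var: length-2 array}; pairwise: {(u, v): 2x2 array} (rows index u)."""
    msgs = {}                                  # directed messages along each pairwise edge
    for (u, v) in pairwise:
        msgs[(u, v)] = np.ones(2)
        msgs[(v, u)] = np.ones(2)

    def incoming(v, skip=None):
        # product of the unary potential and all messages into v, except from `skip`
        b = np.array(unary.get(v, np.ones(2)), dtype=float)
        for (s, t), m in msgs.items():
            if t == v and s != skip:
                b = b * m
        return b

    for _ in range(iters):
        new_msgs = {}
        for (u, v) in msgs:                    # recompute the message u -> v
            table = pairwise[(u, v)] if (u, v) in pairwise else pairwise[(v, u)].T
            m = table.T @ incoming(u, skip=v)  # sum-product over u's values
            new_msgs[(u, v)] = m / m.sum()
        msgs = new_msgs

    return {v: incoming(v) / incoming(v).sum() for v in range(n_vars)}

# Tiny cyclic example: three variables in a triangle of agreement factors.
agree = np.array([[2.0, 0.5], [0.5, 2.0]])
beliefs = loopy_bp(3, {0: np.array([0.7, 0.3])},
                   {(0, 1): agree, (1, 2): agree, (0, 2): agree})
print(beliefs)
```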

24 We now introduce the basic variable and factor components used throughout the paper. [sent-35, score-0.153]

25 1 Latent Dependency Structure Dependency grammar is a lexically-oriented syntactic formalism in which syntactic relationships are expressed as dependencies between individual words. [sent-37, score-0.436]

26 Each non-root word specifies another as its head, provided that the resulting structure forms a valid directed graph, i.e. [sent-38, score-0.205]
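The sentence above is truncated, so the precise validity condition is not recoverable here. Under the common assumption that a valid structure is one in which every non-root word has exactly one head and following heads from any word reaches the artificial root without revisiting a node, a check might look like the hypothetical helper below (not from the paper).

```python
# Validity check for a head assignment under the assumption described above.
def is_valid_dependency_tree(heads):
    """heads[j] is the head of word j for j = 1..n; index 0 is the artificial root."""
    n = len(heads) - 1
    for j in range(1, n + 1):
        seen = {j}
        node = j
        while node != 0:           # walk up toward the root
            node = heads[node]
            if node in seen:       # found a cycle
                return False
            seen.add(node)
    return True

print(is_valid_dependency_tree([0, 0, 1, 2]))   # chain rooted at 0 -> True
print(is_valid_dependency_tree([0, 2, 1]))      # words 1 and 2 point at each other -> False
```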

27 Due to the flexibility of this representation it is often used to describe free-word-order languages, and increasingly preferred in NLP for more language-in-use scenarios. [sent-40, score-0.229]

28 A dependency graph can be modeled with the following nodes, as first proposed by Smith and Eisner (2008): • Let {Link(i, j) : 0 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j} be O(n²) boolean variables corresponding to the possible links in a dependency parse. [sent-41, score-0.4]

29 Li,j = true implies that there is a dependency from parent i to child j. [sent-42, score-0.163]

30 • Let {LINK(i, j) : 0 ≤ i ≤ n, 1 ≤ j ≤ n, i ≠ j} be O(n²) unary factors, each paired with a corresponding Link(i, j) variable and expressing the independent belief that Link(i, j) = true. [sent-43, score-0.392]
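Putting the two bullets together, a sketch of how this inventory could be enumerated for an n-word sentence is shown below (index 0 is the artificial root; the scoring function standing in for the learned LINK potentials is a placeholder assumption).

```python
# Enumerate the Link(i, j) variables and their paired LINK unary factors.
import itertools

def build_link_inventory(n, score=lambda i, j: 1.0):
    """Return the Link(i, j) variable ids and their paired LINK unary factors."""
    link_vars = [(i, j) for i, j in itertools.product(range(n + 1), range(1, n + 1))
                 if i != j]                              # O(n^2) boolean variables
    link_factors = {(i, j): [1.0, score(i, j)]           # potential for Link = false / true
                    for (i, j) in link_vars}
    return link_vars, link_factors

variables, factors = build_link_inventory(3)
print(len(variables), "Link variables;", len(factors), "paired LINK unary factors")
```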


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('constituency', 0.27), ('marginalization', 0.203), ('latent', 0.197), ('intermediate', 0.178), ('factors', 0.163), ('syntactic', 0.162), ('amherst', 0.158), ('syntactically', 0.146), ('belief', 0.141), ('loopy', 0.137), ('marginalize', 0.137), ('annotations', 0.136), ('representations', 0.131), ('specialized', 0.129), ('nlp', 0.128), ('syntax', 0.126), ('graph', 0.123), ('hidden', 0.118), ('leveraged', 0.117), ('outperforming', 0.117), ('graphs', 0.114), ('link', 0.113), ('end', 0.113), ('formalism', 0.112), ('ner', 0.108), ('tied', 0.104), ('variables', 0.102), ('inherently', 0.101), ('dependency', 0.095), ('valid', 0.091), ('factor', 0.087), ('resentations', 0.087), ('alleviating', 0.087), ('uninformed', 0.087), ('dasmith', 0.087), ('ich', 0.087), ('ile', 0.087), ('phrased', 0.087), ('pling', 0.087), ('prohibits', 0.087), ('riede', 0.087), ('scaffolding', 0.087), ('representation', 0.081), ('implement', 0.08), ('couples', 0.079), ('comparably', 0.073), ('naradowsky', 0.073), ('span', 0.07), ('ean', 0.068), ('ito', 0.068), ('observe', 0.068), ('tasks', 0.067), ('variable', 0.066), ('beliefs', 0.065), ('bipartite', 0.065), ('realize', 0.065), ('cyclic', 0.065), ('cpoumtaptiuotnaatilo', 0.065), ('plraoncgeeuadgineg', 0.065), ('inventory', 0.061), ('attempting', 0.061), ('coupling', 0.061), ('structure', 0.06), ('relation', 0.059), ('constraining', 0.059), ('increasingly', 0.059), ('costly', 0.059), ('srl', 0.059), ('extensible', 0.059), ('sometimes', 0.058), ('smith', 0.057), ('labeling', 0.057), ('massachusetts', 0.056), ('cycles', 0.056), ('coupled', 0.056), ('combinatorial', 0.056), ('role', 0.056), ('specifies', 0.054), ('undirected', 0.054), ('narrow', 0.054), ('concerns', 0.052), ('unclear', 0.052), ('explores', 0.052), ('unary', 0.052), ('performing', 0.051), ('trivially', 0.05), ('notably', 0.05), ('jointly', 0.049), ('avoids', 0.049), ('alternatively', 0.047), ('serves', 0.047), ('associations', 0.047), ('preferred', 0.046), ('discovering', 0.046), ('expressing', 0.046), ('predicting', 0.045), ('established', 0.044), ('tractable', 0.044), ('flexibility', 0.043), ('finkel', 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

2 0.15680528 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguisticallymotivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitudes fewer parameters compared to the na¨ ıve approach. Combining latent, lexicalized, and unlexicalized anno- tations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

3 0.13645248 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

Author: Sameer Singh ; Michael Wick ; Andrew McCallum

Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sam- pling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.

4 0.12524177 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

Author: Fei Huang ; Alexander Yates

Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.

5 0.10226524 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

6 0.095232755 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

7 0.081239425 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

8 0.076982141 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

9 0.074277498 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

10 0.071590647 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

11 0.071185477 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

12 0.06930314 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction

13 0.067133747 93 emnlp-2012-Multi-instance Multi-label Learning for Relation Extraction

14 0.06688977 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

15 0.066668853 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

16 0.06378331 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

17 0.06260033 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

18 0.060029786 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

19 0.058090948 98 emnlp-2012-No Noun Phrase Left Behind: Detecting and Typing Unlinkable Entities

20 0.057518139 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.227), (1, -0.002), (2, 0.115), (3, -0.044), (4, 0.009), (5, 0.05), (6, 0.004), (7, 0.088), (8, 0.026), (9, 0.094), (10, -0.053), (11, 0.201), (12, 0.011), (13, -0.046), (14, -0.076), (15, 0.152), (16, -0.058), (17, -0.028), (18, -0.097), (19, -0.03), (20, -0.052), (21, 0.084), (22, -0.072), (23, -0.091), (24, 0.12), (25, 0.056), (26, 0.119), (27, -0.291), (28, 0.133), (29, 0.089), (30, 0.096), (31, -0.058), (32, 0.171), (33, 0.005), (34, 0.092), (35, -0.008), (36, 0.1), (37, -0.173), (38, -0.013), (39, -0.031), (40, -0.011), (41, 0.055), (42, -0.14), (43, -0.072), (44, -0.062), (45, -0.032), (46, 0.014), (47, 0.043), (48, -0.156), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98261589 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

2 0.70496273 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguisticallymotivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitudes fewer parameters compared to the na¨ ıve approach. Combining latent, lexicalized, and unlexicalized anno- tations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

3 0.57279617 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

Author: Paramveer Dhillon ; Jordan Rodu ; Michael Collins ; Dean Foster ; Lyle Ungar

Abstract: Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-theart parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker. .

4 0.54937595 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

Author: Sameer Singh ; Michael Wick ; Andrew McCallum

Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sam- pling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.

5 0.5285126 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

Author: Taylor Berg-Kirkpatrick ; David Burkett ; Dan Klein

Abstract: We investigate two aspects of the empirical behavior of paired significance tests for NLP systems. First, when one system appears to outperform another, how does significance level relate in practice to the magnitude of the gain, to the size of the test set, to the similarity of the systems, and so on? Is it true that for each task there is a gain which roughly implies significance? We explore these issues across a range of NLP tasks using both large collections of past systems’ outputs and variants of single systems. Next, once significance levels are computed, how well does the standard i.i.d. notion of significance hold up in practical settings where future distributions are neither independent nor identically distributed, such as across domains? We explore this question using a range of test set variations for constituency parsing.

6 0.40265477 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

7 0.36834097 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

8 0.35543993 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction

9 0.30998346 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

10 0.29878116 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling

11 0.29243681 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

12 0.29115385 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

13 0.28972271 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

14 0.28431419 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

15 0.25790474 88 emnlp-2012-Minimal Dependency Length in Realization Ranking

16 0.2509408 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

17 0.2503303 7 emnlp-2012-A Novel Discriminative Framework for Sentence-Level Discourse Analysis

18 0.24966338 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers

19 0.24358119 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

20 0.23900326 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.034), (25, 0.635), (34, 0.059), (60, 0.038), (63, 0.018), (65, 0.027), (74, 0.032), (76, 0.047), (80, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9760775 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

2 0.70052326 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

Author: Sameer Singh ; Michael Wick ; Andrew McCallum

Abstract: Conditional random fields and other graphical models have achieved state of the art results in a variety of tasks such as coreference, relation extraction, data integration, and parsing. Increasingly, practitioners are using models with more complex structure—higher treewidth, larger fan-out, more features, and more data—rendering even approximate inference methods such as MCMC inefficient. In this paper we propose an alternative MCMC sam- pling scheme in which transition probabilities are approximated by sampling from the set of relevant factors. We demonstrate that our method converges more quickly than a traditional MCMC sampler for both marginal and MAP inference. In an author coreference task with over 5 million mentions, we achieve a 13 times speedup over regular MCMC inference.

3 0.28964007 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers

Author: Sebastian Riedel ; David Smith ; Andrew McCallum

Abstract: Graph-based dependency parsers suffer from the sheer number of higher order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored—without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higherorder edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second order grandparent models, our method considers, or scores, no more than 6–13% of the second order edges of the full model. This yields up to an eightfold parsing speedup, while pro- viding the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.

4 0.27205724 24 emnlp-2012-Biased Representation Learning for Domain Adaptation

Author: Fei Huang ; Alexander Yates

Abstract: Representation learning is a promising technique for discovering features that allow supervised classifiers to generalize from a source domain dataset to arbitrary new domains. We present a novel, formal statement of the representation learning task. We argue that because the task is computationally intractable in general, it is important for a representation learner to be able to incorporate expert knowledge during its search for helpful features. Leveraging the Posterior Regularization framework, we develop an architecture for incorporating biases into representation learning. We investigate three types of biases, and experiments on two domain adaptation tasks show that our biased learners identify significantly better sets of features than unbiased learners, resulting in a relative reduction in error of more than 16% for both tasks, with respect to existing state-of-the-art representation learning techniques.

5 0.27099168 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

Author: Jayant Krishnamurthy ; Tom Mitchell

Abstract: We present a method for training a semantic parser using only a knowledge base and an unlabeled text corpus, without any individually annotated sentences. Our key observation is that multiple forms ofweak supervision can be combined to train an accurate semantic parser: semantic supervision from a knowledge base, and syntactic supervision from dependencyparsed sentences. We apply our approach to train a semantic parser that uses 77 relations from Freebase in its knowledge representation. This semantic parser extracts instances of binary relations with state-of-theart accuracy, while simultaneously recovering much richer semantic structures, such as conjunctions of multiple relations with partially shared arguments. We demonstrate recovery of this richer structure by extracting logical forms from natural language queries against Freebase. On this task, the trained semantic parser achieves 80% precision and 56% recall, despite never having seen an annotated logical form.

6 0.26893547 132 emnlp-2012-Universal Grapheme-to-Phoneme Prediction Over Latin Alphabets

7 0.26556283 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints

8 0.2617102 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections

9 0.24815501 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision

10 0.24235861 8 emnlp-2012-A Phrase-Discovering Topic Model Using Hierarchical Pitman-Yor Processes

11 0.24220076 73 emnlp-2012-Joint Learning for Coreference Resolution with Markov Logic

12 0.23725614 10 emnlp-2012-A Statistical Relational Learning Approach to Identifying Evidence Based Medicine Categories

13 0.23428908 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

14 0.23381625 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

15 0.23359774 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering

16 0.23248728 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

17 0.22981997 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

18 0.22896384 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

19 0.22522216 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation

20 0.22484499 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction