emnlp emnlp2012 emnlp2012-126 knowledge-graph by maker-knowledge-mining

126 emnlp-2012-Training Factored PCFGs with Expectation Propagation


Source: pdf

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. [sent-3, score-0.377]

2 We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. [sent-4, score-0.546]

3 We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. [sent-6, score-0.497]

4 Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. [sent-9, score-0.392]

5 Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank. [sent-11, score-0.175]

6 Many high-performance PCFG parsers take an initially simple base grammar over treebank labels like NP and enrich it with deeper syntactic features to improve accuracy. [sent-13, score-0.379]

7 This broad characterization includes lexicalized parsers (Collins, 1997), unlexicalized parsers (Klein and Manning, 2003), and latent variable parsers (Matsuzaki et al. [sent-14, score-0.517]

8 When multi-part annotations are used in the same grammar, systems have generally multiplied these annotations together, in the sense that an NP that was definite, possessive, and VP-dominated would have a single unstructured PCFG symbol that encoded all three facts. [sent-17, score-0.724]

9 In addition, modulo backoff or smoothing, that unstructured symbol would often have rewrite parameters entirely distinct from, say, the indefinite but otherwise similar variant of the symbol (Klein and Manning, 2003). [sent-18, score-0.277]

10 Should a definiteness annotation be included, doubling the number of NPs in the grammar and perhaps overly fragmenting statistics? [sent-20, score-0.369]

11 Klein and Manning (2003) discuss exactly such trade-offs and omit annotations that were helpful on their own because they were not worth the combinatorial or statistical cost when combined with other annotations. [sent-22, score-0.342]

12 In this paper, we argue for grammars with factored annotations, that is, grammars with annotations that have structured component parts that are partially decoupled. [sent-23, score-0.936]

13 Our annotated grammars can include both latent and explicit annotations, as illustrated in Figure 1(d), and we demonstrate that these factored grammars outperform parsers with unstructured annotations. [sent-24, score-0.946]

14 After discussing the factored representation, we describe a method for parsing with factored annotations, using an approximate inference technique called expectation propagation (Minka, 2001). [sent-25, score-0.895]

15 Our algorithm has runtime linear in the number of annotation factors in the grammar, improving on the naïve algorithm, which has runtime exponential in the number of annotations. [sent-26, score-0.268]

16 Our method, the Expectation Propagation for Inferring Constituency (EPIC) parser, jointly trains a model over factored annotations, where each factor naturally leverages information from other annotation factors and improves on their mistakes. [sent-27, score-0.602]

17 (2005); and (d) the factored, mixed annotations we argue for in our paper. [sent-31, score-0.27]

18 First, we efficiently train a latent-variable grammar with 8 disjoint one-bit latent annotation factors, with scores as high as 89. [sent-33, score-0.532]

19 Second, we combine our latent variable factors with lexicalized and unlexicalized annotations, resulting in our best F1 score of 89.4. [sent-38, score-0.442]

20 Modern theories of grammar such as HPSG (Pollard and Sag, 1994) and Minimalism (Chomsky, 1992) do not ascribe unstructured conjunctions of annotations to phrasal categories. [sent-40, score-0.575]

21 For instance, an NP might have annotations to the effect that it is singular, masculine, and nominative, with perhaps further information about its animacy or other aspects of the head noun. [sent-42, score-0.27]

22 Thus, it is appealing for a grammar to be able to model these (somewhat) orthogonal notions, but most models have no mechanism to encourage this. [sent-43, score-0.214]

23 As a notable exception, Dreyer and Eisner (2006) tried to capture this kind of insight by allowing factored annotations to pass unchanged from parent label to child label, though they were not able to demonstrate substantial gains in accuracy. [sent-44, score-0.514]

24 Moreover, there has been to our knowledge no attempt to employ both latent and non-latent annotations at the same time. [sent-45, score-0.392]

25 There is good reason for this: lexicalized or highly annotated grammars like those of Collins (1997) or Klein and Manning (2003) have a very large number of states and an even larger number of rules. [sent-46, score-0.37]

26 Further annotating these rules with latent annotations would produce an infeasibly large grammar. [sent-47, score-0.392]

27 Nevertheless, it is a shame to sacrifice expert annotation just to get latent annotations. [sent-48, score-0.277]

28 We are interested in the specific case where each x is actually factored into M disjoint parts: A[x1, x2, . . . , xM]. [sent-56, score-0.285]

29 We call each component of x an annotation factor or an annotation component. [sent-61, score-0.452]

30 In this paper, we consider three kinds of annotation models, representing three of the major traditions in constituency parsing. [sent-63, score-0.201]

31 This parser starts from a grammar with labels annotated with sibling and parent information, and then adds specific annotations, such as whether an NP is possessive or whether a symbol rewrites as a unary. [sent-71, score-0.486]

32 Here, each symbol is given a latent annotation, referred to as a substate. [sent-75, score-0.215]

33 Often, these latent integers are considered as bit strings, with each bit indicating one latent annotation. [sent-78, score-0.332]

34 , 2006; Petrov and Klein, 2007), as well as “multiscale” grammars (Petrov and Klein, 2008b). [sent-80, score-0.18]

35 With two states (or one bit of annotation), our version of this parser gets 81. [sent-81, score-0.186]

36 Formally, we begin with a parse tree T over base symbols for some sentence w, and we decorate the tree with annotations X, giving a parse tree T[X]. [sent-91, score-0.526]

37 These components are decoupled in the sense that, conditioned on the coarse tree T, each column of the annotation is independent of every other column. [sent-96, score-0.292]

38 The conditional probability P(T[X]|w, θ) of an annotated tree given words is:

P(T[X]|w, θ) = ∏m fm(T[Xm]; w, θm) / ∑T′,X′ ∏m fm(T′[X′m]; w, θm)    (1)
             = (1 / Z(w, θ)) ∏m fm(T[Xm]; w, θm)

where the factors fm for each model take the form fm(T[Xm]; w, θm) = exp(...). [sent-99, score-0.322]

39 The features ϕ need to decompose into features for each factor fm; we do not allow features that take into account the annotation from two different components. [sent-103, score-0.235]
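
To make the factorization concrete, here is a minimal Python sketch (our illustration, not code from the paper; the dictionary-based feature representation and the names factor_score and joint_tree_score are hypothetical) of a tree score that decomposes into a product of per-factor log-linear scores, where each factor only sees features of its own annotation component:

    import math

    def factor_score(theta_m, features_m):
        # Unnormalized log-linear score for one annotation factor:
        # f_m(T[X_m]; w, theta_m) = exp(theta_m . phi_m(T[X_m], w))
        return math.exp(sum(theta_m.get(f, 0.0) * v for f, v in features_m.items()))

    def joint_tree_score(thetas, per_factor_features):
        # Product over annotation factors; by construction no feature can
        # look at two different annotation components at once.
        score = 1.0
        for theta_m, features_m in zip(thetas, per_factor_features):
            score *= factor_score(theta_m, features_m)
        return score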

40 We further add a pruning filter that assigns zero weight to any tree with a constituent that a baseline unannotated grammar finds sufficiently unlikely, and a weight of one to any other tree. [sent-104, score-0.433]

41 (a) The full joint distribution consists of a product of three grammars with different annotations, here lexicalized, latent, and unlexicalized. [sent-109, score-0.315]

42 (b) The core approximation is an anchored PCFG with one factor corresponding to each annotation component, described in Section 5. [sent-112, score-0.723]

43 (c) Fitting the approximation with expectation propagation, as described in Section 5. [sent-114, score-0.243]

44 During each step, an “augmented” distribution qm is created by taking one annotation factor from the full grammar and the rest from the approximate grammar. [sent-117, score-0.797]

45 For instance, in the upper left hand corner the full fLEX is substituted for f˜LEX. This new augmented distribution is projected back to the core approximation. [sent-118, score-0.212]

46 Note that the first term of Equation 3—which is conditioned on the coarse tree T—factors into M pieces, one for each of the annotation components. [sent-129, score-0.252]

47 Let G0 be the number of binary rules in the unannotated “base” grammar. [sent-133, score-0.174]

48 Each annotation component can have up to A primitive annotations per rule. [sent-135, score-0.487]

49 For instance, a latent variable grammar will have A = 8^b where b is the number of bits of annotation. [sent-136, score-0.439]

50 If we compile all annotation components into unstructured annotations, we can end up with a total grammar size of O(A^M G0), and so in general parsing time scales exponentially with the number of annotation components. [sent-137, score-0.753]

51 Thus, if we use latent annotations and the hierarchical splitting approach of Petrov et al. [sent-138, score-0.423]

52 (2006), then the grammar has size O(8^S G0), where S is the number of times the grammar was split in two. [sent-139, score-0.428]

53 Therefore, the size of annotated grammars can reach intractable levels very quickly, particularly in the case of latent annotations, where all combinations of annotations are possible. [sent-140, score-0.635]
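
As a rough sense of scale (our own back-of-the-envelope numbers, not figures from the paper), the following Python snippet compares the compiled unstructured grammar size O(A^M G0) with the O(M A G0) work per sweep that a factored approach pays when it visits one annotation component at a time:

    G0 = 3000      # binary rules in the unannotated base grammar (made-up number)
    b = 1          # latent bits per symbol for one component
    A = 8 ** b     # primitive annotations per binary rule: (2^b)^3 = 8^b
    M = 8          # number of annotation components

    compiled_rules = (A ** M) * G0   # unstructured product grammar: O(A^M G0)
    factored_rules = M * A * G0      # one pass over the factors:    O(M A G0)

    print(f"compiled grammar rules: {compiled_rules:,}")   # 50,331,648,000
    print(f"factored sweep rules:   {factored_rules:,}")   # 192,000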

54 Petrov (2010) considered an approach to slowing this growth down by using a set of M independently trained parsers Pm, and parsed using the product of the scores from each parser as the score for the tree. [sent-141, score-0.215]

55 In what follows, we propose another solution that exploits the factored structure of our grammar with expectation propagation. [sent-144, score-0.601]

56 Crucially, we are able to jointly train and parse with all annotation factors, minimizing redundancy across the models. [sent-145, score-0.229]

57 While not exact, we will see that expectation propagation is indeed effective. [sent-146, score-0.253]

58 The key insight behind the approximate inference methods we consider here is that the full model is a product of complex factors that interact in complicated ways, and we will approximate it with a product of corresponding simple factors that interact in simple ways. [sent-147, score-0.684]

59 Since each annotation factor is a reasonable model in both power and complexity on its own, we can consider them one at a time, replacing all others with their approximations, as shown in Figure 2(c). [sent-148, score-0.267]

60 The way we will build these approximations is with expectation propagation (Minka, 2001). [sent-149, score-0.363]

61 Expectation propagation (EP) is a general method for approximate inference that generalizes belief propagation. [sent-150, score-0.207]

62 To begin, we note that each of these components captures different phenomena: an unlexicalized grammar is good at capturing structural relationships in a parse tree (e.g. [sent-157, score-0.405]

63 subject noun phrases have different distributions than object noun phrases), while a lexicalized grammar captures preferred attachments for different verbs. [sent-159, score-0.313]

64 At the same time, each of these component grammars can be thought of as a refinement of the raw unannotated treebank grammar. [sent-160, score-0.478]

65 By itself, each of these grammars induces a different posterior distribution over unannotated trees for each sentence. [sent-161, score-0.544]

66 If we can approximate each model’s contribution by using only unannotated symbols, we can define an algorithm that avoids the exponential overhead of parsing with the full grammar, and instead works with each factor in turn. [sent-162, score-0.453]

67 To do so, we define a sentence-specific core approximation over unannotated trees q(T|w) = ∏m f˜m(T, w). [sent-163, score-0.275]

68 We will approximate each model fm by its corresponding f˜m. [sent-166, score-0.296]

69 Thus, there is one color-coordinated approximate factor for each component of the model in Figure 2(a). [sent-167, score-0.239]

70 Here, iAj is a symbol representing building the base symbol A over the span [i, j]. [sent-170, score-0.217]

71 Billott and Lang (1989) introduced anchored CFGs as “shared forests,” and Matsuzaki et al. [sent-171, score-0.335]

72 (2005) have previously used these grammars for finding an approximate one-best tree in a latent variable parser. [sent-172, score-0.444]

73 Note that, even though an anchored grammar is unannotated, because it is sentence specific it can represent many complex properties of the full grammar’s posterior distribution for a given sentence. [sent-173, score-0.691]
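
One simple way to picture an anchored grammar is as a table of scores keyed by anchored rules, with unlisted rules defaulting to a neutral score. The Python sketch below is illustrative only; the representation and the example entries are ours, not the paper's:

    from collections import defaultdict

    # Score for each anchored binary rule: base symbol A built over span [i, j]
    # from B over [i, k] and C over [k, j]. Untouched rules default to 1.0.
    anchored = defaultdict(lambda: 1.0)
    anchored[("NP", "DT", "NN", 0, 1, 2)] = 2.5
    anchored[("VP", "VBD", "NP", 2, 3, 6)] = 0.7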

74 Before continuing, note that a pointwise product of anchored grammars is still an anchored grammar. [sent-175, score-0.935]

75 Parsing with a product of these grammars is therefore no more expensive than parsing with just one. [sent-176, score-0.411]
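
A sketch of why the product stays cheap: multiplying anchored grammars just multiplies the scores they assign to the same anchored rule, producing another table of the same shape (again our illustration, reusing the dictionary representation from the sketch above):

    from collections import defaultdict

    def anchored_product(*grammars):
        # The pointwise product of anchored grammars is itself an anchored grammar.
        out = defaultdict(lambda: 1.0)
        for g in grammars:
            for rule, score in g.items():
                out[rule] *= score
        return out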

76 Indeed, anchoring adds no inferential cost at all over parsing with an unannotated grammar: the anchored indices i,j,k have to be computed just to parse the sentence at all. [sent-177, score-0.611]

77 We now describe a simplified version of EP: parsing with assumed density filtering (Boyen and Koller, 1998). [sent-180, score-0.224]

78 Then, we factor in one of the annotated grammars and parse with this new augmented grammar. [sent-185, score-0.445]

79 This gives us a new posterior distribution for this sentence over trees annotated with just that annotation component. [sent-186, score-0.408]

80 Then, we can marginalize out the annotations, giving us a new q that approximates the annotated grammar as closely as possible without using any annotations. [sent-187, score-0.277]

81 In this way, information from all grammars is incorporated into a final posterior distribution over trees using only unannotated symbols. [sent-189, score-0.544]

82 Step 1 of the inner loop forms an approximate posterior distribution using fm, which is the parsing model associated with component m, and q, which is the anchored core approximation to the posterior induced by the first m − 1 models. [sent-200, score-0.938]

83 Then, the marginals are computed, and the new posterior distribution is projected to an anchored grammar, creating f˜m. More intuitively, we create an anchored PCFG that makes the approximation "as close as possible" to the augmented grammar. [sent-201, score-0.971]

84 Thus, each term fm is approximated in the context of the terms that come before it. [sent-204, score-0.199]
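
The assumed density filtering pass can be sketched as the following loop. This is a structural sketch under our own simplifying assumptions: project is a hypothetical helper that parses with the augmented grammar fm times the current anchored approximation q, marginalizes out fm's annotations, and returns the new anchored approximation.

    from collections import defaultdict

    def adf(sentence, factors, project):
        # Start from a trivial anchored approximation (the paper's exact
        # initialization may differ, e.g. the unannotated base grammar).
        q = defaultdict(lambda: 1.0)
        for f_m in factors:
            # Fold in one annotated factor at a time: q <- Proj[f_m * q].
            q = project(f_m, q, sentence)
        return q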

85 would have complexity O(...). Critically, for a latent variable parser [sent-222, score-0.191]

86 with M annotation bits, the exact algorithm takes time exponential in M, while this approximate algorithm takes time linear in M. [sent-224, score-0.208]

87 At each step, we have in q an approximation to what the posterior distribution looks like with the first m − 1 models. [sent-226, score-0.242]

88 Intuitively, EP cycles among the models, updating the approximation for that model in turn so that it closely resembles the predictions made by fm in the context of all other approximations, as in Figure 2(c). [sent-234, score-0.362]

89 Thus, each approximate term f˜m is created using information from all other f˜m′, meaning that the different annotation factors can still "talk" to each other. [sent-235, score-0.375]

90 The product of these approximations q will therefore come to act as an approximation to the true posterior: it takes into account joint information about all annotation components, all within one tractable anchored grammar. [sent-236, score-0.818]

91 With that intuition in mind, EP is defined as follows: • Initialize contributions f˜m to the approximate posterior q. [sent-237, score-0.189]

92 Create the augmented distribution by including the actual factor for component m: qm(T[Xm]) ∝ fm(T[Xm]) q\m(T), and compute inside and outside scores. [sent-242, score-0.192]

93 Step 2 creates the augmented distribution qm, which includes fm along with the approximate factors for all models except the current model. [sent-250, score-0.546]

94 Step 3 creates a new anchored f˜m that has the same marginal distribution as the true model fm in the context of the other approximations, just as we did in ADF. [sent-251, score-0.584]

95 At each step, one piece of the core approximation is replaced with the corresponding component from the full model. [sent-255, score-0.215]

96 This augmented model is then reapproximated by a new core approximation q after updating the corresponding f˜m. This process repeats until convergence. [sent-256, score-0.261]
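
Putting the steps together, the EP loop can be sketched as follows. This is again a structural sketch with hypothetical helpers, not the paper's implementation: project is assumed to build the augmented distribution qm from fm and the product of the other contributions, run inside-outside, and return the refitted anchored contribution f˜m.

    from collections import defaultdict

    def anchored_product(grammars):
        out = defaultdict(lambda: 1.0)
        for g in grammars:
            for rule, score in g.items():
                out[rule] *= score
        return out

    def expectation_propagation(sentence, factors, project, n_iters=5):
        # One anchored contribution per annotation factor, initialized to 1.
        f_tilde = [defaultdict(lambda: 1.0) for _ in factors]
        for _ in range(n_iters):                  # repeat until convergence
            for m, f_m in enumerate(factors):
                # q^\m: the product of all contributions except the current one.
                q_without_m = anchored_product(f_tilde[:m] + f_tilde[m + 1:])
                # Re-approximate f_m in the context of all other factors.
                f_tilde[m] = project(f_m, q_without_m, sentence)
        # The core approximation q is the product of all contributions.
        return anchored_product(f_tilde)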

97 q and each of the f˜m are anchored grammars that assign weights to unannotated rules. [sent-260, score-0.689]

98 The product of anchored grammars with the annotated factor fm need not be carried out explicitly. [sent-261, score-0.942]

99 Instead, note that an anchored grammar is just a function q(A → B C, i, k, j) ∈ R+ that returns a score for every anchored binary rule. [sent-262, score-0.549]

100 This function can be easily integrated into the CKY algorithm for a single annotated grammar by simply multiplying in the value of q whenever computing the score of the respective production over some span. [sent-263, score-0.316]
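
A minimal sketch of that integration (ours, with a hypothetical grammar encoding): a CKY inside pass in which every binary production's score is multiplied by the anchored approximation's score for that production over that span.

    from collections import defaultdict

    def cky_inside(words, lexical_scores, binary_rules, anchored_q):
        # inside[(symbol, i, j)] accumulates the inside score of symbol over [i, j].
        n = len(words)
        inside = defaultdict(float)
        for i, w in enumerate(words):                         # terminal cells
            for tag, s in lexical_scores.get(w, {}).items():
                inside[(tag, i, i + 1)] = s
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for (A, B, C), rule_score in binary_rules.items():
                        left, right = inside[(B, i, k)], inside[(C, k, j)]
                        if left and right:
                            # Multiply in the anchored factor q(A -> B C, i, k, j);
                            # it defaults to 1.0 for rules q says nothing about.
                            inside[(A, i, j)] += (rule_score
                                                  * anchored_q[(A, B, C, i, k, j)]
                                                  * left * right)
        return inside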


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('anchored', 0.335), ('annotations', 0.27), ('factored', 0.244), ('grammar', 0.214), ('qm', 0.201), ('fm', 0.199), ('grammars', 0.18), ('unannotated', 0.174), ('xm', 0.174), ('agenda', 0.155), ('annotation', 0.155), ('expectation', 0.143), ('ep', 0.133), ('petrov', 0.126), ('klein', 0.123), ('factors', 0.123), ('latent', 0.122), ('propagation', 0.11), ('approximations', 0.11), ('np', 0.104), ('approximation', 0.1), ('lexicalized', 0.099), ('approximate', 0.097), ('symbol', 0.093), ('posterior', 0.092), ('unstructured', 0.091), ('adf', 0.086), ('product', 0.085), ('factor', 0.08), ('augmented', 0.077), ('minka', 0.067), ('matsuzaki', 0.066), ('bits', 0.066), ('parsers', 0.066), ('parser', 0.064), ('annotated', 0.063), ('binarization', 0.062), ('component', 0.062), ('unlexicalized', 0.061), ('inside', 0.059), ('boyen', 0.057), ('iaj', 0.057), ('parsing', 0.057), ('pcfg', 0.055), ('density', 0.055), ('manning', 0.054), ('president', 0.053), ('core', 0.053), ('possessive', 0.052), ('conditioned', 0.052), ('gets', 0.05), ('distribution', 0.05), ('runtime', 0.05), ('epic', 0.049), ('dkl', 0.049), ('derivative', 0.049), ('pcfgs', 0.048), ('trees', 0.048), ('constituency', 0.046), ('tree', 0.045), ('exponential', 0.045), ('parse', 0.045), ('schematically', 0.045), ('bit', 0.044), ('filtering', 0.043), ('exponentially', 0.041), ('disjoint', 0.041), ('lexicalization', 0.041), ('components', 0.04), ('omit', 0.04), ('assumed', 0.039), ('xd', 0.039), ('multiplying', 0.039), ('variable', 0.037), ('beliefs', 0.037), ('koller', 0.037), ('crucially', 0.037), ('interact', 0.037), ('collins', 0.036), ('na', 0.036), ('penn', 0.035), ('fitting', 0.035), ('initially', 0.035), ('nn', 0.034), ('productions', 0.033), ('takes', 0.033), ('treebank', 0.033), ('complexity', 0.032), ('resembles', 0.032), ('combinatorial', 0.032), ('projected', 0.032), ('base', 0.031), ('splitting', 0.031), ('updating', 0.031), ('simplified', 0.03), ('division', 0.03), ('raw', 0.029), ('nps', 0.029), ('redundancy', 0.029), ('states', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

2 0.15680528 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

3 0.12253343 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

Author: Jonathan K. Kummerfeld ; David Hall ; James R. Curran ; Dan Klein

Abstract: Constituency parser performance is primarily interpreted through a single metric, F-score on WSJ section 23, that conveys no linguistic information regarding the remaining errors. We classify errors within a set of linguistically meaningful types using tree transformations that repair groups of errors together. We use this analysis to answer a range of questions about parser behaviour, including what linguistic constructions are difficult for stateof-the-art parsers, what types of errors are being resolved by rerankers, and what types are introduced when parsing out-of-domain text.

4 0.11596425 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

5 0.10434116 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

6 0.10245813 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

7 0.10047762 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

8 0.095851392 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation

9 0.095250569 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

10 0.094397753 81 emnlp-2012-Learning to Map into a Universal POS Tagset

11 0.093108088 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

12 0.090285257 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

13 0.08627478 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

14 0.082073763 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation

15 0.081090763 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

16 0.078423895 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

17 0.074741706 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

18 0.064525865 12 emnlp-2012-A Transition-Based System for Joint Part-of-Speech Tagging and Labeled Non-Projective Dependency Parsing

19 0.062583245 67 emnlp-2012-Inducing a Discriminative Parser to Optimize Machine Translation Reordering

20 0.060303375 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.237), (1, -0.119), (2, 0.118), (3, -0.042), (4, -0.022), (5, 0.019), (6, -0.051), (7, 0.017), (8, -0.002), (9, 0.15), (10, -0.014), (11, 0.186), (12, -0.023), (13, 0.056), (14, -0.129), (15, 0.136), (16, 0.015), (17, 0.039), (18, -0.231), (19, -0.06), (20, -0.049), (21, 0.145), (22, -0.042), (23, 0.091), (24, 0.05), (25, 0.075), (26, 0.062), (27, -0.022), (28, -0.001), (29, 0.07), (30, 0.118), (31, -0.124), (32, 0.144), (33, 0.031), (34, 0.14), (35, -0.072), (36, 0.147), (37, -0.205), (38, -0.009), (39, -0.095), (40, 0.014), (41, -0.08), (42, -0.055), (43, -0.026), (44, -0.052), (45, 0.009), (46, -0.104), (47, -0.047), (48, -0.031), (49, -0.04)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97781229 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

2 0.68418735 65 emnlp-2012-Improving NLP through Marginalization of Hidden Syntactic Structure

Author: Jason Naradowsky ; Sebastian Riedel ; David Smith

Abstract: Many NLP tasks make predictions that are inherently coupled to syntactic relations, but for many languages the resources required to provide such syntactic annotations are unavailable. For others it is unclear exactly how much of the syntactic annotations can be effectively leveraged with current models, and what structures in the syntactic trees are most relevant to the current task. We propose a novel method which avoids the need for any syntactically annotated data when predicting a related NLP task. Our method couples latent syntactic representations, constrained to form valid dependency graphs or constituency parses, with the prediction task via specialized factors in a Markov random field. At both training and test time we marginalize over this hidden structure, learning the optimal latent representations for the problem. Results show that this approach provides significant gains over a syntactically uninformed baseline, outperforming models that observe syntax on an English relation extraction task, and performing comparably to them in semantic role labeling.

3 0.52945399 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables

Author: Paramveer Dhillon ; Jordan Rodu ; Michael Collins ; Dean Foster ; Lyle Ungar

Abstract: Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn dependency tree structures. We propose a simple yet powerful latent variable generative model for dependency parsing, and a spectral learning method to efficiently estimate it. As a pilot experimental evaluation, we use the spectral tree probabilities estimated by our model to re-rank the outputs of a near state-of-the-art parser. Our approach gives us a moderate reduction in error of up to 4.6% over the baseline re-ranker.

4 0.52024984 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

5 0.50552291 130 emnlp-2012-Unambiguity Regularization for Unsupervised Learning of Probabilistic Grammars

Author: Kewei Tu ; Vasant Honavar

Abstract: We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous in the sense that only a tiny portion of the large number of possible parses of a natural language sentence are syntactically valid. We incorporate an inductive bias into grammar learning in favor of grammars that lead to unambiguous parses on natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented with a simple and computationally efficient extension to standard EM. In our experiments of unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and in combination with annealing (of the regularization strength) and sparsity priors it leads to improvement over the current state of the art.

6 0.44830438 48 emnlp-2012-Exploring Adaptor Grammars for Native Language Identification

7 0.39717442 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output

8 0.39154387 91 emnlp-2012-Monte Carlo MCMC: Efficient Inference by Approximate Sampling

9 0.33887061 1 emnlp-2012-A Bayesian Model for Learning SCFGs with Discontiguous Rules

10 0.33271188 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning

11 0.33108985 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence

12 0.32930484 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures

13 0.29676658 68 emnlp-2012-Iterative Annotation Transformation with Predict-Self Reestimation for Chinese Word Segmentation

14 0.29673254 81 emnlp-2012-Learning to Map into a Universal POS Tagset

15 0.29412025 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage

16 0.28423783 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon

17 0.2821506 3 emnlp-2012-A Coherence Model Based on Syntactic Patterns

18 0.28034201 18 emnlp-2012-An Empirical Investigation of Statistical Significance in NLP

19 0.26606482 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure

20 0.25918594 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.014), (16, 0.032), (25, 0.034), (34, 0.061), (60, 0.063), (63, 0.034), (64, 0.011), (65, 0.02), (74, 0.058), (76, 0.549), (80, 0.017), (86, 0.018), (95, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95223862 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation

Author: David Hall ; Dan Klein

Abstract: PCFGs can grow exponentially as additional annotations are added to an initially simple base grammar. We present an approach where multiple annotations coexist, but in a factored manner that avoids this combinatorial explosion. Our method works with linguistically motivated annotations, induced latent structure, lexicalization, or any mix of the three. We use a structured expectation propagation algorithm that makes use of the factored structure in two ways. First, by partitioning the factors, it speeds up parsing exponentially over the unfactored approach. Second, it minimizes the redundancy of the factors during training, improving accuracy over an independent approach. Using purely latent variable annotations, we can efficiently train and parse with up to 8 latent bits per symbol, achieving F1 scores up to 88.4 on the Penn Treebank while using two orders of magnitude fewer parameters compared to the naïve approach. Combining latent, lexicalized, and unlexicalized annotations, our best parser gets 89.4 F1 on all sentences from section 23 of the Penn Treebank.

2 0.92016572 116 emnlp-2012-Semantic Compositionality through Recursive Matrix-Vector Spaces

Author: Richard Socher ; Brody Huval ; Christopher D. Manning ; Andrew Y. Ng

Abstract: Single-word vector space models have been very successful at learning lexical information. However, they cannot capture the compositional meaning of longer phrases, preventing them from a deeper understanding of language. We introduce a recursive neural network (RNN) model that learns compositional vector representations for phrases and sentences of arbitrary syntactic type and length. Our model assigns a vector and a matrix to every node in a parse tree: the vector captures the inherent meaning of the constituent, while the matrix captures how it changes the meaning of neighboring words or phrases. This matrix-vector RNN can learn the meaning of operators in propositional logic and natural language. The model obtains state of the art performance on three different experiments: predicting fine-grained sentiment distributions of adverb-adjective pairs; classifying sentiment labels of movie reviews and classifying semantic relationships such as cause-effect or topic-message between nouns using the syntactic path between them.

3 0.9077273 32 emnlp-2012-Detecting Subgroups in Online Discussions by Modeling Positive and Negative Relations among Participants

Author: Ahmed Hassan ; Amjad Abu-Jbara ; Dragomir Radev

Abstract: A mixture of positive (friendly) and negative (antagonistic) relations exist among users in most social media applications. However, many such applications do not allow users to explicitly express the polarity of their interactions. As a result most research has either ignored negative links or was limited to the few domains where such relations are explicitly expressed (e.g. Epinions trust/distrust). We study text exchanged between users in online communities. We find that the polarity of the links between users can be predicted with high accuracy given the text they exchange. This allows us to build a signed network representation of discussions, where every edge has a sign: positive to denote a friendly relation, or negative to denote an antagonistic relation. We also connect our analysis to social psychology theories of balance. We show that the automatically predicted networks are consistent with those theories. Inspired by that, we present a technique for identifying subgroups in discussions by partitioning signed networks representing them.

4 0.54886609 4 emnlp-2012-A Comparison of Vector-based Representations for Semantic Composition

Author: William Blacoe ; Mirella Lapata

Abstract: In this paper we address the problem of modeling compositional meaning for phrases and sentences using distributional methods. We experiment with several possible combinations of representation and composition, exhibiting varying degrees of sophistication. Some are shallow while others operate over syntactic structure, rely on parameter learning, or require access to very large corpora. We find that shallow approaches are as good as more computationally intensive alternatives with regards to two particular tests: (1) phrase similarity and (2) paraphrase detection. The sizes of the involved training corpora and the generated vectors are not as important as the fit between the meaning representation and compositional method.

5 0.49054518 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT

Author: Shujie Liu ; Chi-Ho Li ; Mu Li ; Ming Zhou

Abstract: The training of most syntactic SMT approaches involves two essential components, word alignment and monolingual parser. In the current state of the art these two components are mutually independent, thus causing problems like lack of rule generalization, and violation of syntactic correspondence in translation rules. In this paper, we propose two ways of re-training monolingual parser with the target of maximizing the consistency between parse trees and alignment matrices. One is targeted self-training with a simple evaluation function; the other is based on training data selection from forced alignment of bilingual data. We also propose an auxiliary method for boosting alignment quality, by symmetrizing alignment matrices with respect to parse trees. The best combination of these novel methods achieves 3 Bleu point gain in an IWSLT task and more than 1 Bleu point gain in NIST tasks.

6 0.48772421 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers

7 0.47143331 30 emnlp-2012-Constructing Task-Specific Taxonomies for Document Collection Browsing

8 0.46908474 134 emnlp-2012-User Demographics and Language in an Implicit Social Network

9 0.46064299 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews

10 0.45461836 120 emnlp-2012-Streaming Analysis of Discourse Participants

11 0.45087811 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

12 0.44971392 52 emnlp-2012-Fast Large-Scale Approximate Graph Construction for NLP

13 0.44093335 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields

14 0.4396255 107 emnlp-2012-Polarity Inducing Latent Semantic Analysis

15 0.437837 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts

16 0.43653455 139 emnlp-2012-Word Salad: Relating Food Prices and Descriptions

17 0.43206254 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics

18 0.41978517 133 emnlp-2012-Unsupervised PCFG Induction for Grounded Language Learning with Highly Ambiguous Supervision

19 0.4197135 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction

20 0.40670332 122 emnlp-2012-Syntactic Surprisal Affects Spoken Word Duration in Conversational Contexts