emnlp emnlp2012 emnlp2012-57 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-the-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
Reference: text
sentIndex sentText sentNum sentScore
1 On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. [sent-4, score-0.349]
2 In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. [sent-5, score-0.307]
3 To account for this, cube pruning for decoding is utilized (Chiang, 2007). [sent-7, score-0.615]
4 1 Introduction. The trade-off between rich features and exact decoding in dependency parsing has been well documented (McDonald and Nivre, 2007; Nivre and McDonald, 2008). [sent-13, score-0.639]
5 In the transition-based parsing literature, the focus has been on increasing the search space of the system at decoding time, as expanding the feature scope is often trivial and in most cases only leads to a constant-time increase in parser complexity. [sent-16, score-0.502]
6 A similar line of research investigated the use of integer linear programming (ILP) formulations of parsing (Riedel and Clarke, 2006; Martins et al. [sent-21, score-0.293]
7 Upgrading a parser to score new types of higher-order dependencies thus requires significant changes to the underlying decoding algorithm. [sent-26, score-0.277]
8 In this paper, we abandon exact search in graph-based parsing in favor of freedom in feature scope. [sent-28, score-0.365]
9 We propose a parsing algorithm that keeps the backbone Eisner chart-parsing algorithm for first-order parsing unchanged. [sent-29, score-0.428]
10 [...] potential parses in each chart cell by expanding the signature of each chart item to include all the non-local context required to compute features. [sent-32, score-0.636]
11 To control complexity we use cube pruning (Chiang, 2007) with the beam size k in each cell. [sent-34, score-0.705]
12 Thus, our method is an application of integrated decoding with a language model in MT (Chiang, 2007) to dependency parsing, which has previously been applied to constituent parsing (Huang, 2008). [sent-36, score-0.499]
13 Additionally, we look at higher-order dependency arc-label features, which are novel to graph-based parsing, though commonly exploited in transition-based parsing (Zhang and Nivre, 2011). [sent-41, score-0.376]
14 In addition, we explore the use of valency features counting how many modifiers a word can have on its left and right side. [sent-45, score-0.273]
15 The speed of our parser is 220 tokens per second, which is over 4 times faster than an exact third-order parser. [Figure 1: Example Sentence.] [sent-52, score-0.291]
16 The directed syntactic rela- tionships, aka dependency arcs or dependencies for short, can often be labeled to indicate their syntactic role. [sent-59, score-0.413]
17 Given an input sentence x = x1 . . . xn, dependency parsing is the search for the set of head-modifier dependency arcs y∗ such that y∗ = argmaxy∈Y(x) f(x, y), where f is a scoring function. [sent-64, score-0.814]
18 We use the notation (i →l j) ∈ y to indicate that there is a dependency from head word xi to modifier xj with label l in dependency tree y. [sent-69, score-0.413]
19 For example, in first-order dependency parsing (McDonald et al., 2005), the score f(x, y) decomposes into a sum of scores of individual arcs. [sent-71, score-0.376]
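To make the factorization concrete, the standard arc-factored objective can be written out as follows; this is a reconstruction of the textbook formulation (not quoted from the paper), using the labeled-arc notation introduced above.

```latex
y^{*} \;=\; \operatorname*{argmax}_{y \in Y(x)} f(x, y)
      \;=\; \operatorname*{argmax}_{y \in Y(x)} \sum_{(i \xrightarrow{l} j) \in y} f(i \xrightarrow{l} j)
```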
20 It has two types of dynamic programming states: complete items and incomplete items. [sent-74, score-0.373]
21 Second-order sibling models (McDonald and Pereira, 2006) score adjacent arcs with a common head. [sent-78, score-0.375]
22 The resulting scoring function is: y∗ = argmaxy∈Y(x) Σ(i →l j, i →l′ k)∈y f(i →l j, i →l′ k), where (i →l j, i →l′ k) ∈ y indicates two adjacent head-modifier relationships in dependency tree y, one from xi to xj with label l and another from xi to xk with label l′. [sent-81, score-0.405]
23 In order to maintain cubic parsing complexity, adjacent dependencies are scored only if the modifiers occur on the same side in the sentence relative to the head. [sent-83, score-0.41]
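A minimal sketch of scoring a fixed tree under this second-order sibling factorization, assuming model-supplied score_arc and score_sibling callables (the function and argument names are illustrative, not the paper's); note the same-side restriction described above.

```python
from collections import defaultdict

def sibling_model_score(heads, labels, score_arc, score_sibling):
    """Score a dependency tree under a second-order sibling factorization.

    heads[m] is the head index of modifier m (the root's head is -1 and is
    skipped); labels[m] is the arc label of that dependency.  Adjacent
    modifiers are paired only when they fall on the same side of their head.
    """
    total = 0.0
    mods_by_head_side = defaultdict(list)
    for m, h in enumerate(heads):
        if h < 0:                         # skip the artificial root attachment
            continue
        total += score_arc(h, m, labels[m])
        side = "L" if m < h else "R"
        mods_by_head_side[(h, side)].append(m)
    for (h, side), mods in mods_by_head_side.items():
        mods.sort()
        for a, b in zip(mods, mods[1:]):  # adjacent sibling pairs on one side
            total += score_sibling(h, a, labels[a], b, labels[b])
    return total
```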
24 Second-order grandchild models (Carreras, 2007) score adjacent arcs in length-two head-modifier chains. [sent-84, score-0.391]
25 For example, if word xi modifies word xj with label l, but itself has a dependency to modifier xk with label l′, then we would add a scoring function f(j →l i →l′ k). [sent-85, score-0.616]
26 The states in the Eisner algorithm need to be augmented with the indices to the outermost modifiers in order to score the outermost grandchildren. [sent-89, score-0.569]
27 Finally, third-order models (Koo and Collins, 2010) score arc triples such as three adjacent sibling modifiers, called tri-siblings, or structures looking at both horizontal contexts and vertical contexts, e.g. [sent-91, score-0.291]
28 grand-siblings that score a word, its modifier, and its adjacent grandchildren. [sent-93, score-0.289]
29 Each of these higher-order parsing algorithms makes a clever factorization for the specific model in consideration to keep complexity as low as possible. [sent-98, score-0.358]
30 This is done on top of the two distinct chart items in the O(|x|^3) Eisner chart-parsing algorithm (Figure 2). [sent-102, score-0.323]
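For orientation, a minimal sketch of the two Eisner item types being referred to; the representation and field names are illustrative assumptions rather than the paper's notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class EisnerItem:
    """One chart item in the basic O(|x|^3) Eisner algorithm.

    A complete item is a half-constituent whose head sits at one end of the
    span and whose internal words are all attached; an incomplete item still
    carries the pending arc between its two endpoints.
    """
    left: int        # left boundary of the span
    right: int       # right boundary of the span
    head_left: bool  # True if the head is the left endpoint
    complete: bool   # complete (triangle) vs. incomplete (trapezoid) item
```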
31 Unfortunately, it can increase the run-time complexity of the algorithm substantially, but we will employ cube pruning to regain tractability. [sent-104, score-0.582]
32 Because our higher-order dependency parsing algorithm is based on the Eisner algorithm, it is currently limited to producing projective trees only. [sent-105, score-0.448]
33 3.1 Arbitrary n-th-order dependency parsing. We start with the simplest case of sibling models. [sent-107, score-0.515]
34 This fact suggests that, in order to score modifier bigrams, the complete item states should be augmented by the outermost modifier. [sent-109, score-0.594]
35 We can augment the chart items with such information. [Figure 3: Structures and rules for parsing models based on modifier bigrams, with a generalized (Eisner, 1996) algorithm.] [sent-110, score-0.796]
36 It refines the complete items by storing the previously constructed dependency to the outermost modifiers. [sent-114, score-0.5]
37 Using this chart item augmentation it is now possible to score both first-order arcs as well as second-order sibling arcs. [sent-116, score-0.601]
38 By counting the number of free variables in each parsing rule, we see that the parsing complexity is O(|x|^5), which is higher than both McDonald and Pereira (2006) and Carreras (2007). [sent-118, score-0.518]
39 The added complexity comes from the fact that it is now possible to score a third-order dependency consisting of the head, the modifier, the sibling, and the outermost grandchild jointly. [sent-119, score-0.626]
40 We can go further to augment the complete and incomplete states with more parsing history. [sent-120, score-0.394]
41 As a result, it becomes possible to score tri-siblings involving three adjacent modifiers and grand-siblings involving two outermost grandchildren, both of which comprise the third-order Model 2 of Koo and Collins (2010), plus potentially any additional interactions of these roles. [sent-123, score-0.398]
42 In general, we can augment the complete and incomplete states with n variables representing the [...] [Figure 4: Structures and rules for parsing models based on modifier trigrams in horizontal contexts, with a generalized (Eisner, 1996) algorithm.] [sent-127, score-0.686]
43 Here the dashed arrows indicate the previous two modifiers to the head in each chart item. [sent-128, score-0.453]
44 [Figure 5: Structures and rules for parsing models based on modifier trigrams in vertical contexts, with a generalized (Eisner, 1996) algorithm.] [sent-129, score-0.514]
45 Here the dashed arrows indicate the modifier to the head and the modifier’s modifier, forming a modifier chain of length two. [sent-130, score-0.549]
46 possible parsing histories, and loop over the cross product of the histories in the innermost loop of the Eisner algorithm. [sent-131, score-0.42]
47 3.2 History-based dependency parsing. The previous n modifiers, either horizontal or vertical, are a potential signature of parsing history. [sent-137, score-0.731]
48 We can put arbitrary signatures of parsing history into the chart items so that when we score a new item, we can draw the distinguishing power of features based on an arbitrarily deep history. [sent-138, score-0.72]
49 We can store the position of the last modifier into both chart states. [sent-140, score-0.431]
50 In complete states, this signature tells us the position of the outermost modifier, which is the valency of the head in the left or right half-constituent. [sent-141, score-0.515]
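A sketch of what the augmented signature described here could carry, together with the naive inner loop over the cross product of stored histories that cube pruning later truncates; all names are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Signature:
    """Non-local context stored with a chart item, on top of the unchanged
    Eisner span boundaries: the last n modifiers attached to the head (with
    their labels) and how many modifiers have been attached so far (valency)."""
    last_modifiers: Tuple[int, ...]
    last_labels: Tuple[str, ...]
    valency: int

def combine_cell(left_items, right_items, score_higher_order):
    """Exhaustive combination of two cells: iterate over the cross product of
    the histories carried by the sub-items' signatures."""
    for l_sig, l_score in left_items:
        for r_sig, r_score in right_items:
            yield l_score + r_score + score_higher_order(l_sig, r_sig)
```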
51 A natural grouping of states follows where all sub-parses sharing the same chart boundaries are grouped together. [sent-149, score-0.284]
52 This grouping will enable the cube pruning in Section 4 for approximate search. [sent-150, score-0.559]
53 There is another advantage of keeping the Eisner parsing logic unchanged: derivations correspond one-to-one to dependency parse trees. [sent-151, score-0.376]
54 Introducing higher-order features in each chart item will cause sub-derivations to be re-ranked only. [sent-154, score-0.311]
55 With cube pruning, we only explore cells at (0, 0), (0, 1), (1, 0), (2, 0), and (1, 1), without the need to evaluate scoring functions for the remaining cells in the table. [sent-175, score-0.483]
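As a hedged illustration of the exploration pattern just described, here is a minimal cube-pruning step for one chart cell, without recombination; the function and argument names are assumptions. Because the higher-order combination score is not monotonic in the grid, the popped items are only approximately the k best, which is exactly the approximation discussed below; with signature-based recombination, popped combinations sharing the same signature would additionally be merged, keeping only the best-scoring one.

```python
import heapq

def cube_prune_cell(left_best, right_best, combine_score, k):
    """Pop up to k best combinations from the grid of two sorted k-best lists.

    left_best / right_best: lists of (score, item), best first.
    combine_score(l_item, r_item): extra score from higher-order features when
    the two sub-items are joined (this makes the grid non-monotonic).
    """
    if not left_best or not right_best:
        return []

    def total(i, j):
        l_score, l_item = left_best[i]
        r_score, r_item = right_best[j]
        return l_score + r_score + combine_score(l_item, r_item)

    seen = {(0, 0)}
    frontier = [(-total(0, 0), 0, 0)]            # max-heap via negated scores
    results = []
    while frontier and len(results) < k:
        neg_score, i, j = heapq.heappop(frontier)
        results.append((-neg_score, left_best[i][1], right_best[j][1]))
        for ni, nj in ((i + 1, j), (i, j + 1)):  # only neighbours of popped cells
            if ni < len(left_best) and nj < len(right_best) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(frontier, (-total(ni, nj), ni, nj))
    return results
```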
56 In this example cube pruning does find the highest scoring combination. [sent-177, score-0.553]
57 Thus, cube pruning may not find the highest scoring combination. [sent-181, score-0.553]
58 This approximation is at the heart of cube pruning. [sent-182, score-0.324]
59 In cube pruning, with recombination, the k-best items in each chart cell are locally optimal (in the pruned search space) over all sub-trees with an equivalent state for future combinations. [sent-188, score-0.747]
60 The cube pruning algorithm without recombination degenerates to a recursive k-best re-scoring algorithm since each of the k-best items would be unique by itself as a sub-tree. [sent-189, score-0.767]
61 It should be noted that by working on a chart (or a forest, equivalently) the algorithm is already applying recombination at a coarser level. [sent-190, score-0.392]
62 In constituent parser reranking (Huang, 2008), recombination is less likely to happen since the reranking features capture peculiarities of local tree structures. [sent-192, score-0.282]
63 For dependency parsing, we hypothesize that the higher-order features are more similar to the n-gram language model features in MT as they tend to be common features among many sub-trees. [sent-193, score-0.3]
64 [We train with MIRA] (Crammer et al., 2006) with a Hamming-loss margin, as is common in the dependency parsing literature (Martins et al. [sent-203, score-0.376]
65 MIRA only requires a first-best decoding algorithm, which in our case are the approximate chart-based parsing algorithms defined in Sections 3 and 4. [sent-206, score-0.404]
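A minimal sketch of the single-constraint (1-best) MIRA update that this training setup relies on, with the Hamming loss supplying the margin; this is the generic textbook form, stated as an assumption rather than the authors' exact implementation.

```python
import numpy as np

def mira_update(w, phi_gold, phi_pred, hamming_loss, C=1.0):
    """One 1-best MIRA update on weight vector w.

    phi_gold / phi_pred: feature vectors of the gold tree and the
    (approximately) decoded tree; hamming_loss: number of words whose head or
    label differs between the two trees.
    """
    delta = phi_gold - phi_pred
    margin = float(np.dot(w, delta))
    denom = float(np.dot(delta, delta))
    if denom == 0.0:                 # prediction already matches the gold tree
        return w
    tau = min(C, max(0.0, (hamming_loss - margin) / denom))
    return w + tau * delta
```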
66 Because our decoding algorithm is approximate, this may lead to invalid updates given to the optimizer (Huang and [...]). [sent-207, score-0.447]
67 These atomic features are conjoined with the directions of arcs to create composite n-gram features. [sent-216, score-0.285]
68 The higher-order dependency features can be categorized into the following sub-groups, where we use h to indicate the head, m the modifier, s the modifier’s sibling and gc a grandchild word in a dependency part. [sent-217, score-0.726]
69 The positions of the modifiers are also conjoined with the higher-order dependency features in the previous list. [sent-219, score-0.395]
70 The features that are new compared to Koo and Collins (2010) are the label tuple features, the sibling and grandchild conjoined features, and the valency features. [sent-220, score-0.62]
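To illustrate the kinds of templates listed above, a sketch of string-valued feature generation for one higher-order part, conjoining head (h), modifier (m), sibling (s), and grandchild (gc) with labels, arc direction, and valency; the template names and exact conjunctions are illustrative assumptions.

```python
def higher_order_features(h, m, s, gc, lab_m, lab_s, lab_gc, direction, valency):
    """Emit string-valued features for one higher-order part.

    s / gc (and their labels) may be None when the corresponding role is absent.
    """
    feats = [
        f"HM:{h}|{m}|{lab_m}|{direction}",             # first-order arc
        f"VAL:{h}|{direction}|{valency}",              # valency feature
    ]
    if s is not None:
        feats.append(f"HMS:{h}|{m}|{s}|{lab_m}|{lab_s}|{direction}")    # sibling
        feats.append(f"LAB:{lab_m}|{lab_s}|{direction}")                # label tuple
    if gc is not None:
        feats.append(f"HMG:{h}|{m}|{gc}|{lab_m}|{lab_gc}|{direction}")  # grandchild
    if s is not None and gc is not None:
        feats.append(f"HMSG:{h}|{m}|{s}|{gc}|{direction}")  # sibling+grandchild conjoined
    return feats
```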
71 We compare our method to a state-of-the-art graph-based parser (Koo and Collins, 2010) as well as a state-of-the-art transition-based parser that uses a beam (Zhang and Nivre, 2011) and the dynamic programming transition-based parser of Huang and Sagae (2010). [sent-232, score-0.469]
72 Additionally, we compare to our own implementation of exact first- to third-order graph-based parsing and the transition-based system of Zhang and Nivre (2011) with varying beam sizes. [sent-233, score-0.431]
73 First, approximate decoding with rich features and cube pruning gives state-of-the-art labeled and unlabeled parsing accuracies relative to previously reported results. [sent-235, score-1.024]
74 We also report results for a re-implementation of exact first- to third-order graph-based parsing and a reimplementation of Zhang and Nivre (2011) in order to compare parser speed. [sent-247, score-0.43]
75 Finally, compared to an implementation of an exact third-order parser, which provides us with an apples-to-apples comparison in terms of features and runtime, approximate decoding with cube pruning is both more accurate and 4-5 times as fast. [sent-251, score-0.886]
76 We should point out that our third-order reimplementation is a purely unlabeled parser, as we do not have an implementation of an exact labeled third-order parser. [sent-253, score-0.298]
77 Currently our method is restricted to predicting strictly projective trees as it uses the Eisner chart parsing algorithm as its backbone. [sent-263, score-0.506]
78 Here we compare to our re-implementations of Zhang and Nivre (2011), exact first- to third-order parsing, and Rush and Petrov (2012) for the data sets in which they reported results. [sent-268, score-0.308]
79 We again see that approximate decoding with rich features and cube pruning has higher accuracy than transition-based parsing with a large beam. [sent-269, score-0.942]
80 Putting in the sibling and grandchild conjoined features and the valency features yields a further improvement over the approximation of Koo and Collins (2010). [sent-285, score-0.588]
81 We vary the beam size at each cell and switch the option for signature-based recombination to make search better or worse to see how much impact it has on the final accuracy. [sent-300, score-0.395]
82 Going from a beam of 2 to 5 increases accuracy notably, but going to a larger beam size has little effect while incurring a cost in terms of efficiency. [sent-302, score-0.292]
83 This suggests that most of the parser ambiguity is represented in the top-5 feature signatures at each chart cell. [sent-303, score-0.345]
84 Furthermore, recombination does help slightly, but more so at smaller beam sizes. [sent-304, score-0.295]
85 Indeed, larger feature scopes do lead to more search errors, but the absolute number of search errors is usually quite small: there are only 19 search errors using second-order features and 32 search errors using third-order plus valency features out of 2416 English test sentences. [sent-307, score-0.543]
86 6 Related Work. As mentioned in the introduction, there have been numerous studies on trying to reconcile the rich-features versus exact-decoding trade-off in dependency parsing. [sent-323, score-0.379]
87 In the transition-based parsing literature this has included the use of beam search to increase the search space (Duan et al. [sent-324, score-0.451]
88 Huang and Sagae (2010) took a more principled approach proposing a method combining shift-reduce parsing with dynamic programming. [sent-326, score-0.289]
89 In our method, the same parsing algorithm can be utilized (Eisner’s + cube pruning), just with slightly different feature signatures. [sent-347, score-0.538]
90 Huang introduced the idea of “forest rescoring”, which uses cube pruning to enable the incorporation of non-local features into a constituency parsing model, providing state-of-the-art performance. [sent-349, score-0.752]
91 An important difference between our formulation and forest rescoring is that we only have one decoding pass and a single trained model, while forest rescoring, as formulated by Huang (2008), separates a generative base model from a following discriminative re-ranking model. [sent-351, score-0.281]
92 This also distinguishes it from previous work on dependency parse re-ranking (Hall, 2007) as we are not re-ranking/re-scoring the output of a base model but using a single decoding algorithm and learned model at training and testing. [sent-353, score-0.285]
93 This work is largely orthogonal to other attempts to speed up chart parsing algorithms. [sent-354, score-0.469]
94 This includes work on coarse-to-fine parsing (Charniak and Johnson, 2005; Petrov and Klein, 2007; Rush and Petrov, 2012), chart-cell closing and pruning (Roark and Hollingshead, 2008; Roark and Hollingshead, 2009), and dynamic beam-width prediction (Bodenstab et al. [sent-355, score-0.457]
95 Of particular note, Rush and Petrov (2012) report run-times far better than our cube pruning system. [sent-357, score-0.492]
96 In our study we use cube pruning only for decoding and rely on inference-based learning algorithms to train model parameters. [sent-361, score-0.615]
97 Gimpel and Smith (2009) extended cube pruning concepts to partition-function and marginal calculations, which would enable the training of probabilistic graphical models. [sent-362, score-0.492]
98 The method works by augmenting the dynamic programming signatures of the Eisner chart-parsing algorithm and then controlling complexity via cube pruning. [sent-369, score-0.664]
99 Empirical results show that the system gives state-of-the-art accuracies across numerous data sets while still maintaining practical parsing speeds as much as 4-5 times faster than exact third-order decoding. [sent-371, score-0.342]
100 Fast and robust multilingual dependency parsing with a generative latent variable model. [sent-657, score-0.376]
wordName wordTfidf (topN-words)
[('cube', 0.324), ('chart', 0.22), ('parsing', 0.214), ('modifier', 0.211), ('eisner', 0.186), ('outermost', 0.18), ('recombination', 0.172), ('pruning', 0.168), ('dependency', 0.162), ('arcs', 0.158), ('grandchild', 0.155), ('koo', 0.155), ('rush', 0.153), ('sibling', 0.139), ('nivre', 0.136), ('decoding', 0.123), ('beam', 0.123), ('valency', 0.121), ('signature', 0.108), ('martins', 0.107), ('modifiers', 0.106), ('histories', 0.103), ('items', 0.103), ('uas', 0.101), ('huang', 0.095), ('exact', 0.094), ('complexity', 0.09), ('petrov', 0.088), ('zhang', 0.083), ('sagae', 0.081), ('conjoined', 0.081), ('programming', 0.079), ('asymptotic', 0.078), ('dynamic', 0.075), ('collins', 0.075), ('projective', 0.072), ('kuhlmann', 0.069), ('approximate', 0.067), ('mcdonald', 0.067), ('states', 0.064), ('parser', 0.064), ('chiang', 0.063), ('gc', 0.062), ('guez', 0.062), ('incomplete', 0.061), ('scoring', 0.061), ('signatures', 0.061), ('gym', 0.06), ('reimplementation', 0.058), ('search', 0.057), ('complete', 0.055), ('factorization', 0.054), ('xk', 0.054), ('forest', 0.053), ('ayr', 0.052), ('rescoring', 0.052), ('parsers', 0.051), ('dependencies', 0.051), ('head', 0.051), ('xj', 0.05), ('cells', 0.049), ('generalized', 0.048), ('titov', 0.047), ('higherorder', 0.047), ('features', 0.046), ('cost', 0.046), ('item', 0.045), ('scope', 0.044), ('carreras', 0.043), ('cell', 0.043), ('roark', 0.043), ('labeled', 0.042), ('las', 0.041), ('arrows', 0.041), ('vertical', 0.041), ('valencies', 0.04), ('unlabeled', 0.04), ('adjacent', 0.039), ('score', 0.039), ('label', 0.039), ('tuple', 0.039), ('johansson', 0.038), ('history', 0.037), ('duan', 0.036), ('speed', 0.035), ('augmenting', 0.035), ('dashed', 0.035), ('mira', 0.035), ('bodenstab', 0.034), ('grandchildren', 0.034), ('popped', 0.034), ('yf', 0.034), ('specialized', 0.034), ('errors', 0.034), ('faster', 0.034), ('smith', 0.034), ('horizontal', 0.033), ('accounts', 0.033), ('conll', 0.033), ('structural', 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-the-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-the-art results for all languages.
3 0.15768485 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
Author: Richard Zens ; Daisy Stanton ; Peng Xu
Abstract: When trained on very large parallel corpora, the phrase table component of a machine translation system grows to consume vast computational resources. In this paper, we introduce a novel pruning criterion that places phrase table pruning on a sound theoretical foundation. Systematic experiments on four language pairs under various data conditions show that our principled approach is superior to existing ad hoc pruning methods.
4 0.1562833 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
Author: Kenneth Heafield ; Philipp Koehn ; Alon Lavie
Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM.
5 0.14372796 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
6 0.13909617 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
7 0.13065703 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
8 0.12853558 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
9 0.12706383 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
10 0.1204559 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
11 0.11814558 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
12 0.10634258 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
13 0.10407542 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
14 0.10340659 55 emnlp-2012-Forest Reranking through Subtree Ranking
15 0.10047762 126 emnlp-2012-Training Factored PCFGs with Expectation Propagation
16 0.099595867 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
17 0.097186096 70 emnlp-2012-Joint Chinese Word Segmentation, POS Tagging and Parsing
18 0.094454095 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
19 0.089655019 81 emnlp-2012-Learning to Map into a Universal POS Tagset
20 0.088789053 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
topicId topicWeight
[(0, 0.286), (1, -0.261), (2, 0.152), (3, -0.107), (4, 0.054), (5, -0.053), (6, -0.084), (7, 0.081), (8, 0.006), (9, 0.257), (10, 0.098), (11, -0.063), (12, 0.189), (13, -0.103), (14, 0.192), (15, -0.077), (16, -0.037), (17, -0.019), (18, -0.062), (19, -0.014), (20, -0.036), (21, -0.031), (22, -0.015), (23, -0.044), (24, -0.057), (25, -0.017), (26, 0.067), (27, -0.057), (28, 0.037), (29, 0.025), (30, -0.019), (31, 0.007), (32, 0.061), (33, -0.021), (34, 0.066), (35, 0.094), (36, -0.044), (37, 0.006), (38, 0.046), (39, -0.066), (40, -0.047), (41, 0.037), (42, 0.072), (43, 0.057), (44, -0.049), (45, 0.069), (46, -0.073), (47, 0.032), (48, 0.063), (49, -0.084)]
simIndex simValue paperId paperTitle
same-paper 1 0.964715 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-the-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-the-art results for all languages.
3 0.60187685 74 emnlp-2012-Language Model Rest Costs and Space-Efficient Storage
Author: Kenneth Heafield ; Philipp Koehn ; Alon Lavie
Abstract: Approximate search algorithms, such as cube pruning in syntactic machine translation, rely on the language model to estimate probabilities of sentence fragments. We contribute two changes that trade between accuracy of these estimates and memory, holding sentence-level scores constant. Common practice uses lower-order entries in an N-gram model to score the first few words of a fragment; this violates assumptions made by common smoothing strategies, including Kneser-Ney. Instead, we use a unigram model to score the first word, a bigram for the second, etc. This improves search at the expense of memory. Conversely, we show how to save memory by collapsing probability and backoff into a single value without changing sentence-level scores, at the expense of less accurate estimates for sentence fragments. These changes can be stacked, achieving better estimates with unchanged memory usage. In order to interpret changes in search accuracy, we adjust the pop limit so that accuracy is unchanged and report the change in CPU time. In a German-English Moses system with target-side syntax, improved estimates yielded a 63% reduction in CPU time; for a Hiero-style version, the reduction is 21%. The compressed language model uses 26% less RAM while equivalent search quality takes 27% more CPU. Source code is released as part of KenLM.
4 0.59784657 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
Author: Daniel Fernandez-Gonzalez ; Carlos Gomez-Rodriguez
Abstract: In this paper, we show that significant improvements in the accuracy of well-known transition-based parsers can be obtained, without sacrificing efficiency, by enriching the parsers with simple transitions that act on buffer nodes. First, we show how adding a specific transition to create either a left or right arc of length one between the first two buffer nodes produces improvements in the accuracy of Nivre’s arc-eager projective parser on a number of datasets from the CoNLL-X shared task. Then, we show that accuracy can also be improved by adding transitions involving the topmost stack node and the second buffer node (allowing a limited form of non-projectivity). None of these transitions has a negative impact on the computational complexity of the algorithm. Although the experiments in this paper use the arc-eager parser, the approach is generic enough to be applicable to any stackbased dependency parser.
5 0.56402695 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
Author: Alexander Rush ; Roi Reichart ; Michael Collins ; Amir Globerson
Abstract: State-of-the-art statistical parsers and POS taggers perform very well when trained with large amounts of in-domain data. When training data is out-of-domain or limited, accuracy degrades. In this paper, we aim to compensate for the lack of available training data by exploiting similarities between test set sentences. We show how to augment sentencelevel models for parsing and POS tagging with inter-sentence consistency constraints. To deal with the resulting global objective, we present an efficient and exact dual decomposition decoding algorithm. In experiments, we add consistency constraints to the MST parser and the Stanford part-of-speech tagger and demonstrate significant error reduction in the domain adaptation and the lightly supervised settings across five languages.
6 0.55610728 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
7 0.53352833 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
8 0.47356853 55 emnlp-2012-Forest Reranking through Subtree Ranking
9 0.45447114 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
10 0.44242454 99 emnlp-2012-On Amortizing Inference Cost for Structured Prediction
11 0.43621582 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
12 0.43415201 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
13 0.42109114 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
14 0.41922668 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
15 0.40772071 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
16 0.40735278 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
17 0.38134128 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
18 0.38080218 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
19 0.37468186 81 emnlp-2012-Learning to Map into a Universal POS Tagset
20 0.362241 11 emnlp-2012-A Systematic Comparison of Phrase Table Pruning Techniques
topicId topicWeight
[(2, 0.013), (11, 0.013), (14, 0.013), (16, 0.498), (25, 0.025), (34, 0.078), (47, 0.011), (60, 0.052), (63, 0.031), (64, 0.015), (65, 0.012), (70, 0.012), (74, 0.046), (76, 0.057), (80, 0.017), (86, 0.014), (95, 0.018)]
simIndex simValue paperId paperTitle
1 0.9877314 56 emnlp-2012-Framework of Automatic Text Summarization Using Reinforcement Learning
Author: Seonggi Ryang ; Takeshi Abekawa
Abstract: We present a new approach to the problem of automatic text summarization called Automatic Summarization using Reinforcement Learning (ASRL) in this paper, which models the process of constructing a summary within the framework of reinforcement learning and attempts to optimize the given score function with the given feature representation of a summary. We demonstrate that the method of reinforcement learning can be adapted to automatic summarization problems naturally and simply, and other summarizing techniques, such as sentence compression, can be easily adapted as actions of the framework. The experimental results indicated ASRL was superior to the best performing method in DUC2004 and comparable to the state of the art ILP-style method, in terms of ROUGE scores. The results also revealed ASRL can search for sub-optimal solutions efficiently under conditions for effectively selecting features and the score function.
same-paper 2 0.95289218 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higher-order dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-the-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation of the third-order model of Koo et al. (2010).
3 0.88307703 113 emnlp-2012-Resolving This-issue Anaphora
Author: Varada Kolhatkar ; Graeme Hirst
Abstract: We annotate and resolve a particular case of abstract anaphora, namely, thisissue anaphora. We propose a candidate ranking model for this-issue anaphora resolution that explores different issuespecific and general abstract-anaphora features. The model is not restricted to nominal or verbal antecedents; rather, it is able to identify antecedents that are arbitrary spans of text. Our results show that (a) the model outperforms the strong adjacent-sentence baseline; (b) general abstract-anaphora features, as distinguished from issue-specific features, play a crucial role in this-issue anaphora resolution, suggesting that our approach can be generalized for other NPs such as this problem and this debate; and (c) it is possible to reduce the search space in order to improve performance.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transition-based system for joint part-of-speech tagging and labeled dependency parsing with non-projective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-the-art results for all languages.
5 0.51778436 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
Author: Daniel Fernandez-Gonzalez ; Carlos Gomez-Rodriguez
Abstract: In this paper, we show that significant improvements in the accuracy of well-known transition-based parsers can be obtained, without sacrificing efficiency, by enriching the parsers with simple transitions that act on buffer nodes. First, we show how adding a specific transition to create either a left or right arc of length one between the first two buffer nodes produces improvements in the accuracy of Nivre’s arc-eager projective parser on a number of datasets from the CoNLL-X shared task. Then, we show that accuracy can also be improved by adding transitions involving the topmost stack node and the second buffer node (allowing a limited form of non-projectivity). None of these transitions has a negative impact on the computational complexity of the algorithm. Although the experiments in this paper use the arc-eager parser, the approach is generic enough to be applicable to any stackbased dependency parser.
6 0.51062894 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
7 0.48874971 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
8 0.48662385 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
9 0.48157784 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
10 0.47531065 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
11 0.47291791 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
12 0.46361306 45 emnlp-2012-Exploiting Chunk-level Features to Improve Phrase Chunking
13 0.45917505 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
14 0.45798549 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
15 0.45797119 55 emnlp-2012-Forest Reranking through Subtree Ranking
16 0.43869609 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
17 0.42822522 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
18 0.42567086 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
19 0.42442355 2 emnlp-2012-A Beam-Search Decoder for Grammatical Error Correction
20 0.4208467 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields