emnlp emnlp2011 emnlp2011-137 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard
Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.
Reference: text
sentIndex sentText sentNum sentScore
1 Training dependency parsers by jointly optimizing multiple objectives Keith Hall Ryan McDonald Jason Katz-Brown Michael Ringgaard Google Research {kbhall|ryanmcd|jasonkb|ringgaard}@google.com [sent-1, score-0.387]
2 Abstract We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. [sent-2, score-0.327]
3 The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. [sent-3, score-0.658]
4 Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation. [sent-4, score-0.586]
5 1 Introduction The accuracy and speed of state-of-the-art dependency parsers have motivated renewed interest in using the output of parsing as input to many downstream natural language processing tasks. [sent-5, score-0.585]
6 Furthermore, when evaluated on a downstream task, the optimal parse output often has a lower model score than the best parse predicted by the parsing model. [sent-16, score-0.669]
7 While this means that we are not properly modeling the downstream task in the parsers, it also means that there is some information from small task- or domain-specific data sets which could help direct our search for optimal parameters during parser training. [sent-17, score-0.417]
8 The goal is not necessarily to obtain better parse performance, but to exploit the structure induced from human-labeled treebank data while targeting specific extrinsic metrics of quality, which can include task-specific metrics or external weak constraints on the parse structure. [sent-18, score-0.784]
9 The reranker can then be trained to optimize for the downstream or extrinsic objective. [sent-22, score-0.621]
10 In this paper, we propose a training algorithm for statistical dependency parsers (Kübler et al. [sent-24, score-0.327]
11 , 2009) in which a single model is jointly optimized for a regular supervised training objective over the treebank data as well as a task-specific objective or more generally an extrinsic objective on an additional data set. [sent-25, score-0.598]
12 [Footer: …Methods in Natural Language Processing, pages 1489–149…, © 2011 Association for Computational Linguistics] …parameters are optimized based on the objective function associated with the instance (either intrinsic or extrinsic), thus jointly optimizing multiple objectives. [sent-29, score-0.35]
13 We call our algorithm augmented-loss training as it optimizes multiple losses to augment the traditional supervised parser loss. [sent-31, score-0.386]
14 In Sections 3 and 4 we present a set of experiments defining different augmented losses covering a task-specific extrinsic loss (MT reordering), a domain adaptation loss, and an alternate intrinsic parser loss. [sent-43, score-1.109]
15 The structured perceptron (Algorithm 1) is an online learning algorithm which takes as input: 1) a set of training examples d_i = (x_i, y_i) consisting of an input sentence … [Algorithm 1 box: Structured Perceptron; Input data set: D = {d_1 = (x_1, y_1), …}] [sent-47, score-0.311]
16 For dependency parser training, this set-up consists of input sentences x and the corresponding gold dependency tree y ∈ Y_x, where Y_x is the space of possible parse trees for sentence x. [sent-55, score-0.533]
17 In the perceptron setting, F_θ(x) = argmax_{y ∈ Y_x} θ · Φ(y), where Φ is a mapping from a parse tree y for sentence x to a high-dimensional feature space. [sent-56, score-0.33]
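To make Algorithm 1 concrete, here is a minimal Python sketch of the structured perceptron. It assumes a tiny hand-enumerated candidate list standing in for Y_x (a real parser searches Y_x with a parsing algorithm), trees encoded as tuples of head indices with 0 for ROOT, and a hypothetical feature map `tree_feats` that fires one head-word/dependent-word feature per arc; none of these names come from the paper.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Sentence = Tuple[str, ...]
Tree = Tuple[int, ...]          # heads[i-1] is the head of token i; 0 means ROOT
Weights = Dict[str, float]


def tree_feats(x: Sentence, y: Tree) -> Dict[str, float]:
    """Hypothetical Phi(y): one head-word->dependent-word feature per arc."""
    feats: Dict[str, float] = {}
    for dep, head in enumerate(y, start=1):
        head_word = "ROOT" if head == 0 else x[head - 1]
        key = f"{head_word}->{x[dep - 1]}"
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(theta: Weights, phi: Dict[str, float]) -> float:
    return sum(theta.get(f, 0.0) * v for f, v in phi.items())


def predict(theta: Weights, x: Sentence, candidates: Sequence[Tree]) -> Tree:
    # F_theta(x) = argmax_{y in Y_x} theta . Phi(y); ties go to the earlier candidate
    return max(candidates, key=lambda y: score(theta, tree_feats(x, y)))


def perceptron(data: List[Tuple[Sentence, Tree]],
               candidates_of: Callable[[Sentence], Sequence[Tree]],
               epochs: int = 5) -> Weights:
    theta: Weights = {}
    for _ in range(epochs):
        for x, y_gold in data:
            y_hat = predict(theta, x, candidates_of(x))
            if y_hat != y_gold:
                # theta <- theta + Phi(y_gold) - Phi(y_hat)
                for f, v in tree_feats(x, y_gold).items():
                    theta[f] = theta.get(f, 0.0) + v
                for f, v in tree_feats(x, y_hat).items():
                    theta[f] = theta.get(f, 0.0) - v
    return theta


x = ("John", "saw", "Mary")
gold = (2, 0, 2)
cands = [(0, 1, 2), (3, 3, 0), (2, 0, 2), (2, 0, 1)]
theta = perceptron([(x, gold)], lambda _: cands)
print(predict(theta, x, cands))  # (2, 0, 2)
```

After one mistake-driven update in the first epoch the gold tree wins the argmax, so later epochs make no further updates.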
18 We will show in the next section that our augmented-loss method is general and can be applied to any dependency parsing framework that can be trained by the perceptron algorithm, such as transition-based parsers (Nivre, 2008; Zhang and Clark, 2008) and graph-based parsers (McDonald et al. [sent-60, score-0.661]
19 1 Augmented-Loss Training The augmented-loss training algorithm that we propose is based on the structured perceptron; however, the augmented-loss training framework is a general mechanism to incorporate multiple loss functions in online learner training. [sent-63, score-0.729]
20 The algorithm is an extension to Algorithm 1 where there are 1) multiple loss functions being evaluated L1, . [sent-65, score-0.555]
21 , LM; 2) there are multiple datasets associated with each of these loss functions D1, . [sent-68, score-0.509]
22 , DM; and 3) there is a schedule for processing examples from each of these datasets, where Sched(j, i) is true if the jth loss function should be updated on the ith iteration of training. [sent-71, score-0.545]
23 It can either be a task-specific output of interest, a partial tree, or even null, in the case where learning will be guided strictly by the loss Lj. [sent-73, score-0.502]
24 The training algorithm is effectively the same as the perceptron; the primary difference is that if L_j is an extrinsic loss, we cannot compute the standard updates, since we do not necessarily know the correct parse (the line indicated by †). [sent-74, score-0.578]
25 In the experiments in this paper, we only consider the case where there are two loss functions: a supervised dependency parsing labeled-attachment loss, and an additional loss, examples of which are presented in Section 3. [sent-77, score-0.65]
26 2 Inline Ranker Training In order to make Algorithm 2 more concrete, we need a way of defining the loss and resulting parameter updates for the case when Lj is not a standard supervised parsing loss († from Algorithm 2). [sent-79, score-1.089]
27 For example, consider a machine translation reordering system which uses the parse ŷ to reorder the words of x_i, the optimal reordering being y_i. [sent-82, score-0.558]
28 Then C(x_i, ŷ, y_i) is a reordering cost which is large if the predicted parse induces a poor reordering of x_i. [sent-83, score-0.54]
29 We propose a general-purpose loss function which is based on parser k-best lists. [sent-84, score-0.66]
30 The inline reranker uses the currently trained parser model θ to parse … [Algorithm 2 box: Augmented-Loss Perceptron; Input data sets: D_1 = {d^1_1 = (x^1_1, y^1_1), …}] [sent-85, score-0.606]
31 [Algorithm 2 box, continued: … c_j + 1] {Compute structured loss for instance} if L_j is an intrinsic loss then ŷ = F_θ(x^j_{c_j}); if L_j(ŷ, y^j_{c_j}) > 0 then θ = θ + Φ(y^j_{c_j}) − Φ(ŷ) {y^j_{c_j} is a tree} end if; else if L_j is an extrinsic loss then {See Section 2.2} …] [sent-103, score-1.331]
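A minimal Python sketch of the Algorithm 2 loop follows, assuming the same toy conventions as a perceptron over an explicit candidate list (trees as head-index tuples, a hypothetical one-feature-per-arc Φ). The extrinsic branch (the † line) is delegated to a caller-supplied `extrinsic_update` callback, for which the inline ranker would be one choice; the names `augmented_loss_perceptron`, `sched`, `extrinsic_update`, and the unlabeled `hamming` stand-in for the labeled-attachment loss are illustrative, not the authors' code.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Sentence = Tuple[str, ...]
Tree = Tuple[int, ...]          # heads[i-1] is the head of token i; 0 means ROOT
Weights = Dict[str, float]


def tree_feats(x: Sentence, y: Tree) -> Dict[str, float]:
    """Hypothetical Phi(y): one head-word->dependent-word feature per arc."""
    feats: Dict[str, float] = {}
    for dep, head in enumerate(y, start=1):
        key = f"{'ROOT' if head == 0 else x[head - 1]}->{x[dep - 1]}"
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(theta: Weights, x: Sentence, y: Tree) -> float:
    return sum(theta.get(f, 0.0) * v for f, v in tree_feats(x, y).items())


def add(theta: Weights, x: Sentence, y: Tree, scale: float) -> None:
    for f, v in tree_feats(x, y).items():
        theta[f] = theta.get(f, 0.0) + scale * v


def augmented_loss_perceptron(
    datasets: List[List[Tuple[Sentence, object]]],
    losses: List[Optional[Callable[[Tree, Tree], float]]],
    intrinsic: List[bool],
    sched: Callable[[int, int], bool],
    candidates_of: Callable[[Sentence], Sequence[Tree]],
    extrinsic_update: Callable[[Weights, Sentence, object], None],
    iterations: int,
) -> Weights:
    theta: Weights = {}
    cursors = [0] * len(datasets)          # c_j: position of the next example in D_j
    for i in range(iterations):
        for j, data in enumerate(datasets):
            if not sched(j, i):            # Sched(j, i): update loss j at iteration i?
                continue
            x, y = data[cursors[j] % len(data)]
            cursors[j] += 1
            if intrinsic[j]:
                y_hat = max(candidates_of(x), key=lambda t: score(theta, x, t))
                if losses[j](y_hat, y) > 0:  # y is a full gold tree here
                    add(theta, x, y, +1.0)
                    add(theta, x, y_hat, -1.0)
            else:
                # the dagger line: y may be a task output, a partial tree, or None
                extrinsic_update(theta, x, y)
    return theta


def hamming(y_hat: Tree, y: Tree) -> float:
    # unlabeled stand-in for the labeled-attachment loss
    return sum(a != b for a, b in zip(y_hat, y))


x = ("John", "saw", "Mary")
cands = [(0, 1, 2), (3, 3, 0), (2, 0, 2), (2, 0, 1)]
calls = []
theta = augmented_loss_perceptron(
    datasets=[[(x, (2, 0, 2))], [(x, None)]],
    losses=[hamming, None],
    intrinsic=[True, False],
    sched=lambda j, i: j == 0 or i % 2 == 0,   # extrinsic data every other iteration
    candidates_of=lambda _: cands,
    extrinsic_update=lambda th, xx, yy: calls.append(xx),
    iterations=4,
)
print(len(calls))  # 2
```

Keeping the extrinsic branch as a callback mirrors the text's point that the loop itself is just the perceptron; only the † line changes with the loss.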
32 We can compute the cost function C(x_i, ŷ, y_i) for all ŷ in the k-best list. If the 1-best parse, ŷ_1, has the lowest cost, then there is no lower-cost parse in this k-best list. [sent-108, score-0.316]
33 Otherwise, the lowest-cost parse in the k-best list is taken to be the correct output structure y_i, and the 1-best parse is taken to be an incorrect prediction. [sent-109, score-0.379]
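The inline-ranker step just described (Algorithm 3 in the paper's numbering) can be sketched as follows, under the same toy assumptions (explicit candidate list, hypothetical per-arc features). `cost` stands for C(x_i, ŷ, y_i); the default k = 8 mirrors the 8-parse lists mentioned in the experiments, and `toy_cost`, which compares heads against a reference tree, is purely illustrative (a real extrinsic reference would be, e.g., a gold reordering).

```python
from typing import Dict, List, Sequence, Tuple

Sentence = Tuple[str, ...]
Tree = Tuple[int, ...]          # heads[i-1] is the head of token i; 0 means ROOT
Weights = Dict[str, float]


def tree_feats(x: Sentence, y: Tree) -> Dict[str, float]:
    """Hypothetical Phi(y): one head-word->dependent-word feature per arc."""
    feats: Dict[str, float] = {}
    for dep, head in enumerate(y, start=1):
        key = f"{'ROOT' if head == 0 else x[head - 1]}->{x[dep - 1]}"
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def score(theta: Weights, x: Sentence, y: Tree) -> float:
    return sum(theta.get(f, 0.0) * v for f, v in tree_feats(x, y).items())


def k_best(theta: Weights, x: Sentence, candidates: Sequence[Tree], k: int) -> List[Tree]:
    # sorted() is stable, so equally scored candidates keep their input order
    return sorted(candidates, key=lambda y: -score(theta, x, y))[:k]


def inline_ranker_update(theta, x, y_ref, candidates, cost, k=8) -> bool:
    """One inline-ranker step; returns True if theta was updated."""
    kbest = k_best(theta, x, candidates, k)
    costs = [cost(x, y_hat, y_ref) for y_hat in kbest]
    best = min(range(len(kbest)), key=lambda i: costs[i])
    if costs[best] >= costs[0]:
        return False                        # 1-best parse already has the lowest cost
    for f, v in tree_feats(x, kbest[best]).items():
        theta[f] = theta.get(f, 0.0) + v    # lowest-cost parse treated as correct
    for f, v in tree_feats(x, kbest[0]).items():
        theta[f] = theta.get(f, 0.0) - v    # 1-best parse treated as incorrect
    return True


def toy_cost(x: Sentence, y_hat: Tree, y_ref: Tree) -> int:
    # hypothetical cost: number of tokens whose head differs from the reference
    return sum(a != b for a, b in zip(y_hat, y_ref))


x, y_ref = ("John", "saw", "Mary"), (2, 0, 2)
cands = [(0, 1, 2), (2, 0, 2), (3, 3, 0)]
theta: Weights = {}
print(inline_ranker_update(theta, x, y_ref, cands, toy_cost, k=2))  # True
print(inline_ranker_update(theta, x, y_ref, cands, toy_cost, k=2))  # False
```

The second call makes no update because, after the first step, the lowest-cost parse is already ranked first.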
34 The intuition behind this method is that in the presence of only a cost function and a k-best list, the parameters will be updated towards the parse structure that has the lowest cost, which over time will move the parameters of the model to a place with low extrinsic loss. [sent-116, score-0.443]
35 We exploit this formulation of the general-purpose augmented-loss function as it allows one to include any extrinsic cost function which is dependent on parses. [sent-117, score-0.357]
36 There are two significant differences between the inline reranker loss function and standard reranker training. [sent-123, score-0.945]
37 Second, the feedback function for selecting a parse is based on an external objective function. [sent-125, score-0.335]
38 Let m be the number of mistakes made when training the perceptron (Algorithm 2) with inline ranker loss (Algorithm 3) on D, where a mistake occurs for (x, y) ∈ D with parameter vector θ when there exists ŷ_j in the k-best list of x such that ŷ_j ≠ ŷ_1 and L(ŷ_j, y) < L(ŷ_1, y). [sent-134, score-0.939]
39 Assumption 2 states that for any θ that exists during training, but before convergence, there is at least one example in the training data where k is large enough to include one output with a lower loss when ŷ_1 does not have the optimal minimal loss. [sent-148, score-0.581]
40 If k = ∞, then this is the standard perceptron, as it guarantees that the optimal loss output is in the k-best list. [sent-149, score-0.293]
41 , not that the k-best list will include the minimal loss output, only a single output with a lower loss than the current best guess. [sent-152, score-0.951]
42 Training the perceptron (Algorithm 2) with inline ranker loss (Algorithm 3) on D 1) converges in finite time, and 2) produces parameters θ such that for all (x, y) ∈ D, if ŷ = argmax_ŷ θ · Φ(ŷ) then L(ŷ, y) = 0 (or equivalent minimal loss). [sent-155, score-0.882]
43 Thus, the perceptron algorithm will converge to optimal minimal loss under the assumption that k is large enough so that the model can keep improving. [sent-160, score-0.746]
44 Note that this does not mean k must be large enough to include a zero or minimum loss output, just large enough to include a better output than the current best hypothesis. [sent-161, score-0.502]
45 This analysis does not assume anything about the loss L. [sent-163, score-0.449]
46 It is only required that the loss for a specific input-output pair is fixed throughout training. [sent-165, score-0.449]
47 Thus, the above analysis covers the case where some training instances use an extrinsic loss and others an intrinsic parsing loss. [sent-166, score-0.943]
48 until a k is reached that includes a lower-loss parse. [sent-171, score-0.449]
49 1 Dependency Parsers The augmented-loss framework we present is general in the sense that it can be combined with any loss function and any parser, provided the parser can be parameterized as a linear classifier, can be trained with the perceptron, and is capable of producing a k-best list of trees. [sent-174, score-0.829]
50 • Transition-based: An implementation of the transition-based dependency parsing framework (Nivre, 2008) using an arc-eager transition strategy, trained using the perceptron algorithm as in Zhang and Clark (2008) with a beam size of 8. [sent-176, score-0.448]
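For readers unfamiliar with the arc-eager system, the sketch below shows its four transitions building an unlabeled tree for a three-word sentence. It is a deliberate simplification under stated assumptions: labels, the perceptron scoring of transitions, and the beam search (size 8 in the text) are all omitted, and the names `Config`, `apply`, and `is_terminal` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Config:
    stack: List[int]
    buffer: List[int]
    heads: Dict[int, int] = field(default_factory=dict)   # dependent -> head


def initial_config(n: int) -> Config:
    # token 0 is the artificial ROOT; words are 1..n
    return Config(stack=[0], buffer=list(range(1, n + 1)))


def apply(c: Config, action: str) -> None:
    """Apply one unlabeled arc-eager transition in place."""
    if action == "SHIFT":
        c.stack.append(c.buffer.pop(0))
    elif action == "LEFT-ARC":        # head = front of buffer, dependent = stack top
        s, b = c.stack[-1], c.buffer[0]
        if s == 0 or s in c.heads:
            raise ValueError("LEFT-ARC needs a headless, non-ROOT stack top")
        c.heads[s] = b
        c.stack.pop()
    elif action == "RIGHT-ARC":       # head = stack top, dependent = front of buffer
        s, b = c.stack[-1], c.buffer.pop(0)
        c.heads[b] = s
        c.stack.append(b)
    elif action == "REDUCE":
        if c.stack[-1] not in c.heads:
            raise ValueError("REDUCE needs a stack top that already has a head")
        c.stack.pop()
    else:
        raise ValueError(f"unknown action {action!r}")


def is_terminal(c: Config) -> bool:
    return not c.buffer


c = initial_config(3)                 # "John saw Mary"
for a in ["SHIFT", "LEFT-ARC", "RIGHT-ARC", "RIGHT-ARC"]:
    apply(c, a)
print(c.heads)  # {1: 2, 2: 0, 3: 2}
```

In a beam-search parser each configuration would keep several scored action histories, and the k-best list required by the inline ranker can be read off the final beam.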
51 The graph-based parser features used in the experiments in this paper are defined over a word, w_i, at position i; the head of this word, w_ρ(i), where ρ(i) provides the index of the head word; and the part-of-speech tags of these words, t_i. [sent-183, score-0.312]
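As an illustration of arc-factored scoring over w_i, w_ρ(i), t_i and t_ρ(i), the sketch below builds a few feature strings per arc and sums their weights over the tree. The exact templates and the names `arc_features` and `tree_score` are hypothetical; the text only lists the information the features are defined over.

```python
from typing import Dict, List, Sequence


def arc_features(words: Sequence[str], tags: Sequence[str], dep: int, head: int) -> List[str]:
    """Hypothetical templates over w_i, w_rho(i), t_i and t_rho(i) for one arc."""
    hw = "ROOT" if head == 0 else words[head - 1]
    ht = "ROOT" if head == 0 else tags[head - 1]
    dw, dt = words[dep - 1], tags[dep - 1]
    return [
        f"hw={hw}|dw={dw}",
        f"ht={ht}|dt={dt}",
        f"hw={hw}|dt={dt}",
        f"ht={ht}|dw={dw}",
    ]


def tree_score(theta: Dict[str, float], words: Sequence[str],
               tags: Sequence[str], heads: Sequence[int]) -> float:
    # arc-factored: the tree score is the sum of independent per-arc scores
    return sum(
        theta.get(f, 0.0)
        for dep, head in enumerate(heads, start=1)
        for f in arc_features(words, tags, dep, head)
    )


words, tags = ("John", "saw", "Mary"), ("NNP", "VBD", "NNP")
print(tree_score({"ht=VBD|dt=NNP": 1.0}, words, tags, (2, 0, 2)))  # 2.0
```

Because every feature looks at a single arc, such a model cannot see sibling or grandparent context, which is consistent with the text's remark that arc-factored features keep the graph-based baselines lower.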
52 2 Data and Tasks In the next section, we present a set of scoring functions that can be used in the inline reranker loss framework, resulting in a new augmented-loss for each one. [sent-186, score-0.818]
53 Augmented-loss learning is then applied to target a downstream task using the loss functions to measure gains. [sent-187, score-0.718]
54 We show empirical results for two extrinsic loss functions (optimizing for the downstream task), machine translation and domain adaptation, and for one intrinsic loss function, an arc-length parsing score. [sent-188, score-0.699]
55 For some experiments we also measure the standard intrinsic parser metrics unlabeled attachment score (UAS) and labeled attachment score (LAS) (Buchholz and Marsi, 2006). [sent-189, score-0.609]
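For reference, UAS and LAS reduce to per-token accuracies over (head, label) pairs, as in this short sketch. It assumes every token is scored; evaluation conventions that exclude punctuation are not modeled, and `attachment_scores` is a hypothetical name.

```python
from typing import Sequence, Tuple

Arc = Tuple[int, str]   # (head index, dependency label) for one token


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all tokens; no punctuation filtering."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("gold and pred must be non-empty and aligned")
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n   # head only
    las = sum(g == p for g, p in zip(gold, pred)) / n         # head and label
    return uas, las


gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (2, "iobj")]
print(attachment_scores(gold, pred))  # (1.0, 0.6666666666666666)
```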
56 2, we use a reordering-based loss function to improve word order in a machine translation system. [sent-201, score-0.527]
57 In particular, we use a system of source-side reordering rules which, given a parse of the source sentence, will reorder the sentence into a target-side order (Collins et al. [sent-202, score-0.305]
58 In our experiments we work with a set of English-Japanese reordering rules¹ and gold reorderings based on human-generated correct reorderings of aligned target sentences. [sent-204, score-0.394]
59 We use a reordering score based on the reordering penalty from the METEOR scoring metric. [sent-205, score-0.388]
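The text only says that the reordering score is based on METEOR's reordering penalty. The sketch below assumes a chunk-based form, score = 1 − (C − 1)/(M − 1), where C counts maximal runs of words that appear contiguously and in order in the reference and M is the number of words; this formula and the function names are assumptions, not the paper's definition. It also includes the coarse exact-match check, and a cost for the inline ranker could then be taken as 1 − score.

```python
from typing import Sequence


def chunks(pred: Sequence[int], ref: Sequence[int]) -> int:
    """Number of maximal runs in `pred` that appear contiguously, in order, in `ref`."""
    pos = {tok: i for i, tok in enumerate(ref)}
    return 1 + sum(pos[b] != pos[a] + 1 for a, b in zip(pred, pred[1:]))


def reordering_score(pred: Sequence[int], ref: Sequence[int]) -> float:
    # assumed METEOR-style fragmentation score: 1.0 means perfect order
    if sorted(pred) != sorted(ref):
        raise ValueError("pred must be a permutation of ref")
    m = len(ref)
    if m <= 1:
        return 1.0
    return 1.0 - (chunks(pred, ref) - 1) / (m - 1)


def exact_match(pred: Sequence[int], ref: Sequence[int]) -> bool:
    return list(pred) == list(ref)


ref = [0, 2, 1, 3]             # source word indices in gold target order
print(reordering_score([0, 2, 1, 3], ref))  # 1.0
print(reordering_score([0, 1, 2, 3], ref))  # 0.0
print(reordering_score([2, 1, 3, 0], ref))  # two chunks: 1 - 1/3
```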
60 The extrinsic reordering training data consists of 10930 examples of English sentences and their correct Japanese word-order. [sent-208, score-0.475]
61 …0.5, where for every two treebank updates we make one augmented-loss update; 1 is a 1-to-1 mix; and 2 is where we make twice as many augmented-loss updates as treebank updates. [sent-216, score-0.34]
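One way to realize Sched(j, i) for these mix ratios is sketched below, assuming dataset 0 is the treebank, dataset 1 the augmented-loss data, and exactly one update per iteration. The deterministic round-robin interleaving and the name `make_schedule` are assumptions, since the text does not say how updates are ordered.

```python
from fractions import Fraction
from typing import Callable


def make_schedule(ratio: float) -> Callable[[int, int], bool]:
    """Sched(j, i) giving `ratio` augmented-loss updates per treebank update."""
    r = Fraction(ratio).limit_denominator(100)
    if r <= 0:
        raise ValueError("ratio must be positive")
    aug, tree = r.numerator, r.denominator
    cycle = tree + aug

    def sched(j: int, i: int) -> bool:
        pos = i % cycle
        # first `tree` slots of each cycle go to the treebank (j == 0),
        # the remaining `aug` slots to the augmented loss (j == 1)
        return pos < tree if j == 0 else pos >= tree

    return sched


for ratio in (0.5, 1, 2):
    s = make_schedule(ratio)
    order = ["T" if s(0, i) else "A" for i in range(6)]
    print(ratio, "".join(order))
# 0.5 TTATTA
# 1 TATATA
# 2 TAATAA
```

A randomized schedule with the same expected ratio would be an equally plausible reading; the round-robin version simply makes the mix easy to inspect.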
62 Table 1 shows the results of using the reordering cost as an augmented-loss to the standard treebank objective function. [sent-219, score-0.378]
63 Results are presented as measured by the reordering score as well as a coarse exact-match score (the number of sentences which would have correct word order given the parse and the fixed reordering rules). [sent-220, score-0.585]
64 We see continued improvements as we adjust the schedule to process the extrinsic loss more frequently, the best result being when we make two augmented-loss updates for every one treebank-based loss update. [sent-221, score-1.279]
65 (2010) observed that dependency parsers tend to do quite poorly when parsing questions due to their limited exposure to them in the news corpora from the Penn Treebank. [sent-226, score-0.323]
66 The first is a parser trained on the standard training sections of the Penn Treebank (PTB) and the second is a parser trained on the training portion of the QuestionBank (QTB). [sent-228, score-0.422]
67 Results (for both transition- and graph-based parsers) for the labeled accuracy score (LAS), unlabeled accuracy score (UAS) and Root-F1 for parsers trained on the PTB and QTB and tested on the QTB. [sent-229, score-0.315]
68 The augmented-loss parsers are trained on the PTB but with a partial tree loss on QTB that considers only root dependencies. [sent-230, score-0.617]
69 To test whether such weak information can significantly improve the parsing of questions, we trained an augmented-loss parser using the training set of the QTB stripped of all dependencies except the dependency from the root to the main verb of the sentence. [sent-242, score-0.583]
70 In other words, for each sentence, the parser may only observe a single dependency at training time from the QTB: the dependency to the main verb. [sent-243, score-0.447]
71 Our augmented-loss function in this case is a simple binary function: 0 if a parse has the correct root dependency and 1 if it does not. [sent-244, score-0.37]
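Written out, this binary root loss is a one-liner. The sketch assumes single-rooted trees encoded as head-index tuples (0 = ROOT) and a partial annotation that supplies only the 1-based index of the main verb; the name `root_loss`, the example question, and its parses are made up for illustration.

```python
from typing import Sequence


def root_loss(pred_heads: Sequence[int], gold_root: int) -> int:
    """0 if the predicted tree attaches gold_root (1-based) to ROOT, else 1."""
    return 0 if pred_heads[gold_root - 1] == 0 else 1


# "What did John see ?": the partial annotation only says token 4 ("see") is the root
print(root_loss((4, 4, 4, 0, 4), gold_root=4))  # 0
print(root_loss((4, 0, 4, 2, 2), gold_root=4))  # 1
```

Used as the cost C inside the inline ranker, this loss only triggers an update when some parse in the k-best list gets the root right while the 1-best parse does not.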
72 Clearly, improving just a single (and simple-to-annotate) dependency leads to general parser improvements. [sent-254, score-0.288]
73 3 Average Arc Length Score The augmented-loss framework can be used to incorporate multiple treebank-based loss functions as well. [sent-256, score-0.509]
74 Labeled attachment score is used as our base model loss function. [sent-257, score-0.527]
75 Optimizing for dependency arc length is particularly important as parsers tend to do worse on longer dependencies (McDonald and Nivre, 2007) and these dependencies are typically the most meaningful for downstream tasks, e. [sent-262, score-0.602]
76 , main verb dependencies for tasks [Footnote 2: For the graph-based parser one can also find the highest-scoring tree with the correct root by setting the score of all competing arcs to −∞.] [sent-264, score-0.334]
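The exact ALS formula is not part of this excerpt. One plausible reading, assumed here, weights each gold arc by its length |head − dependent| (root arcs measured from position 0) and reports the fraction of total gold length that is attached correctly, with the labeled variant also requiring the label to match; `arc_length_score` is a hypothetical name, and 1 − score could serve as the inline-ranker cost.

```python
from typing import Sequence, Tuple

Arc = Tuple[int, str]   # (head index, label); token indices start at 1, ROOT is 0


def arc_length_score(gold: Sequence[Arc], pred: Sequence[Arc], labeled: bool = True) -> float:
    """Length-weighted attachment score: long gold arcs count for more."""
    correct = total = 0
    for dep, (g, p) in enumerate(zip(gold, pred), start=1):
        length = abs(g[0] - dep)          # distance between dependent and gold head
        total += length
        ok = g == p if labeled else g[0] == p[0]
        if ok:
            correct += length
    return correct / total if total else 0.0


gold = [(2, "nsubj"), (0, "root"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (1, "dobj")]
print(arc_length_score(gold, pred))  # 0.75
```

Compared with plain LAS, an error on the length-2 root arc here would cost twice as much as an error on either length-1 arc.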
77 The augmented-loss not only helps us improve on the longer dependencies (as reflected in the increased ALS), but also on the main parser objective functions, LAS and UAS. [sent-287, score-0.387]
78 Using the labeled loss function provides better reinforcement as can be seen in the improvements over the unlabeled loss-function. [sent-288, score-0.587]
79 As with all experiments in this paper, the graph-based parser baselines are much lower than the transition-based parser due to the use of arc-factored features. [sent-289, score-0.34]
80 In these experiments we used an inline-ranker loss with 8 parses. [sent-290, score-0.449]
81 We experimented with larger sizes (16 and 64) and found very similar improvements: for example, the transition parser’s LAS for the labeled loss is 88. [sent-291, score-0.534]
82 In that work, a parser is used to first parse a set of manually reordered sentences to produce k-best lists. [sent-298, score-0.329]
83 The parse with the best reordering score is then fixed and added back to the training set and a new parser is trained on resulting data. [sent-299, score-0.548]
84 Our method differs as it does not statically fix a new parse, but dynamically updates the parameters and parse selection by incorporating the additional loss in the inner loop of online learning. [sent-302, score-0.721]
85 Furthermore, we also evaluate the method on alternate extrinsic loss functions. [sent-304, score-0.667]
86 Similar to the inline-ranker loss function presented here, they use a k-best list of hypotheses in order to identify parameters which can improve a global objective function: BLEU score. [sent-307, score-0.49]
87 Their algorithm specifically optimizes a loss function with the addition of constraints based on unlabeled data (what we call extrinsic datasets). [sent-312, score-0.852]
88 The augmented-loss algorithm can be viewed as an online version of this algorithm which performs model updates based on the augmented-loss functions directly (rather than adding a set of examples to the training set). [sent-315, score-0.338]
89 Again, these works are typically interested in using the extrinsic metric or, in general, extrinsic information to optimize the intrinsic metric in the absence of any labeled intrinsic data. [sent-319, score-0.841]
90 The idea of jointly training parsers to optimize multiple objectives is related to joint learning and inference for tasks like information extraction (Finkel and Manning, 2009) and machine translation (Burkett et al. [sent-321, score-0.353]
91 Our method only uses feedback from task-specific objectives in order to update the parser parameters, guiding it towards better downstream performance. [sent-325, score-0.518]
92 6 Discussion The empirical results show that incorporating an augmented-loss using the inline-ranker loss framework achieves better performance under metrics associated with the external loss function. [sent-330, score-0.984]
93 For the intrinsic loss, we see that the augmented-loss framework can also result in an improvement in parsing performance; however, in the case of ALS, this is due to the fact that the loss function is very closely related to the standard evaluation metrics of UAS and LAS. [sent-331, score-0.759]
94 Although our analysis suggests that this algorithm is guaranteed to converge only for the separable case, it makes a further assumption that if there is a better parse under the augmented-loss, then there must be a lower cost parse in the k-best list. [sent-332, score-0.401]
95 To build a dependency parser which is better adapted to a downstream task, one would want to perform augmented-loss training on the tagger as well. [sent-342, score-0.538]
96 7 Conclusion We introduced the augmented-loss training algorithm and showed that the algorithm can incorporate additional loss functions to adapt the model towards extrinsic evaluation metrics. [sent-343, score-0.86]
97 We present an empirical analysis for training dependency parsers for multiple parsing algorithms and multiple loss functions. [sent-345, score-0.813]
98 The augmented-loss framework supports both intrinsic and extrinsic losses, allowing for both combinations of objectives as well as multiple sources of data for which the results of a parser can be evaluated. [sent-346, score-0.611]
99 Our dependency parsing results show that we are not limited to increasing parser performance via more data or external domain adaptation techniques, but that we can incorporate the downstream task into parser training. [sent-349, score-0.847]
100 Using a dependency parser to improve SMT for Subject-Object-Verb languages. [sent-594, score-0.288]
wordName wordTfidf (topN-words)
[('loss', 0.449), ('extrinsic', 0.218), ('downstream', 0.209), ('qtb', 0.201), ('reordering', 0.178), ('parser', 0.17), ('perceptron', 0.169), ('inline', 0.163), ('ycjj', 0.163), ('intrinsic', 0.152), ('reranker', 0.146), ('als', 0.146), ('parse', 0.127), ('las', 0.126), ('ptb', 0.123), ('parsers', 0.122), ('dependency', 0.118), ('theorem', 0.109), ('updates', 0.108), ('lj', 0.105), ('uas', 0.095), ('yi', 0.087), ('parsing', 0.083), ('objective', 0.081), ('mcdonald', 0.081), ('weak', 0.077), ('losses', 0.075), ('yx', 0.073), ('objectives', 0.071), ('xi', 0.071), ('codl', 0.07), ('sched', 0.07), ('xcjj', 0.07), ('stack', 0.068), ('ranker', 0.067), ('ringgaard', 0.063), ('treebank', 0.062), ('questionbank', 0.06), ('functions', 0.06), ('cost', 0.057), ('ganchev', 0.057), ('arc', 0.057), ('structured', 0.055), ('schedule', 0.055), ('optimizes', 0.054), ('labeled', 0.053), ('output', 0.053), ('cj', 0.052), ('external', 0.052), ('talbot', 0.051), ('meteor', 0.051), ('mistakes', 0.05), ('wi', 0.048), ('dependencies', 0.048), ('optimize', 0.048), ('buffer', 0.047), ('mcclosky', 0.047), ('head', 0.047), ('petrov', 0.047), ('augmentedloss', 0.047), ('indefinitely', 0.047), ('chang', 0.047), ('root', 0.046), ('algorithm', 0.046), ('attachment', 0.046), ('adaptation', 0.045), ('unlabeled', 0.044), ('assumption', 0.044), ('blitzer', 0.043), ('optimizing', 0.042), ('function', 0.041), ('training', 0.041), ('ifc', 0.04), ('ichikawa', 0.04), ('seno', 0.04), ('optimal', 0.038), ('correct', 0.038), ('mann', 0.038), ('translation', 0.037), ('online', 0.037), ('berant', 0.036), ('summed', 0.036), ('update', 0.034), ('feedback', 0.034), ('driven', 0.034), ('metrics', 0.034), ('convergence', 0.034), ('incorrect', 0.034), ('arg', 0.034), ('nakagawa', 0.034), ('feo', 0.034), ('jointly', 0.034), ('updating', 0.033), ('guarantees', 0.033), ('score', 0.032), ('transition', 0.032), ('reordered', 0.032), ('yates', 0.032), ('maxy', 0.032), ('signals', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
2 0.392903 136 emnlp-2011-Training a Parser for Machine Translation Reordering
Author: Jason Katz-Brown ; Slav Petrov ; Ryan McDonald ; Franz Och ; David Talbot ; Hiroshi Ichikawa ; Masakazu Seno ; Hideto Kazawa
Abstract: We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.
3 0.2065547 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source lan- guages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-theart performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
4 0.1730656 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of nonprojectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
5 0.16955915 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
Author: Karthik Visweswariah ; Rajakrishnan Rajkumar ; Ankur Gandhe ; Ananthakrishnan Ramanathan ; Jiri Navratil
Abstract: Preordering of source side sentences has proved to be useful in improving statistical machine translation. Most work has used a parser in the source language along with rules to map the source language word order into the target language word order. The requirement to have a source language parser is a major drawback, which we seek to overcome in this paper. Instead of using a parser and then using rules to order the source side sentence we learn a model that can directly reorder source side sentences to match target word order using a small parallel corpus with highquality word alignments. Our model learns pairwise costs of a word immediately preced- ing another word. We use the Lin-Kernighan heuristic to find the best source reordering efficiently during training and testing and show that it suffices to provide good quality reordering. We show gains in translation performance based on our reordering model for translating from Hindi to English, Urdu to English (with a public dataset), and English to Hindi. For English to Hindi we show that our technique achieves better performance than a method that uses rules applied to the source side English parse.
6 0.14593878 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
7 0.14509869 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
8 0.1358524 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
9 0.13416848 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
10 0.13372487 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
11 0.12510827 74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering
12 0.12328822 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
13 0.11089152 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
14 0.11024031 129 emnlp-2011-Structured Sparsity in Structured Prediction
15 0.10887899 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
16 0.10395943 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
17 0.096125551 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
18 0.086605504 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
19 0.085183345 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
20 0.08517544 141 emnlp-2011-Unsupervised Dependency Parsing without Gold Part-of-Speech Tags
topicId topicWeight
[(0, 0.323), (1, 0.211), (2, 0.072), (3, 0.248), (4, -0.02), (5, 0.15), (6, 0.228), (7, 0.032), (8, 0.035), (9, 0.092), (10, -0.049), (11, -0.155), (12, -0.113), (13, -0.041), (14, -0.006), (15, -0.036), (16, 0.058), (17, -0.068), (18, 0.072), (19, -0.094), (20, 0.01), (21, -0.04), (22, 0.139), (23, -0.093), (24, 0.26), (25, -0.057), (26, -0.132), (27, 0.051), (28, 0.018), (29, 0.056), (30, -0.03), (31, -0.042), (32, 0.002), (33, 0.003), (34, -0.034), (35, -0.006), (36, 0.007), (37, -0.059), (38, 0.015), (39, -0.066), (40, -0.002), (41, 0.08), (42, 0.054), (43, -0.01), (44, -0.088), (45, 0.079), (46, -0.008), (47, -0.003), (48, 0.048), (49, -0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.96085614 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
2 0.80525172 136 emnlp-2011-Training a Parser for Machine Translation Reordering
3 0.57417822 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types
Author: Enrique Henestroza Anguiano ; Marie Candito
Abstract: This paper develops a framework for syntactic dependency parse correction. Dependencies in an input parse tree are revised by selecting, for a given dependent, the best governor from within a small set of candidates. We use a discriminative linear ranking model to select the best governor from a group of candidates for a dependent, and our model includes a rich feature set that encodes syntactic structure in the input parse tree. The parse correction framework is parser-agnostic, and can correct attachments using either a generic model or specialized models tailored to difficult attachment types like coordination and pp-attachment. Our experiments show that parse correction, combining a generic model with specialized models for difficult attachment types, can successfully improve the quality of predicted parse trees output by sev- eral representative state-of-the-art dependency parsers for French.
4 0.56642145 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser
Author: Stephen Tratz ; Eduard Hovy
Abstract: Dependency parsers are critical components within many NLP systems. However, currently available dependency parsers each exhibit at least one of several weaknesses, including high running time, limited accuracy, vague dependency labels, and lack of non-projectivity support. Furthermore, no commonly used parser provides additional shallow semantic interpretation, such as preposition sense disambiguation and noun compound interpretation. In this paper, we present a new dependency-tree conversion of the Penn Treebank along with its associated fine-grain dependency labels and a fast, accurate parser trained on it. We explain how a non-projective extension to shift-reduce parsing can be incorporated into non-directional easy-first parsing. The parser performs well when evaluated on the standard test section of the Penn Treebank, outperforming several popular open source dependency parsers; it is, to the best of our knowledge, the first dependency parser capable of parsing more than 75 sentences per second at over 93% accuracy.
5 0.54859149 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
Author: Ryan McDonald ; Slav Petrov ; Keith Hall
Abstract: We present a simple method for transferring dependency parsers from source languages with labeled training data to target languages without labeled training data. We first demonstrate that delexicalized parsers can be directly transferred between languages, producing significantly higher accuracies than unsupervised parsers. We then use a constraint driven learning algorithm where constraints are drawn from parallel corpora to project the final parser. Unlike previous work on projecting syntactic resources, we show that simple methods for introducing multiple source languages can significantly improve the overall quality of the resulting parsers. The projected parsers from our system result in state-of-the-art performance when compared to previously studied unsupervised and projected parsing systems across eight different languages.
6 0.5397734 74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering
7 0.51215982 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
8 0.50827575 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus
9 0.47891915 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation
10 0.4722662 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
11 0.44411257 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
12 0.43515557 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
13 0.40277678 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
14 0.40081224 129 emnlp-2011-Structured Sparsity in Structured Prediction
15 0.39412847 138 emnlp-2011-Tuning as Ranking
16 0.37974936 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
17 0.33641034 24 emnlp-2011-Bootstrapping Semantic Parsers from Conversations
18 0.32423672 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
19 0.31627041 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification
20 0.30491573 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
topicId topicWeight
[(23, 0.162), (36, 0.044), (37, 0.049), (45, 0.069), (47, 0.193), (53, 0.019), (54, 0.02), (57, 0.015), (62, 0.024), (64, 0.057), (66, 0.032), (69, 0.02), (79, 0.06), (82, 0.031), (90, 0.024), (96, 0.061), (98, 0.033)]
simIndex simValue paperId paperTitle
1 0.92593169 40 emnlp-2011-Discovering Relations between Noun Categories
Author: Thahir Mohamed ; Estevam Hruschka ; Tom Mitchell
Abstract: Traditional approaches to Relation Extraction from text require manually defining the relations to be extracted. We propose here an approach to automatically discovering relevant relations, given a large text corpus plus an initial ontology defining hundreds of noun categories (e.g., Athlete, Musician, Instrument). Our approach discovers frequently stated relations between pairs of these categories, using a two step process. For each pair of categories (e.g., Musician and Instrument) it first coclusters the text contexts that connect known instances of the two categories, generating a candidate relation for each resulting cluster. It then applies a trained classifier to determine which of these candidate relations is semantically valid. Our experiments apply this to a text corpus containing approximately 200 million web pages and an ontology containing 122 categories from the NELL system [Carlson et al., 2010b], producing a set of 781 proposed candidate relations, approximately half of which are semantically valid. We conclude this is a useful approach to semi-automatic extension of the ontology for large-scale information extraction systems such as NELL.
same-paper 2 0.82794738 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
Author: Keith Hall ; Ryan McDonald ; Jason Katz-Brown ; Michael Ringgaard
Abstract: We present an online learning algorithm for training parsers which allows for the inclusion of multiple objective functions. The primary example is the extension of a standard supervised parsing objective function with additional loss-functions, either based on intrinsic parsing quality or task-specific extrinsic measures of quality. Our empirical results show how this approach performs for two dependency parsing algorithms (graph-based and transition-based parsing) and how it achieves increased performance on multiple target tasks including reordering for machine translation and parser adaptation.
3 0.76724833 136 emnlp-2011-Training a Parser for Machine Translation Reordering
Author: Jason Katz-Brown ; Slav Petrov ; Ryan McDonald ; Franz Och ; David Talbot ; Hiroshi Ichikawa ; Masakazu Seno ; Hideto Kazawa
Abstract: We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.
4 0.7437107 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and Urdu-English translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
5 0.73335081 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
Author: Sebastian Riedel ; Andrew McCallum
Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches is the cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state of the art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.
6 0.72090703 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
7 0.71337301 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
8 0.71004981 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction
9 0.70977545 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
10 0.7054705 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
11 0.70537364 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
12 0.7041887 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
13 0.70363086 6 emnlp-2011-A Generate and Rank Approach to Sentence Paraphrasing
14 0.70328444 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
15 0.70310688 126 emnlp-2011-Structural Opinion Mining for Graph-based Sentiment Representation
16 0.70258343 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
17 0.70024306 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
18 0.69951743 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study
19 0.69712895 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
20 0.6967333 54 emnlp-2011-Exploiting Parse Structures for Native Language Identification