emnlp emnlp2012 emnlp2012-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Bernd Bohnet ; Anders Bjorkelund ; Jonas Kuhn ; Wolfgang Seeker ; Sina Zarriess
Abstract: We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages: English, German, Dutch, Danish, Hungarian, and Czech.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. [sent-3, score-0.56]
2 Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. [sent-4, score-0.701]
3 Here, the input is a linguistic representation, such as a syntactic dependency tree lacking all precedence information, and the task is to determine a natural, coherent linearization of the words. [sent-9, score-0.59]
4 The standard data-driven approach is to traverse the dependency tree deciding locally at each node on the relative order of the head and its children. [sent-10, score-0.311]
5 However, the approach can only generate projective word orders (which can be drawn without any crossing edges). [sent-14, score-0.261]
6 (Example orderings, all marked ungrammatical: *It is federal support should try to what achieve; *It is federal support should try to achieve what; *It is try to achieve what federal support should.) Although rather infrequent in English, nonprojective word orders are quite common in languages with a less restrictive word order. [sent-20, score-0.48]
7 In these languages, it is often possible to find a grammatically correct projective linearization for a given input tree, but discourse coherence, information structure, and stylistic factors will often make speakers prefer some non-projective word order. [sent-21, score-0.565]
8 In pseudo-projective dependency parsing (Kahane et al., 1998; Nivre and Nilsson, 2005), the parsing algorithm is restricted to projective structures, but the issue is side-stepped by converting non-projective structures to projective ones prior to training and application, and then restoring the original structure afterwards. [sent-43, score-0.286]
9 Similarly, we split the linearization task into two stages: initially, the input tree is modified by lifting certain edges in such a way that new orderings become possible even under a projectivity constraint; the second stage is the original, projective linearization step. [sent-44, score-1.717]
10 In parsing, projectivization is a deterministic process that lifts edges based on the linear order of a sentence. [sent-45, score-0.277]
11 In generation, however, no linear order is available yet; therefore, we use a statistical classifier as our initial lifting component. [sent-47, score-0.568]
12 This classifier has to be trained on suitable data, and it is an empirical question whether the projective linearizer can take advantage of this preceding lifting step. [sent-48, score-0.948]
13 2 Related Work An important concept for tree linearization is that of word order domains (Reape, 1989). [sent-52, score-0.548]
14 A straightforward method to obtain the word order domains from dependency trees and to order the words in the tree is to use each word and its children as a domain, and then to order the domains and the contained words recursively. [sent-54, score-0.343]
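A minimal sketch of this recursive domain-based scheme (an illustration, not the implementation used in the paper): the tree is assumed to be given as a child-to-head map with 0 as an artificial root, and the per-domain ordering function is a placeholder.

```python
# Recursive domain-based linearization: each head together with its children
# forms a word order domain; domains are ordered and concatenated recursively.
# The ordering function is a placeholder; a real system would score candidates.
from collections import defaultdict

def linearize_recursively(head, words, order_domain, node=0):
    """head: child -> parent (0 is the artificial root); words: node -> token."""
    children = defaultdict(list)
    for child, parent in head.items():
        children[parent].append(child)

    def visit(h):
        domain = ([h] if h != 0 else []) + children[h]   # the domain of h
        result = []
        for n in order_domain(domain):
            if n == h:
                result.append(words[h])   # the head contributes its own word
            else:
                result.extend(visit(n))   # a child contributes its whole subtree
        return result

    return visit(node)

if __name__ == "__main__":
    # toy tree for "she reads books" with "reads" as the root
    head = {1: 2, 2: 0, 3: 2}
    words = {1: "she", 2: "reads", 3: "books"}
    print(linearize_recursively(head, words, order_domain=sorted))
```

By construction this procedure can only produce projective orders, which is exactly the limitation the lifting approach addresses.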
15 Statistical methods for linearization have recently become more popular (Langkilde and Knight, 1998; Ringger et al. [sent-72, score-0.395]
16 They use machine-learning techniques to lift edges in a preprocessing step to a surface realizer. [sent-86, score-0.246]
17 3 Lifting Dependency Edges In this section, we describe the first of the two stages in our approach, namely the classifier that lifts edges in dependency trees. [sent-89, score-0.298]
18 The classifier we aim to train is meant to predict liftings on a given unordered dependency tree, yielding a tree that, with a perfect linearization, would not have any non-projective edges. [sent-90, score-0.44]
19 Moreover, a dependency tree is projective iff all its edges are projective. [sent-100, score-0.429]
20 A lifting of an edge h → d (or simply of the node d) is an operation that replaces h → d with g → d, given that there exists an edge g → h in the tree, and is undefined otherwise (i.e., when no such g exists).2 [sent-102, score-0.763]
21 (Footnote 2: The undefined case occurs only when d depends on the root, and hence cannot be lifted further; but these edges are by definition projective, since the root dominates the entire tree.) [sent-105, score-0.979]
22 When the lifting operation is applied n successive times to the same node, we say the node was lifted n steps. [sent-106, score-0.394]
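A small sketch of this lifting operation, assuming the tree is stored as a child-to-head map with 0 as the artificial root (a simplification for illustration):

```python
# Lifting node d reattaches it to its grandparent; applying the operation
# n times lifts d by n steps. Lifting stops at children of the root, where
# the operation is undefined (and such edges are projective by definition).
def lift(head, d, steps=1):
    head = dict(head)                 # work on a copy of the tree
    for _ in range(steps):
        h = head[d]
        if h == 0:
            break                     # d already depends on the root
        head[d] = head[h]             # replace h -> d with g -> d (g = head of h)
    return head

if __name__ == "__main__":
    # chain root -> 1 -> 2 -> 3 -> 4; lift node 4 by three steps
    head = {1: 0, 2: 1, 3: 2, 4: 3}
    print(lift(head, 4, steps=3))     # {1: 0, 2: 1, 3: 2, 4: 0}
```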
23 The standard projectivization algorithm works by iteratively lifting the shortest nonprojective edges until the tree is projective. [sent-109, score-0.858]
24 Since finding the shortest edge relies on the linear order, we instead lift non-projective edges ordered by depth in the tree, starting with the most deeply nested edge. [sent-111, score-0.91]
25 A lifted version of the tree from Figure 1 is shown in Figure 3. [sent-112, score-0.417]
26 The edge of “what” has been lifted three steps (the original edge is dotted), and the tree is no longer non-projective. [sent-113, score-0.591]
27 We model the edge lifting problem as a multiclass classification problem: we consider nodes one at a time and ask the question “How far should this edge be lifted?” [sent-116, score-0.752]
28 Its class is determined by the number of steps it would be lifted by the projectivization algorithm given the linear order (in most cases the class corresponds to no lifting, since most edges are projective). [sent-127, score-0.546]
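The following sketch shows one way such training classes could be derived; it is a reconstruction under assumptions (nodes are numbered 1..n in gold linear order, a simple dominance-based projectivity test), not the paper's code:

```python
# Oracle for training classes: lift non-projective edges one step at a time,
# deepest edge first, until the tree is projective; the class of a node is the
# number of steps it was lifted (0 for most nodes, since most edges are projective).
from collections import defaultdict

def descendants(head, h):
    kids = defaultdict(set)
    for c, p in head.items():
        kids[p].add(c)
    out, stack = set(), [h]
    while stack:
        for c in kids[stack.pop()]:
            out.add(c)
            stack.append(c)
    return out

def is_projective(head, h, d):
    """Edge h -> d is projective iff h dominates every node between h and d."""
    dom = descendants(head, h) | {h}
    lo, hi = min(h, d), max(h, d)
    return all(n in dom for n in range(lo + 1, hi))

def depth(head, n):
    steps = 0
    while n != 0:
        n, steps = head[n], steps + 1
    return steps

def oracle_lifting_classes(head):
    head, lifts = dict(head), defaultdict(int)
    while True:
        bad = [d for d, h in head.items() if h != 0 and not is_projective(head, h, d)]
        if not bad:
            return dict(head), dict(lifts)
        d = max(bad, key=lambda n: depth(head, n))   # deepest non-projective edge first
        head[d] = head[head[d]]                      # lift it one step
        lifts[d] += 1

if __name__ == "__main__":
    # illustrative non-projective toy tree (node ids are gold positions)
    print(oracle_lifting_classes({1: 3, 2: 0, 3: 2, 4: 1}))
```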
29 As we traverse the nodes, we also execute the liftings (if any) and update the tree on the fly. [sent-128, score-0.275]
30 The features used for the lifting classifier are described in Table 1. [sent-131, score-0.568]
31 They involve the lemma, dependency edge label, part-of-speech tag, and morphological features of the node in question, and of several neighboring nodes in the dependency tree. [sent-141, score-0.334]
32 (An alternative formulation would be to lift one step at a time, applying the classifier iteratively until it says stop.) [sent-147, score-0.568]
33 Second, to avoid data sparseness for infrequent lifting distances, we introduce a maximum number of liftings. [sent-150, score-0.52]
34 3 This means that we are able to predict the correct lifting for most (but not all) of the non-projective edges in our data sets (cf. [sent-153, score-0.654]
35 (Footnote 3: During training, nodes that are lifted further than maxsteps are assigned to the class corresponding to maxsteps.) [sent-155, score-0.383]
36 Third, as Nivre and Nilsson (2005) do for parsing, we experimented with marking edges that were lifted by indicating this on the edge labels. [sent-159, score-0.546]
37 In our case, it could potentially be beneficial for both the lifting classifier, and for the linearizer. [sent-161, score-0.52]
38 3 Decoding In the decoding stage, an unordered tree is given and the goal is to lift edges that would be non-projective with respect to the gold linear order. [sent-164, score-0.402]
39 Similarly to how training instances are derived, the decoding algorithm traverses the tree bottom-up and visits every node once. [sent-165, score-0.232]
40 When a node is visited, the classifier is applied and the corresponding lifting is executed. [sent-167, score-0.637]
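A greedy version of this decoding step might look as follows; the classifier is a hypothetical stub standing in for the trained multi-class classifier over the features of Table 1, and the min() mirrors the capping at the maximum number of liftings and the guard against lifting beyond the root:

```python
# Greedy decoding: visit nodes bottom-up, ask the classifier how many steps to
# lift each node, and execute the lifting on the fly.
def depth(head, n):
    steps = 0
    while n != 0:
        n, steps = head[n], steps + 1
    return steps

def decode_liftings(head, classify, max_steps=3):
    head = dict(head)
    for d in sorted(head, key=lambda n: -depth(head, n)):   # deeper nodes first
        steps = min(classify(head, d), max_steps, depth(head, d) - 1)
        for _ in range(steps):
            head[d] = head[head[d]]                         # lift one step towards the root
    return head

if __name__ == "__main__":
    stub = lambda head, d: 3 if d == 4 else 0               # hypothetical classifier output
    print(decode_liftings({1: 0, 2: 1, 3: 2, 4: 3}, stub))  # node 4 ends up under the root
```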
41 If ni is visited before nj, and ni is lifted, this means that nj is classified in a context where ni has already been lifted. (Footnote 4: The MIN function is used to guarantee that the edge is not lifted beyond the root node of the tree.) [sent-172, score-0.642]
42 Using a beam allows us to consider nj both in the context where ni has been lifted and when it has not been lifted. [sent-176, score-0.424]
43 Every tree also has an associated score, which is the sum of the scores of each lifting so far. [sent-181, score-0.612]
44 The score of a lifting is defined to be the log probability returned from the logistic classifier. [sent-182, score-0.52]
45 After exploring all trees in the agenda, the k-best new trees from the beam are extracted and put back into the agenda. [sent-183, score-0.247]
46 Again, consider the sibling nodes ni and nj when ni is visited before nj. [sent-190, score-0.267]
47 The beam allows us to consider nj both when ni is lifted and when it is not. [sent-191, score-0.595]
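A sketch of this beam over lifting decisions: hypotheses are (tree, score) pairs whose scores sum the log-probabilities of the chosen liftings, and only the k best are kept after each node. The classifier stub and the simplified traversal order are assumptions for illustration.

```python
import math

def beam_decode(head, classify, k=4, max_steps=3):
    agenda = [(dict(head), 0.0)]              # start with the unmodified tree, score 0.0
    for d in sorted(head):                    # simplified order; the paper visits nodes bottom-up
        beam = []
        for tree, score in agenda:
            for steps, prob in classify(tree, d).items():
                new_tree = dict(tree)
                for _ in range(min(steps, max_steps)):
                    if new_tree[d] == 0:
                        break                 # never lift beyond the root
                    new_tree[d] = new_tree[new_tree[d]]
                beam.append((new_tree, score + math.log(prob)))
        beam.sort(key=lambda hyp: -hyp[1])
        agenda = beam[:k]                     # keep the k best hypotheses
    return agenda[0]

if __name__ == "__main__":
    # hypothetical classifier: node 4 prefers 3 lifting steps, all others prefer 0
    stub = lambda tree, d: {3: 0.7, 0: 0.3} if d == 4 else {0: 0.9, 1: 0.1}
    best_tree, best_score = beam_decode({1: 0, 2: 1, 3: 2, 4: 3}, stub)
    print(best_tree, round(best_score, 3))
```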
48 4 Linearization A linearizer searches for the optimal word order given an unordered dependency tree, where the optimal word order is defined as the single reference order of the dependency tree in the gold standard. [sent-198, score-0.629]
49 We employ a statistical linearizer that is trained on a corpus of pairs consisting of unordered dependency trees and their corresponding sentences. [sent-199, score-0.422]
50 The linearization method consists of the following steps: Creating word order domains. [sent-200, score-0.426]
51 In the first step, we build the word order domains dh for all nodes h ∈ y of a dependency tree y. [sent-201, score-0.321]
52 This way, the linearizer can deduce word orders that would result in non-projective structures in the non-lifted tree. [sent-205, score-0.325]
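A sketch of what GET-DOMAIN could compute on the lifted tree (the representation is an assumed simplification): each head's domain is the head itself plus its, possibly lifted-in, children.

```python
from collections import defaultdict

def get_domains(head):
    """Word order domains of a (lifted) tree given as a child-to-head map."""
    children = defaultdict(list)
    for child, parent in head.items():
        children[parent].append(child)
    return {h: [h] + children[h] for h in head}

if __name__ == "__main__":
    # node 4 has been lifted and now attaches to node 1, so it joins 1's domain
    print(get_domains({1: 0, 2: 1, 3: 2, 4: 1}))   # {1: [1, 2, 4], 2: [2, 3], 3: [3], 4: [4]}
```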
53 In the second step, the linearizer orders the words of each domain. [sent-207, score-0.325]
54 In our implementation, the linearizer traverses the tree either top-down or bottom-up. [sent-210, score-0.365]
55 (Pseudocode excerpt)
1 // T is the dependency tree with lifted nodes
2 beam-size ← 1000
3 foreach h ∈ T do
4   domain dh ← GET-DOMAIN(T, h)
5   // initialize the beam with an empty word list
6   Agendah ← (? [sent-211, score-0.706]
56 The linearization algorithm initializes the word order beam (agendah) with an empty order. [sent-214, score-0.628]
57 If the beam (beam) exceeds a certain size (beam-size), it is sorted by score and pruned to the maximum beam size (beam-size) (lines 16-20). [sent-219, score-0.342]
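The following Python sketch mirrors the per-domain beam search outlined in the pseudocode above: partial orders are grown word by word, scored, and pruned to the beam size. The scoring function is a hypothetical stand-in for the learned feature-based scorer.

```python
def linearize_domain(domain, score, beam_size=1000):
    agenda = [()]                                    # the beam starts with an empty word list
    for _ in domain:
        beam = []
        for partial in agenda:
            for w in domain:
                if w not in partial:
                    beam.append(partial + (w,))      # extend the partial order by one word
        beam.sort(key=score, reverse=True)
        agenda = beam[:beam_size]                    # prune to the maximum beam size
    return agenda[0]

if __name__ == "__main__":
    # hypothetical bigram scorer for the domain {reads, she, books}
    good = {("she", "reads"), ("reads", "books")}
    score = lambda seq: sum(1 for a, b in zip(seq, seq[1:]) if (a, b) in good)
    print(linearize_domain(["reads", "she", "books"], score))   # ('she', 'reads', 'books')
```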
58 For this, the algorithm iterates over the alternative word orders of the domains in order to assemble different word orders on the sentence level. [sent-233, score-0.237]
59 Finally, when traversing the tree bottom-up, the algorithm has to use the different orders of the already ordered subtrees as context, which also requires a search over alternative word orders of the domains. [sent-234, score-0.268]
60 In the case that the linearization of a word order domain is incorrect, the algorithm updates its weight vector w. [sent-241, score-0.426]
61 The following equation shows the update function of the weight vector: w = w + τ · (φ(dh, T, xg) − φ(dh, T, xp)). We update the weight vector w by adding the difference of the feature vector representations of the correct linearization xg and the wrongly predicted linearization xp, multiplied by τ. [sent-242, score-0.822]
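A sketch of this update with sparse feature vectors; the bigram feature map and the fixed τ are illustrative assumptions, since the paper's exact features and step-size computation are not reproduced here.

```python
from collections import Counter

def phi(order):
    """Toy sparse feature map: adjacent word bigrams of a linearized domain."""
    return Counter(zip(order, order[1:]))

def update(w, gold, pred, tau=1.0):
    """w <- w + tau * (phi(gold) - phi(pred)), applied only on mistakes."""
    diff = Counter(phi(gold))
    diff.subtract(phi(pred))
    for feat, value in diff.items():
        w[feat] = w.get(feat, 0.0) + tau * value
    return w

if __name__ == "__main__":
    w = update({}, gold=("she", "reads", "books"), pred=("reads", "she", "books"))
    print(w)
```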
62 The linearizer traverses the tree either top-down or bottom-up and assembles the results in the surface order. [sent-246, score-0.423]
63 The bottom-up linearization algorithm can take into account features drawn from the already ordered subtrees while the top-down algorithm can employ as context only the unordered nodes. [sent-247, score-0.482]
64 However, the bottom-up algorithm additionally has to carry out a search over the alternative linearizations of the subdomains, as different orders of the subdomains provide different context features. [sent-248, score-0.483]
65 Table 3: Size of training sets, percentage of nonprojective (np) sentences and edges, and percentage of np edges covered by 3 lifting steps. [sent-282, score-0.825]
66 The last column shows the percentage of non-projective edges that can be made projective by at most 3 lifting steps. [sent-287, score-0.844]
67 1 Setup In our two-stage approach, we first train the lifting classifier. [sent-289, score-0.52]
68 Second, we train the linearizer on the output of the lifting classifier. [sent-292, score-0.757]
69 In addition, we train and evaluate the linearizer on gold-lifted structures, which gives us an upper bound for the lifting technique. [sent-295, score-0.52]
70 In this two-stage setup, we have the problem that, if we re-apply the lifting classifier on the data it was trained on, the input for the linearizer will be better during training than during testing. [sent-298, score-0.805]
71 To provide realistic training data for the linearizer, we perform a 10-fold cross-validation of the lifting classifier on the training set and use its output as the linearizer's training data. [sent-299, score-0.568]
72 The lifting classifier that is applied to the test set is trained on the entire training set. [sent-300, score-0.568]
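A sketch of this jackknifing setup; train_lifter and apply_lifter are hypothetical stand-ins for training and applying the lifting classifier.

```python
def jackknife(train_trees, train_lifter, apply_lifter, folds=10):
    """Produce realistically noisy lifted trees for linearizer training."""
    lifted_training_data = []
    for i in range(folds):
        held_out = train_trees[i::folds]                          # every folds-th tree
        rest = [t for j, t in enumerate(train_trees) if j % folds != i]
        model = train_lifter(rest)                                # lifter trained without the held-out part
        lifted_training_data.extend(apply_lifter(model, t) for t in held_out)
    final_lifter = train_lifter(train_trees)                      # this one is applied to the test set
    return lifted_training_data, final_lifter

if __name__ == "__main__":
    trees = list(range(20))                                       # placeholder "trees"
    data, model = jackknife(trees, train_lifter=len, apply_lifter=lambda m, t: (m, t))
    print(len(data), model)                                       # 20 20
```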
73 2 Lifting results To evaluate the performance of the lifting classifier, we present precision, recall, and F-measure results for each language. [sent-302, score-0.52]
74 We also compute the percentage of sentences that were handled perfectly by the lifting classifier. [sent-303, score-0.567]
75 Although the major evaluation of the lifting is given by the performance of the linearizer, Table 4 gives us some clues about the lifting. [sent-310, score-0.52]
76 3 Linearization Results and Discussion We evaluate the linearizer with standard metrics: n-gram overlap measures (BLEU, NIST), edit distance (Edit), and the proportion of exactly linearized sentences (Exact). [sent-327, score-0.265]
77 As a means to assess the impact of lifting more precisely, we propose the word-based measure Exactlift, which only looks at the words with an incoming lifted edge. [sent-328, score-0.845]
78 Exactlift is the exact match for words with an incoming lifted edge; Nlift is the total number of lifted edges. [sent-360, score-0.65]
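A sketch of Exactlift under the assumption that "exact match" means a word with an incoming lifted edge ends up at its reference position; the input structures are illustrative.

```python
def exact_lift(lifted_nodes, gold_pos, pred_pos):
    """Fraction of lifted-edge words placed at exactly their reference position."""
    if not lifted_nodes:
        return 0.0
    correct = sum(1 for n in lifted_nodes if pred_pos[n] == gold_pos[n])
    return correct / len(lifted_nodes)

if __name__ == "__main__":
    lifted = {4, 7}                        # Nlift = 2 lifted edges
    gold = {4: 1, 7: 6}
    pred = {4: 1, 7: 5}                    # node 4 placed correctly, node 7 not
    print(exact_lift(lifted, gold, pred))  # 0.5
```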
79 The scores on the oracle liftings suggest that the impact of lifting on linearization is heavily language-dependent: it is highest on the V2 languages, and somewhat smaller on English, Hungarian, and Czech. [sent-364, score-1.068]
80 Figure 4: Accuracy for the linearization of the sentences’ left and right periphery; the bars are upper and lower bounds of the non-lifted and the gold-lifted baseline (languages: Eng, Ger, Dut, Dan, Hun, Cze). [sent-370, score-0.395]
81 The Exactlift measure refines this picture: the linearization of the non-projective edges is relatively exact in English, and much less precise in Hungarian and Czech, where Exactlift is low even on the gold-lifted edges. [sent-371, score-0.529]
82 The linearization quality is also quite moderate on Dutch, where the lifting leads to considerable improvements. [sent-372, score-0.915]
83 These tendencies point to some important underlying distinctions in the nonprojective word order phenomena over which we are generalizing: In certain cases, the linearization seems to systematically follow from the fact that the edge has to be lifted, such as wh-extraction in English (Figure 1). [sent-373, score-0.59]
84 In other cases, the non-projective linearization is just an alternative to other grammatical, but maybe less appropriate, realizations, such as the prefield-occupation in German (Figure 2). [sent-374, score-0.395]
85 It clearly emerges from this figure that the range of improvements obtainable from lifting is closely tied to the general linearization quality, and also to word order properties of the languages. [sent-377, score-0.946]
86 Thus, the range of sentences affected by the lifting is clearly largest for the V2 languages. [sent-378, score-0.52]
87 We also evaluated our linearizer on the data of the 2011 Shared Task on Surface Realisation, which is based on the English CoNLL 2009 data (like our previous evaluations) but excludes information on morphological realization. [sent-391, score-0.237]
88 This shows that the improvements we obtain from the lifting carry over to more complex generation tasks which include morphological realization. [sent-401, score-0.553]
89 In particular, we wanted to check whether the lifting-based linearizer produces more natural word orders for sentences that had a non-projective tree in the corpus, and maybe less natural word orders on originally projective sentences. [sent-405, score-0.678]
90 We asked four annotators to judge 60 sentence pairs comparing the lifting-based against the non-lifted linearizer, using the toolkit by Kow and Belz (2012). [sent-407, score-0.265]
91 The items are subdivided into 30 originally projective and 30 originally non-projective sentences. [sent-410, score-0.235]
92 However, on the subset of non-projective sentences, the lifted version is clearly preferred and has a higher average fluency and preference strength. [sent-446, score-0.437]
93 The percentage of zero preference items is much higher on the subset of projective sentences. [sent-447, score-0.281]
94 Moreover, the average fluency of the zero preference items is remarkably higher on the projective sentences than on the nonprojective subset. [sent-448, score-0.364]
95 We attribute the low fluency score on the non-projective zero-preference items to cases where the linearizer did not get a correct lifting or could not linearize the lifting correctly, so that neither the lifted nor the non-lifted version was appropriate. [sent-450, score-1.788]
96 On the other hand, incorrect liftings on projective sentences do not necessarily seem to result in deprecated linearizations, which leads to the high percentage of zero preferences with a good average fluency on this subset. [sent-451, score-0.396]
97 Our approach deals with nonprojectivity by lifting edges in an unordered input tree, which can then be linearized by a standard projective linearization algorithm. [sent-453, score-1.427]
98 We obtain significant improvements for the lifting-based linearization on English, German, Dutch, Danish, Czech and Hungarian, and show that lifting has the largest impact on the V2-languages. [sent-454, score-0.915]
99 In a human evaluation carried out on German, we also show that human judges clearly prefer lifting-based linearizations on originally non-projective sentences, and, on the other hand, that incorrect liftings do not necessarily result in bad realizations of the sentence. [sent-455, score-0.313]
wordName wordTfidf (topN-words)
[('lifting', 0.52), ('linearization', 0.395), ('lifted', 0.325), ('linearizer', 0.237), ('beam', 0.171), ('liftings', 0.153), ('projective', 0.143), ('edges', 0.134), ('hungarian', 0.12), ('german', 0.109), ('bohnet', 0.103), ('tree', 0.092), ('orders', 0.088), ('edge', 0.087), ('realisation', 0.087), ('unordered', 0.087), ('agendah', 0.084), ('nonprojective', 0.077), ('dutch', 0.077), ('linearizations', 0.072), ('exactlift', 0.07), ('node', 0.069), ('agenda', 0.066), ('belz', 0.065), ('federal', 0.065), ('dependency', 0.06), ('preference', 0.059), ('visited', 0.059), ('haji', 0.059), ('surface', 0.058), ('nodes', 0.058), ('extraposition', 0.056), ('lifts', 0.056), ('projectivization', 0.056), ('lift', 0.054), ('filippova', 0.054), ('guo', 0.054), ('fluency', 0.053), ('ni', 0.051), ('dh', 0.05), ('nj', 0.048), ('classifier', 0.048), ('percentage', 0.047), ('danish', 0.046), ('depth', 0.045), ('precedence', 0.043), ('topological', 0.043), ('langkilde', 0.042), ('linearize', 0.042), ('linearizers', 0.042), ('periphery', 0.042), ('seeker', 0.042), ('slider', 0.042), ('tscore', 0.042), ('realization', 0.04), ('gamon', 0.04), ('nlg', 0.04), ('positives', 0.038), ('orderings', 0.038), ('trees', 0.038), ('czech', 0.037), ('shared', 0.036), ('kahane', 0.036), ('foreach', 0.036), ('grandchildren', 0.036), ('ringger', 0.036), ('traverses', 0.036), ('kow', 0.035), ('decoding', 0.035), ('shortest', 0.035), ('generation', 0.033), ('xg', 0.032), ('items', 0.032), ('order', 0.031), ('judges', 0.031), ('domains', 0.03), ('pseudocode', 0.03), ('grandparent', 0.03), ('crossing', 0.03), ('restrictive', 0.03), ('traverse', 0.03), ('originally', 0.03), ('try', 0.03), ('head', 0.029), ('afechats', 0.028), ('bibeam', 0.028), ('clones', 0.028), ('crosses', 0.028), ('duchier', 0.028), ('extractkbest', 0.028), ('fronting', 0.028), ('kathol', 0.028), ('linearized', 0.028), ('nonlifted', 0.028), ('nonprojectivity', 0.028), ('reattached', 0.028), ('vincze', 0.028), ('wanner', 0.028), ('subtree', 0.027), ('prefer', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999887 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
Author: Bernd Bohnet ; Anders Bjorkelund ; Jonas Kuhn ; Wolfgang Seeker ; Sina Zarriess
Abstract: We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages: English, German, Dutch, Danish, Hungarian, and Czech.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all languages.
3 0.12840876 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Author: Emily Pitler ; Sampath Kannan ; Mitchell Marcus
Abstract: We introduce gap inheritance, a new structural property on trees, which provides a way to quantify the degree to which intervals of descendants can be nested. Based on this property, two new classes of trees are derived that provide a closer approximation to the set of plausible natural language dependency trees than some alternative classes of trees: unlike projective trees, a word can have descendants in more than one interval; unlike spanning trees, these intervals cannot be nested in arbitrary ways. The 1-Inherit class of trees has exactly the same empirical coverage of natural language sentences as the class of mildly nonprojective trees, yet the optimal scoring tree can be found in an order of magnitude less time. Gap-minding trees (the second class) have the property that all edges into an interval of descendants come from the same node, and thus an algorithm which uses only single in- tervals can produce trees in which a node has descendants in multiple intervals.
4 0.10495252 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
Author: Michael White ; Rajakrishnan Rajkumar
Abstract: Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better match the distributional characteristics of sentence orderings in news text. We find that with a stateof-the-art, comprehensive realization ranking model, dependency length minimization yields statistically significant improvements in BLEU scores and significantly reduces the number of heavy/light ordering errors. Through distributional analyses, we also show that with simpler ranking models, dependency length minimization can go overboard, too often sacrificing canonical word order to shorten dependencies, while richer models manage to better counterbalance the dependency length minimization preference against (sometimes) competing canonical word order preferences.
5 0.086243063 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
Author: Hao Zhang ; Ryan McDonald
Abstract: State-of-the-art graph-based parsers use features over higher-order dependencies that rely on decoding algorithms that are slow and difficult to generalize. On the other hand, transition-based dependency parsers can easily utilize such features without increasing the linear complexity of the shift-reduce system beyond a constant. In this paper, we attempt to address this imbalance for graph-based parsing by generalizing the Eisner (1996) algorithm to handle arbitrary features over higherorder dependencies. The generalization is at the cost of asymptotic efficiency. To account for this, cube pruning for decoding is utilized (Chiang, 2007). For the first time, label tuple and structural features such as valencies can be scored efficiently with third-order features in a graph-based parser. Our parser achieves the state-of-art unlabeled accuracy of 93.06% and labeled accuracy of 91.86% on the standard test set for English, at a faster speed than a reimplementation ofthe third-ordermodel of Koo et al. (2010).
6 0.077703163 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
7 0.076580457 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
8 0.069647528 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
9 0.068238705 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
10 0.065743558 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
11 0.061909679 55 emnlp-2012-Forest Reranking through Subtree Ranking
12 0.057611324 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
13 0.05518797 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
14 0.05337853 94 emnlp-2012-Multiple Aspect Summarization Using Integer Linear Programming
15 0.051288139 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
16 0.047816813 64 emnlp-2012-Improved Parsing and POS Tagging Using Inter-Sentence Consistency Constraints
17 0.04764897 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
18 0.045263529 2 emnlp-2012-A Beam-Search Decoder for Grammatical Error Correction
19 0.043316551 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
20 0.04304282 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
topicId topicWeight
[(0, 0.171), (1, -0.109), (2, 0.082), (3, -0.053), (4, 0.04), (5, -0.004), (6, -0.012), (7, 0.03), (8, 0.002), (9, 0.212), (10, 0.092), (11, 0.016), (12, 0.052), (13, 0.054), (14, 0.153), (15, -0.048), (16, 0.036), (17, -0.046), (18, 0.13), (19, 0.136), (20, -0.086), (21, -0.128), (22, 0.048), (23, 0.04), (24, 0.002), (25, 0.015), (26, 0.038), (27, 0.124), (28, -0.04), (29, -0.086), (30, 0.004), (31, 0.006), (32, -0.053), (33, 0.004), (34, -0.088), (35, -0.062), (36, -0.045), (37, 0.088), (38, 0.076), (39, 0.184), (40, 0.132), (41, -0.148), (42, -0.089), (43, -0.241), (44, 0.015), (45, -0.166), (46, 0.014), (47, -0.049), (48, -0.205), (49, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.93369222 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
Author: Bernd Bohnet ; Anders Bjorkelund ; Jonas Kuhn ; Wolfgang Seeker ; Sina Zarriess
Abstract: We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages: English, German, Dutch, Danish, Hungarian, and Czech.
2 0.64075953 88 emnlp-2012-Minimal Dependency Length in Realization Ranking
Author: Michael White ; Rajakrishnan Rajkumar
Abstract: Comprehension and corpus studies have found that the tendency to minimize dependency length has a strong influence on constituent ordering choices. In this paper, we investigate dependency length minimization in the context of discriminative realization ranking, focusing on its potential to eliminate egregious ordering errors as well as better match the distributional characteristics of sentence orderings in news text. We find that with a stateof-the-art, comprehensive realization ranking model, dependency length minimization yields statistically significant improvements in BLEU scores and significantly reduces the number of heavy/light ordering errors. Through distributional analyses, we also show that with simpler ranking models, dependency length minimization can go overboard, too often sacrificing canonical word order to shorten dependencies, while richer models manage to better counterbalance the dependency length minimization preference against (sometimes) competing canonical word order preferences.
3 0.5995959 37 emnlp-2012-Dynamic Programming for Higher Order Parsing of Gap-Minding Trees
Author: Emily Pitler ; Sampath Kannan ; Mitchell Marcus
Abstract: We introduce gap inheritance, a new structural property on trees, which provides a way to quantify the degree to which intervals of descendants can be nested. Based on this property, two new classes of trees are derived that provide a closer approximation to the set of plausible natural language dependency trees than some alternative classes of trees: unlike projective trees, a word can have descendants in more than one interval; unlike spanning trees, these intervals cannot be nested in arbitrary ways. The 1-Inherit class of trees has exactly the same empirical coverage of natural language sentences as the class of mildly nonprojective trees, yet the optimal scoring tree can be found in an order of magnitude less time. Gap-minding trees (the second class) have the property that all edges into an interval of descendants come from the same node, and thus an algorithm which uses only single in- tervals can produce trees in which a node has descendants in multiple intervals.
4 0.3779102 104 emnlp-2012-Parse, Price and Cut-Delayed Column and Row Generation for Graph Based Parsers
Author: Sebastian Riedel ; David Smith ; Andrew McCallum
Abstract: Graph-based dependency parsers suffer from the sheer number of higher order edges they need to (a) score and (b) consider during optimization. Here we show that when working with LP relaxations, large fractions of these edges can be pruned before they are fully scored—without any loss of optimality guarantees and, hence, accuracy. This is achieved by iteratively parsing with a subset of higherorder edges, adding higher-order edges that may improve the score of the current solution, and adding higher-order edges that are implied by the current best first order edges. This amounts to delayed column and row generation in the LP relaxation and is guaranteed to provide the optimal LP solution. For second order grandparent models, our method considers, or scores, no more than 6–13% of the second order edges of the full model. This yields up to an eightfold parsing speedup, while pro- viding the same empirical accuracy and certificates of optimality as working with the full LP relaxation. We also provide a tighter LP formulation for grandparent models that leads to a smaller integrality gap and higher speed.
Author: Bernd Bohnet ; Joakim Nivre
Abstract: Most current dependency parsers presuppose that input words have been morphologically disambiguated using a part-of-speech tagger before parsing begins. We present a transitionbased system for joint part-of-speech tagging and labeled dependency parsing with nonprojective trees. Experimental evaluation on Chinese, Czech, English and German shows consistent improvements in both tagging and parsing accuracy when compared to a pipeline system, which lead to improved state-of-theart results for all languages.
6 0.3656401 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
7 0.33489001 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
8 0.29688403 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
9 0.29220033 57 emnlp-2012-Generalized Higher-Order Dependency Parsing with Cube Pruning
10 0.28139716 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
11 0.26070315 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
12 0.25532287 35 emnlp-2012-Document-Wide Decoding for Phrase-Based Statistical Machine Translation
13 0.24369249 27 emnlp-2012-Characterizing Stylistic Elements in Syntactic Structure
14 0.23400903 2 emnlp-2012-A Beam-Search Decoder for Grammatical Error Correction
15 0.21776398 119 emnlp-2012-Spectral Dependency Parsing with Latent Variables
16 0.20514856 55 emnlp-2012-Forest Reranking through Subtree Ranking
17 0.18952501 131 emnlp-2012-Unified Dependency Parsing of Chinese Morphological and Syntactic Structures
18 0.18834564 16 emnlp-2012-Aligning Predicates across Monolingual Comparable Texts using Graph-based Clustering
19 0.18757951 105 emnlp-2012-Parser Showdown at the Wall Street Corral: An Empirical Investigation of Error Types in Parser Output
20 0.18457139 53 emnlp-2012-First Order vs. Higher Order Modification in Distributional Semantics
topicId topicWeight
[(2, 0.022), (11, 0.029), (16, 0.051), (25, 0.01), (34, 0.066), (45, 0.014), (60, 0.058), (63, 0.064), (64, 0.016), (65, 0.019), (74, 0.039), (76, 0.057), (79, 0.013), (80, 0.02), (81, 0.369), (86, 0.028), (95, 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.7455532 59 emnlp-2012-Generating Non-Projective Word Order in Statistical Linearization
Author: Bernd Bohnet ; Anders Bjorkelund ; Jonas Kuhn ; Wolfgang Seeker ; Sina Zarriess
Abstract: We propose a technique to generate nonprojective word orders in an efficient statistical linearization system. Our approach predicts liftings of edges in an unordered syntactic tree by means of a classifier, and uses a projective algorithm for tree linearization. We obtain statistically significant improvements on six typologically different languages: English, German, Dutch, Danish, Hungarian, and Czech.
2 0.72727025 96 emnlp-2012-Name Phylogeny: A Generative Model of String Variation
Author: Nicholas Andrews ; Jason Eisner ; Mark Dredze
Abstract: Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, “similar” strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to referto persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.
3 0.3678481 82 emnlp-2012-Left-to-Right Tree-to-String Decoding with Prediction
Author: Yang Feng ; Yang Liu ; Qun Liu ; Trevor Cohn
Abstract: Decoding algorithms for syntax based machine translation suffer from high computational complexity, a consequence of intersecting a language model with a context free grammar. Left-to-right decoding, which generates the target string in order, can improve decoding efficiency by simplifying the language model evaluation. This paper presents a novel left to right decoding algorithm for tree-to-string translation, using a bottom-up parsing strategy and dynamic future cost estimation for each partial translation. Our method outperforms previously published tree-to-string decoders, including a competing left-to-right method.
4 0.34746724 33 emnlp-2012-Discovering Diverse and Salient Threads in Document Collections
Author: Jennifer Gillenwater ; Alex Kulesza ; Ben Taskar
Abstract: We propose a novel probabilistic technique for modeling and extracting salient structure from large document collections. As in clustering and topic modeling, our goal is to provide an organizing perspective into otherwise overwhelming amounts of information. We are particularly interested in revealing and exploiting relationships between documents. To this end, we focus on extracting diverse sets of threads—singlylinked, coherent chains of important documents. To illustrate, we extract research threads from citation graphs and construct timelines from news articles. Our method is highly scalable, running on a corpus of over 30 million words in about four minutes, more than 75 times faster than a dynamic topic model. Finally, the results from our model more closely resemble human news summaries according to several metrics and are also preferred by human judges.
5 0.33805925 124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction
Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky
Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries such as English determiners resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without — — special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.
6 0.33519095 136 emnlp-2012-Weakly Supervised Training of Semantic Parsers
7 0.33410865 127 emnlp-2012-Transforming Trees to Improve Syntactic Convergence
8 0.33316398 20 emnlp-2012-Answering Opinion Questions on Products by Exploiting Hierarchical Organization of Consumer Reviews
9 0.32611996 14 emnlp-2012-A Weakly Supervised Model for Sentence-Level Semantic Orientation Analysis with Multiple Experts
10 0.32589504 109 emnlp-2012-Re-training Monolingual Parser Bilingually for Syntactic SMT
11 0.32519379 23 emnlp-2012-Besting the Quiz Master: Crowdsourcing Incremental Classification Games
12 0.32513463 123 emnlp-2012-Syntactic Transfer Using a Bilingual Lexicon
13 0.32511121 89 emnlp-2012-Mixed Membership Markov Models for Unsupervised Conversation Modeling
14 0.32360721 71 emnlp-2012-Joint Entity and Event Coreference Resolution across Documents
15 0.32327691 51 emnlp-2012-Extracting Opinion Expressions with semi-Markov Conditional Random Fields
16 0.32119423 46 emnlp-2012-Exploiting Reducibility in Unsupervised Dependency Parsing
17 0.32103452 42 emnlp-2012-Entropy-based Pruning for Phrase-based Machine Translation
18 0.32065016 66 emnlp-2012-Improving Transition-Based Dependency Parsing with Buffer Transitions
19 0.32040137 5 emnlp-2012-A Discriminative Model for Query Spelling Correction with Latent Structural SVM
20 0.32028043 77 emnlp-2012-Learning Constraints for Consistent Timeline Extraction