acl acl2010 acl2010-99 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. [sent-3, score-0.452]
2 Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. [sent-4, score-0.127]
3 We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively. [sent-5, score-0.164]
4 1 Introduction Dependency grammar has proven to be a very useful syntactic formalism, due in no small part to the development of efficient parsing algorithms (Eisner, 2000; McDonald et al. [sent-8, score-0.221]
5 These parsing algorithms share an important characteristic: they factor dependency trees into sets of parts that have limited interactions. [sent-12, score-0.531]
6 By exploiting the additional constraints arising from the factorization, maximizations or summations over the set of possible dependency trees can be performed efficiently and exactly. [sent-13, score-0.198]
7 A crucial limitation of factored parsing algorithms is that the associated parts are typically quite small, losing much of the contextual information within the dependency tree. [sent-14, score-0.566]
8 For the purposes of improving parsing performance, it is desirable to increase the size and variety of the parts used by the factorization. [sent-15, score-0.264]
9 At the same time, the need for more expressive factorizations 1For examples of how performance varies with the degree of the parser’s factorization see, e. [sent-16, score-0.389]
10 must be balanced against any resulting increase in the computational cost of the parsing algorithm. [sent-21, score-0.152]
11 Consequently, recent work in dependency parsing has been restricted to applications of second-order parsers, the most powerful of which (Carreras, 2007) requires O(n4) time and O(n3) space, while being limited to second-order parts. [sent-22, score-0.391]
12 In this paper, we present new third-order parsing algorithms that increase both the size and variety of the parts participating in the factorization, while simultaneously maintaining computational requirements of O(n4) time and O(n3) space. [sent-23, score-0.333]
13 We evaluate our parsers on the Penn WSJ Treebank (Marcus et al. [sent-24, score-0.127]
14 The remainder of this paper is divided as follows: Sections 2 and 3 give background, Sections 4 and 5 describe our new parsing algorithms, Section 6 discusses related work, Section 7 presents our experimental results, and Section 8 concludes. [sent-35, score-0.152]
15 2 Dependency parsing In dependency grammar, syntactic relationships are represented as head-modifier dependencies: directed arcs between a head, which is the more “essential” word in the relationship, and a modifier, which supplements the meaning of the head. [sent-36, score-0.35]
16 For example, Figure 1 contains a dependency between the verb “report” (the head) and its object “sales” (the modifier). [sent-37, score-0.198]
17 A complete analysis of a sentence is given by a dependency tree: a set of dependencies that forms a rooted, directed tree spanning the words of the sentence. [sent-38, score-0.418]
18 Every dependency tree is rooted at a special “*” token, allowing the [sent-39, score-0.249]
19 [Figure 1: An example dependency structure. Sentence: * Insiders must report purchases and sales immediately] [sent-44, score-0.301]
20 A common strategy, and one which forms the focus of this paper, is to factor each dependency tree into small parts, which can be scored in isolation. [sent-49, score-0.249]
21 Factored parsing can be formalized as follows: SCORE(x, y) = Σ_{p ∈ y} SCOREPART(x, p) That is, we treat the dependency tree y as a set of parts p, each of which makes a separate contribution to the score of y. [sent-50, score-0.513]
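The decomposition is easy to state in code. Below is a minimal Python sketch of this factored score for a first-order factorization, where heads[m] gives the head index of word m (None for the root token) and score_dep is a placeholder part-scoring function; both names are illustrative, not from the paper.

```python
def score_tree(x, heads, score_dep):
    # SCORE(x, y) = sum over parts p in y of SCOREPART(x, p); under a
    # first-order factorization the parts are exactly the dependencies.
    return sum(score_dep(x, h, m)
               for m, h in enumerate(heads)
               if h is not None)
```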
22 For certain factorizations, efficient parsing algorithms exist for solving Eq. [sent-51, score-0.221]
23 We define the order of a part according to the number of dependencies it contains, with analogous terminology for factorizations and parsing algorithms. [sent-53, score-0.38]
24 In the remainder of this paper, we focus on factorizations utilizing the following parts: dependency (h, m); sibling (h, s, m); grandchild (g, h, m); grand-sibling (g, h, s, m); and tri-sibling (h, t, s, m). Specifically, Sections 4.1, 4.2, and 4.3 [sent-54, score-1.019]
25 describe parsers that, respectively, factor trees into grandchild parts, grand-sibling parts, and a mixture of grand-sibling and tri-sibling parts. [sent-57, score-0.507]
26 3 Existing parsing algorithms Our new third-order dependency parsers build on ideas from existing parsing algorithms. [sent-58, score-0.698]
27 In this section, we provide background on two relevant parsers from previous work. [sent-59, score-0.127]
28 [Figure 2: The dynamic-programming structures and derivations of the Eisner (2000) algorithm.] [sent-60, score-0.072]
29 Complete spans are depicted as triangles and incomplete spans as trapezoids. [sent-61, score-0.377]
30 3.1 First-order factorization The first type of parser we describe uses a “first-order” factorization, which decomposes a dependency tree into its individual dependencies. [sent-64, score-0.58]
31 Eisner (2000) introduced a widely-used dynamic-programming algorithm for first-order parsing; as it is the basis for many parsers, including our new algorithms, we summarize its design here. [sent-65, score-0.051]
32 The Eisner (2000) algorithm is based on two interrelated types of dynamic-programming structures: complete spans, which consist of a headword and its descendants on one side, and incomplete spans, which consist of a dependency and the region between the head and modifier. [sent-66, score-0.497]
33 Formally, we denote a complete span as C_{h,e} where h and e are the indices of the span’s headword and endpoint. [sent-67, score-0.352]
34 An incomplete span is denoted as I_{h,m} where h and m are the indices of the head and modifier of a dependency. [sent-68, score-0.479]
35 Intuitively, a complete span represents a “half-constituent” headed by h, whereas an incomplete span is only a partial half-constituent, since the constituent can be extended by adding more modifiers to m. [sent-69, score-0.588]
36 Each type of span is created by recursively combining two smaller, adjacent spans; the constructions are specified graphically in Figure 2. [sent-70, score-0.174]
37 An incomplete span is constructed from a pair of complete spans, indicating the division of the range [h, m] into constituents headed by h and m. [sent-71, score-0.361]
38 A complete span is created by “completing” an incomplete span with the other half of m’s constituent. [sent-72, score-0.535]
39 The point of concatenation in each construction—m in Figure 2(a) or r in Figure 2(b)—is the split point, a free index that must be enumerated to find the optimal construction. [sent-73, score-0.114]
40 In order to parse a sentence x, it suffices to find optimal constructions for all complete and incomplete spans defined on x. [sent-74, score-0.311]
41 This can be [Figure 3: The dynamic-programming structures and derivations of the second-order sibling parser; sibling spans are depicted as boxes.] [sent-75, score-0.768]
42 Since each derivation is defined by two fixed indices (the boundaries of the span) and a third free index (the split point), the parsing algorithm requires O(n3) time and O(n2) space (Eisner, 1996; McAllester, 1999). [sent-78, score-0.35]
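For concreteness, the recurrences just described can be sketched in Python as follows. This is a minimal reconstruction, not the authors' implementation: score[h][m] is a dense dependency score matrix, token 0 plays the role of the root "*", and back-pointers are omitted, so only the best score is returned.

```python
def eisner_first_order(score):
    # score[h][m]: score of the dependency h -> m; token 0 is the root "*".
    n = len(score)
    NEG = float("-inf")
    # C[i][j][d]: best complete span over [i, j], headed at the right end
    # (d = 0) or left end (d = 1). I[i][j][d]: best incomplete span, i.e.
    # the dependency j -> i (d = 0) or i -> j (d = 1) plus the region
    # between head and modifier.
    C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Figure 2(b): join two back-to-back complete spans at split
            # point r, then add the new dependency's score.
            best = max(C[i][r][1] + C[r + 1][j][0] for r in range(i, j))
            I[i][j][0] = best + score[j][i]
            I[i][j][1] = best + score[i][j]
            # Figure 2(a): "complete" an incomplete span with the other
            # half of the inner word's constituent.
            C[i][j][0] = max(C[i][r][0] + I[r][j][0] for r in range(i, j))
            C[i][j][1] = max(I[i][r][1] + C[r][j][1]
                             for r in range(i + 1, j + 1))
    return C[0][n - 1][1]  # best projective tree headed by the root
```

Each update fixes the two span boundaries and enumerates one free split point, which is exactly where the O(n3) time and O(n2) space bounds come from.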
43 3.2 Second-order sibling factorization As remarked by Eisner (1996) and McDonald and Pereira (2006), it is possible to rearrange the dynamic-programming structures to conform to an improved factorization that decomposes each tree into sibling parts—pairs of dependencies with a shared head. [sent-80, score-1.198]
44 Specifically, a sibling part consists of a triple of indices (h, m, s) where (h, m) and (h, s) are dependencies, and where s and m are successive modifiers to the same side of h. [sent-81, score-0.463]
45 In order to parse this factorization, the second-order parser introduces a third type of dynamic-programming structure: sibling spans, which represent the region between successive modifiers of some head. [sent-82, score-0.603]
46 Formally, we denote a sibling span as S_{s,m} where s and m are a pair of modifiers involved in a sibling relationship. [sent-83, score-0.799]
47 Modified versions of sibling spans will play an important role in the new parsing algorithms described in Section 4. [sent-84, score-0.631]
48 Figure 3 provides a graphical specification of the second-order parsing algorithm. [sent-85, score-0.152]
49 Note that incomplete spans are constructed in a new way: the second-order parser combines a smaller incomplete span, representing the next-innermost dependency, with a sibling span that covers the region between the two modifiers. [sent-86, score-0.974]
50 Sibling parts (h, m, s) can thus be obtained from Figure 3(b). [sent-87, score-0.112]
51 Despite the use of second-order parts, each derivation is [Figure 4: The dynamic-programming structures and derivations of Model 0.] [sent-88, score-0.072]
52 still defined by a span and split point, so the parser requires O(n3) time and O(n2) space. [sent-91, score-0.305]
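Reading the right-headed case off Figure 3, the three constructions can be written as the following maximizations. This is a sketch in the span notation above; the index bounds are our reconstruction rather than equations quoted from the paper, and the innermost-modifier base case is elided.

```latex
\begin{align*}
C_{h,e} &= \max_{h < m \le e} \bigl[\, I_{h,m} + C_{m,e} \,\bigr] \\
I_{h,m} &= \max_{h \le s < m} \bigl[\, I_{h,s} + S_{s,m}
           + \mathrm{ScorePart}(x, (h, m, s)) \,\bigr] \\
S_{s,m} &= \max_{s \le r < m} \bigl[\, C_{s,r} + C_{m,r+1} \,\bigr]
\end{align*}
```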
53 4 New third-order parsing algorithms In this section we describe our new third-order dependency parsing algorithms. [sent-92, score-0.571]
54 Our overall method is characterized by the augmentation of each span with a “grandparent” index: an index external to the span whose role will be made clear below. [sent-93, score-0.428]
55 This section presents three parsing algorithms based on this idea: Model 0, a second-order parser, and Models 1 and 2, which are third-order parsers. [sent-94, score-0.221]
56 4.1 Model 0: all grandchildren The first parser, Model 0, factors each dependency tree into a set of grandchild parts—pairs of dependencies connected head-to-tail. [sent-96, score-0.778]
57 Specifically, a grandchild part is a triple of indices (g, h, m) where (g, h) and (h, m) are dependencies. In order to parse this factorization, we augment both complete and incomplete spans with grandparent indices; for brevity, we refer to these augmented structures as g-spans. [sent-97, score-0.87]
58 Formally, we denote a complete g-span as C^g_{h,e}, where C_{h,e} is a normal complete span and g is an index lying outside the range [h, e], with the implication that (g, h) is a dependency. [sent-98, score-0.408]
59 Figure 4 depicts complete and incomplete g-spans and provides a graphical specification of the dependencies. [sent-100, score-0.187]
60 3The Carreras (2007) parser also uses grandchild parts but only in restricted cases; see Section 6 for details. [sent-101, score-0.589]
61 I^g_{i,j} = max_{i ≤ r} … m + 1 successive modifiers in O(n^{m+1}) time (ibid. [sent-116, score-0.093]
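The recurrence fragment above is the right-headed Model 0 derivation of Figure 4; a reconstruction under the g-span conventions (a sketch, with ScoreG an assumed name for the grandchild part score and the left-headed cases symmetric):

```latex
\begin{align*}
I^{g}_{h,m} &= \max_{h \le r < m} \bigl[\, C^{g}_{h,r} + C^{h}_{m,r+1} \,\bigr]
               + \mathrm{ScoreG}(x, (g, h, m)) \\
C^{g}_{h,e} &= \max_{h < m \le e} \bigl[\, I^{g}_{h,m} + C^{h}_{m,e} \,\bigr]
\end{align*}
```

Each g-span carries three indices plus one free split point, which yields the O(n4) time and O(n3) space of Model 0.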
62 To our knowledge, however, Models 1 and 2 are the first third-order parsing algorithms capable of modeling grandchild parts. [sent-119, score-0.601]
63 In our experiments, we find that grandchild interactions make important contributions to parsing performance (see Table 3). [sent-120, score-0.587]
64 Carreras (2007) presents a second-order parser that can score both sibling and grandchild parts, with complexities of O(n4) time and O(n3) space. [sent-121, score-0.763]
65 An important limitation of the parser’s factorization is that it only defines grandchild parts for outermost grandchildren: (g, h, m) is scored only when m is the outermost modifier of h in some direction. [sent-122, score-0.91]
66 Note that Models 1 and 2 have the same complexity as Carreras (2007), but strictly greater expressiveness: for each sibling or grandchild part used in the Carreras (2007) factorization, Model 1 defines an enclosing grand-sibling, while Model 2 defines an enclosing tri-sibling or grand-sibling. [sent-123, score-0.83]
67 The factored parsing approach we focus on is sometimes referred to as “graph-based” parsing; a popular alternative is “transition-based” parsing, in which trees are constructed by making a series of incremental decisions (Yamada and Matsumoto, 2003; Attardi, 2006; Nivre et al. [sent-124, score-0.187]
68 Transition-based parsers do not impose factorizations, so they can define arbitrary features on the tree as it is being built. [sent-126, score-0.178]
69 As a result, however, they rely on greedy or approximate search algorithms to solve Eq. [sent-127, score-0.069]
70 7 Parsing experiments In order to evaluate the effectiveness of our parsers in practice, we apply them to the Penn WSJ Treebank (Marcus et al. [sent-129, score-0.127]
71 Accuracy is 6For English, we extracted dependencies using Joakim Nivre’s Penn2Malt tool with standard head rules (Yamada and Matsumoto, 2003); for Czech, we “projectivized” the training data by finding best-match projective trees. [sent-133, score-0.159]
72 Following standard practice for higher-order dependency parsing (McDonald and Pereira, 2006; Carreras, 2007), Models 1 and 2 evaluate not only the relevant third-order parts, but also the lower-order parts that are implicit in their third-order factorizations. [sent-138, score-0.462]
73 The lower-order feature mappings fdep, fsib, and fgch are based on feature sets from previous work (McDonald et al. [sent-141, score-0.181]
74 For example, fdep contains lexicalized “in-between” features that depend on the head and modifier words as well as a word lying in between the two; in contrast, previous work has generally defined in-between features for POS tags only. [sent-143, score-0.249]
75 second-order mappings fsib and fgch define lexical trigram features, while previous work has generally used POS trigrams only. [sent-146, score-0.292]
76 Our third-order feature mappings fgsib and ftsib consist of four types of features. [sent-147, score-0.333]
77 First, we define 4-gram features that characterize the four relevant indices using words and POS tags; examples include POS 4-grams and mixed 4-grams with one word and three POS tags. [sent-148, score-0.084]
78 Third, we define backed-off features that track bigram and trigram interactions which are absent in the lower-order feature mappings: for example, ftsib(x, h, m, s, t) contains features predicated on the trigram (m, s, t) and the bigram (m, t), neither of which exists in any lower-order part. [sent-150, score-0.125]
79 Fourth, to capture coordination structures (e.g., “report purchases and sales” in Figure 1), we define coordination features for certain grand-sibling parts. [sent-153, score-0.051]
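To make the templates concrete, here is a hypothetical Python sketch of a few ftsib-style features for a tri-sibling part (h, t, s, m); the argument layout and feature strings are invented for illustration and are not the paper's actual templates.

```python
def tri_sibling_features(words, tags, h, m, s, t):
    # Illustrative third-order templates: 4-grams over the four indices,
    # plus backed-off trigram and bigram features that no lower-order
    # part can capture.
    return [
        "P4:%s_%s_%s_%s" % (tags[h], tags[t], tags[s], tags[m]),     # POS 4-gram
        "W1P3:%s_%s_%s_%s" % (words[h], tags[t], tags[s], tags[m]),  # mixed 4-gram
        "TRI:%s_%s_%s" % (tags[m], tags[s], tags[t]),                # trigram (m, s, t)
        "BI:%s_%s" % (tags[m], tags[t]),                             # bigram (m, t)
    ]
```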
80 7.2 Averaged perceptron training There are a wide variety of parameter estimation methods for structured linear models, such as log-linear models (Lafferty et al. [sent-159, score-0.072]
81 We chose the averaged structured perceptron (Freund and Schapire, 1999; Collins, 2002) as it combines highly competitive performance with fast training times, typically converging in 5–10 iterations. [sent-162, score-0.072]
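A minimal sketch of averaged perceptron training for a factored parser follows; decode(x, w) stands for one of the parsing algorithms above and phi(x, y) for a sparse feature extractor, both placeholder names.

```python
def train_averaged_perceptron(data, decode, phi, n_feats, epochs=10):
    # data: (sentence, gold_tree) pairs. After each mistake the weights
    # move toward the gold tree's features and away from the prediction;
    # the returned weights are averaged over all updates (Collins, 2002).
    w = [0.0] * n_feats
    w_sum = [0.0] * n_feats
    steps = 0
    for _ in range(epochs):
        for x, gold in data:
            pred = decode(x, w)
            if pred != gold:
                for i, v in phi(x, gold).items():
                    w[i] += v
                for i, v in phi(x, pred).items():
                    w[i] -= v
            for i in range(n_feats):
                w_sum[i] += w[i]
            steps += 1
    return [v / steps for v in w_sum]
```

In practice the averaging is implemented lazily rather than with the dense O(n_feats) pass per example shown here; the dense version simply keeps the sketch short.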
82 We train each parser for 10 iterations and select pa- 9For Czech, the PDT provides automatic tags; for English, we used MXPOST (Ratnaparkhi, 1996) to tag validation and test data, with 10-fold cross-validation on the training set. [sent-163, score-0.19]
83 [Table 1: Effect of the marginal-probability beam on English parsing.] [sent-176, score-0.07]
84 For each beam value, parsers were trained on the English training set and evaluated on the English validation set; the same beam value was applied to both training and validation data. [sent-177, score-0.453]
85 Pass = %dependencies surviving the beam in training data, Orac = maximum achievable UAS on validation data, Acc1/Acc2 = UAS of Models 1/2 on validation data, and Time1/Time2 = minutes per perceptron training iteration for Models 1/2, averaged over all 10 iterations. [sent-178, score-0.328]
86 rameters from the iteration that achieves the best score on the validation set. [sent-182, score-0.093]
87 (2008) and eliminate unlikely dependencies using a form of coarse-to-fine pruning (Charniak and Johnson, 2005; Petrov and Klein, 2007). [sent-185, score-0.073]
88 Our parsers are then modified to ignore any dependency (h, m) whose marginal probability P(h, m | x) is below 0. [sent-187, score-0.325]
89 (2008) and 11For English, we generate marginals using a projective parser (Baker, 1979; Eisner, 2000); for Czech, we generate marginals using a non-projective parser (Smith and Smith, 2007; McDonald and Satta, 2007; Koo et al. [sent-195, score-0.307]
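As a sketch of how such a marginal-probability beam could be applied: the exact constant is truncated above, so the ratio form and its parameter are assumptions, and the marginals would come from the first-order models described in the footnote.

```python
def beam_prune(marginals, ratio=1e-4):
    # marginals[h][m]: P(h, m | x) under a first-order model. Keep a
    # dependency only if its marginal clears an (assumed) fraction of
    # the best marginal; the higher-order parser then never builds
    # structures over pruned (h, m) pairs.
    best = max(p for row in marginals for p in row)
    return {(h, m)
            for h, row in enumerate(marginals)
            for m, p in enumerate(row)
            if h != m and p >= ratio * best}
```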
90 12Model 0 was not tested as its factorization is a strict subset of the factorization of Model 1. [sent-199, score-0.468]
91 All three of the “†” models are based on versions of the Carreras (2007) parser, so modifying these methods to work with our new third-order parsing algorithms would be an interesting topic for future research. [sent-207, score-0.221]
92 For example, Models 1 and 2 obtain results comparable to the semi-supervised parsers of Koo et al. [sent-208, score-0.127]
93 7.5 Ablation studies In order to better understand the contributions of the various feature types, we ran additional ablation experiments; the results are listed in Table 3, in addition to the scores of Model 0 and the emulated Carreras (2007) parser (see Section 4. [sent-211, score-0.131]
94 Interestingly, grandchild interactions appear to provide important information: for example, when Model 2 is used without grandchild-based features (“Model 2, no-G” in Table 3), its accuracy suffers noticeably. [sent-213, score-0.435]
95 8 Conclusion We have presented new parsing algorithms that are capable of efficiently parsing third-order factorizations, including both grandchild and sibling interactions. [sent-215, score-1.039]
96 The term no-G indicates a parser that was trained and tested with the grandchild-based feature mappings fgch and fgsib deactivated; note that “Model 2, no-G” emulates the third-order sibling parser proposed by McDonald and Pereira (2006). [sent-221, score-0.813]
97 There are several possibilities for further research involving our third-order parsing algorithms. [sent-222, score-0.152]
98 A second area for future work lies in applications of dependency parsing. [sent-225, score-0.198]
99 Finally, in the hopes that others in the NLP community may find our parsers useful, we provide a free distribution of our implementation. [sent-228, score-0.127]
100 Recognition and parsing of context-free languages in time n3. [sent-427, score-0.152]
wordName wordTfidf (topN-words)
[('grandchild', 0.38), ('sibling', 0.286), ('carreras', 0.237), ('factorization', 0.234), ('dependency', 0.198), ('koo', 0.181), ('span', 0.174), ('mcdonald', 0.163), ('factorizations', 0.155), ('fgsib', 0.152), ('parsing', 0.152), ('incomplete', 0.129), ('parsers', 0.127), ('spans', 0.124), ('parts', 0.112), ('haji', 0.109), ('terry', 0.107), ('fgch', 0.101), ('ftsib', 0.101), ('parser', 0.097), ('validation', 0.093), ('xavier', 0.089), ('indices', 0.084), ('index', 0.08), ('mappings', 0.08), ('elide', 0.076), ('fsib', 0.076), ('grandchildren', 0.076), ('eisner', 0.075), ('pos', 0.074), ('dependencies', 0.073), ('perceptron', 0.072), ('pereira', 0.071), ('ryan', 0.07), ('beam', 0.07), ('algorithms', 0.069), ('collins', 0.069), ('fdep', 0.067), ('grandparent', 0.061), ('brevity', 0.06), ('complete', 0.058), ('interactions', 0.055), ('modifier', 0.055), ('uas', 0.054), ('modifiers', 0.053), ('czech', 0.052), ('sales', 0.052), ('tree', 0.051), ('cocke', 0.051), ('deactivated', 0.051), ('dynamicprogramming', 0.051), ('exponentiated', 0.051), ('orac', 0.051), ('parserengcze', 0.051), ('purchases', 0.051), ('tags', 0.048), ('suzuki', 0.048), ('taskar', 0.048), ('nivre', 0.047), ('projective', 0.045), ('amir', 0.044), ('globerson', 0.044), ('outermost', 0.044), ('treebank', 0.044), ('association', 0.042), ('yamada', 0.042), ('defines', 0.041), ('enclosing', 0.041), ('secondorder', 0.041), ('pdt', 0.041), ('jarmila', 0.041), ('head', 0.041), ('successive', 0.04), ('fernando', 0.038), ('derivations', 0.038), ('spanning', 0.038), ('lying', 0.038), ('prague', 0.038), ('michael', 0.038), ('lafferty', 0.037), ('attachment', 0.037), ('headword', 0.036), ('petr', 0.036), ('joakim', 0.036), ('region', 0.035), ('factored', 0.035), ('conll', 0.035), ('brief', 0.035), ('trigram', 0.035), ('ablation', 0.034), ('marginals', 0.034), ('eva', 0.034), ('jan', 0.034), ('split', 0.034), ('structures', 0.034), ('prp', 0.033), ('substructures', 0.033), ('ov', 0.033), ('discriminative', 0.033), ('pages', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
2 0.21647836 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
3 0.17971431 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
4 0.16278379 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
5 0.13571997 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
Author: Fei Huang ; Alexander Yates
Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.
6 0.13319482 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
7 0.1234087 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
8 0.11538769 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
9 0.11474477 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
10 0.10919936 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
11 0.1031573 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
12 0.10269238 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
13 0.10075638 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese
14 0.097384922 69 acl-2010-Constituency to Dependency Translation with Forests
15 0.095640294 133 acl-2010-Hierarchical Search for Word Alignment
16 0.088441439 214 acl-2010-Sparsity in Dependency Grammar Induction
17 0.086408645 130 acl-2010-Hard Constraints for Grammatical Function Labelling
18 0.082765341 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
19 0.082617342 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
20 0.082009457 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
topicId topicWeight
[(0, -0.222), (1, -0.049), (2, 0.122), (3, 0.032), (4, -0.095), (5, -0.081), (6, 0.082), (7, -0.0), (8, -0.018), (9, 0.23), (10, -0.182), (11, -0.001), (12, -0.055), (13, 0.163), (14, 0.069), (15, -0.041), (16, -0.062), (17, 0.033), (18, 0.014), (19, -0.047), (20, 0.041), (21, 0.011), (22, 0.022), (23, -0.051), (24, -0.021), (25, 0.044), (26, 0.034), (27, 0.058), (28, -0.086), (29, -0.014), (30, 0.049), (31, -0.027), (32, 0.027), (33, 0.045), (34, 0.001), (35, 0.0), (36, -0.057), (37, -0.053), (38, -0.009), (39, -0.007), (40, -0.069), (41, 0.075), (42, -0.002), (43, -0.017), (44, 0.02), (45, 0.035), (46, 0.043), (47, 0.03), (48, -0.088), (49, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.9680292 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
2 0.85108215 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification
Author: Martin Haulrich
Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.
3 0.84581751 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
4 0.84432584 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
Author: Wenbin Jiang ; Qun Liu
Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.
5 0.7617926 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -
Author: Kotaro Kitagawa ; Kumiko Tanaka-Ishii
Abstract: Nivre’s method was improved by enhancing deterministic dependency parsing through application of a tree-based model. The model considers all words necessary for selection of parsing actions by including words in the form of trees. It chooses the most probable head candidate from among the trees and uses this candidate to select a parsing action. In an evaluation experiment using the Penn Treebank (WSJ section), the proposed model achieved higher accuracy than did previous deterministic models. Although the proposed model’s worst-case time complexity is O(n2), the experimental results demonstrated an average parsing time not much slower than O(n).
6 0.75946701 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
7 0.70074928 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures
8 0.66985697 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation
9 0.60866332 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
10 0.58785349 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
11 0.58558685 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing
12 0.54652905 114 acl-2010-Faster Parsing by Supertagger Adaptation
13 0.53712398 130 acl-2010-Hard Constraints for Grammatical Function Labelling
14 0.53307194 252 acl-2010-Using Parse Features for Preposition Selection and Error Detection
15 0.52850872 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
16 0.4916496 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
17 0.47987071 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
18 0.47690842 263 acl-2010-Word Representations: A Simple and General Method for Semi-Supervised Learning
19 0.47291619 214 acl-2010-Sparsity in Dependency Grammar Induction
20 0.46298283 69 acl-2010-Constituency to Dependency Translation with Forests
topicId topicWeight
[(13, 0.012), (14, 0.389), (16, 0.012), (25, 0.065), (33, 0.01), (59, 0.073), (73, 0.034), (76, 0.031), (78, 0.032), (83, 0.06), (84, 0.024), (98, 0.164)]
simIndex simValue paperId paperTitle
1 0.98023117 239 acl-2010-Towards Relational POMDPs for Adaptive Dialogue Management
Author: Pierre Lison
Abstract: Open-ended spoken interactions are typically characterised by both structural complexity and high levels of uncertainty, making dialogue management in such settings a particularly challenging problem. Traditional approaches have focused on providing theoretical accounts for either the uncertainty or the complexity of spoken dialogue, but rarely considered the two issues simultaneously. This paper describes ongoing work on a new approach to dialogue management which attempts to fill this gap. We represent the interaction as a Partially Observable Markov Decision Process (POMDP) over a rich state space incorporating both dialogue, user, and environment models. The tractability of the resulting POMDP can be preserved using a mechanism for dynamically constraining the action space based on prior knowledge over locally relevant dialogue structures. These constraints are encoded in a small set of general rules expressed as a Markov Logic network. The first-order expressivity of Markov Logic enables us to leverage the rich relational structure of the problem and efficiently abstract over large regions of the state and action spaces.
2 0.83605886 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
Author: Andreas Maletti
Abstract: A characterization of the expressive power of synchronous tree-adjoining grammars (STAGs) in terms of tree transducers (or equivalently, synchronous tree substitution grammars) is developed. Essentially, a STAG corresponds to an extended tree transducer that uses explicit substitution in both the input and output. This characterization allows the easy integration of STAG into toolkits for extended tree transducers. Moreover, the applicability of the characterization to several representational and algorithmic problems is demonstrated.
same-paper 3 0.83229232 99 acl-2010-Efficient Third-Order Dependency Parsers
Author: Terry Koo ; Michael Collins
Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.
4 0.73379147 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD
Author: Weiwei Guo ; Mona Diab
Abstract: Word Sense Disambiguation remains one of the most complex problems facing computational linguists to date. In this paper we present a system that combines evidence from a monolingual WSD system together with that from a multilingual WSD system to yield state of the art performance on standard All-Words data sets. The monolingual system is based on a modification of the graph-based state-of-the-art algorithm In-Degree. The multilingual system is an improvement over an All-Words unsupervised approach, SALAAM. SALAAM exploits multilingual evidence as a means of disambiguation. In this paper, we present modifications to both of the original approaches and then their combination. We finally report the highest results obtained to date on the SENSEVAL 2 standard data set using an unsupervised method, achieving an overall F measure of 64.58 using a voting scheme.
5 0.59112847 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
Author: Liang Huang ; Kenji Sagae
Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.
6 0.57725924 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands
7 0.5726766 214 acl-2010-Sparsity in Dependency Grammar Induction
8 0.55405933 35 acl-2010-Automated Planning for Situated Natural Language Generation
9 0.54456329 168 acl-2010-Learning to Follow Navigational Directions
10 0.52921802 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
11 0.52263939 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
12 0.51825833 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing
13 0.5181874 187 acl-2010-Optimising Information Presentation for Spoken Dialogue Systems
14 0.51681638 162 acl-2010-Learning Common Grammar from Multilingual Corpus
15 0.51357955 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
16 0.51249629 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
17 0.51213712 167 acl-2010-Learning to Adapt to Unknown Users: Referring Expression Generation in Spoken Dialogue Systems
18 0.50888115 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
19 0.50774443 71 acl-2010-Convolution Kernel over Packed Parse Forest
20 0.50664604 212 acl-2010-Simple Semi-Supervised Training of Part-Of-Speech Taggers