acl acl2013 acl2013-94 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
Reference: text
sentIndex sentText sentNum sentScore
1 In other words, coordination is a pending problem in dependency analysis of natural languages. [sent-6, score-0.359]
2 This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. [sent-7, score-0.211]
3 We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. [sent-8, score-0.223]
4 In addition, empirical observations on convertibility between selected styles of representations are shown too. [sent-9, score-0.279]
5 One of the reasons is the increased availability of dependency treebanks, be they results of genuine dependency annotation projects or converted automatically from previously existing phrase-structure treebanks. [sent-11, score-0.296]
6 Worse, dependency representation is at a loss when it comes to representing paratactic linguistic phenomena such as coordination, whose nature is symmetric (two or more conjuncts play the same role), as opposed to the head-modifier asymmetry of dependencies. [sent-16, score-0.598]
7 The dominating solution in treebank design is to introduce artificial rules for the encoding of coordination structures within dependency trees using the same means that express dependencies, i. [sent-18, score-0.513]
8 Obviously, any tree-shaped representation of a coordination structure (CS) must be perceived only as a “shortcut” since relations present in coordination structures form an undirected cycle, as illustrated already by Tesnière (1959). [sent-21, score-0.422]
9 For example, if a noun is modified by two coordinated adjectives, there is a (symmetric) coordination relation between the two conjuncts and two (asymmetric) dependency relations between the conjuncts and the noun. [sent-22, score-1.159]
10 However, as there is no obvious linguistic intuition telling us which tree-shaped CS encoding is better and since the degree of freedom has several dimensions, one can find a number of distinct conventions introduced in particular dependency treebanks. [sent-23, score-0.197]
11 The present study does not try to decide which coordination style is the best from the parsing point of view. [sent-27, score-0.343]
12 Section 4 lists treebanks whose CS conventions we studied. [sent-38, score-0.272]
13 In the simplest case of a CS, a coordinating conjunction joins two (usually syntactically and semantically compatible) words or phrases called conjuncts. [sent-42, score-0.249]
14 Proper formal representation of CSs is further complicated by the following facts: • CSs with more than two conjuncts (multi-conjunct CSs) exist and are frequent. [sent-44, score-0.4]
15 Besides “private” modifiers of individual conjuncts, there are modifiers shared by all conjuncts, such as in “Mary came and cried”. [sent-45, score-0.213]
16 Shared modifiers may appear alongside with private modifiers of particular conjuncts. [sent-46, score-0.485]
17 Shared modifiers can be coordinated, too: “big and cheap apples and oranges”. [sent-47, score-0.213]
18 The coordinating conjunction may be a multiword expression (“as well as”). [sent-54, score-0.249]
19 • Deficient CSs with a single conjunct exist. [sent-55, score-0.241]
20 For example, a conjunct can be elided while its arguments remain in the sentence, such as in the following traditional example: “I gave the books to Mary and the records to Sue.” [sent-59, score-0.241]
21 In his solution, conjuncts are connected by vertical edges directly to the head and by horizontal edges to the conjunction (which constitutes a cycle in every CS). [sent-69, score-0.59]
22 • SS = Stanford parser style: the first conjunct is the head and the remaining conjuncts (as well as conjunctions) are attached under it. [sent-74, score-0.742]
23 PS is advocated by Štěpánek (2006), who claims that it can represent shared modifiers using a single additional binary attribute, while MS would require a more complex co-indexing attribute. [sent-78, score-0.32]
24 An argumentation of Tratz and Hovy (2011) follows a similar direction: We would like to change our [MS] handling of coordinating conjunctions to treat the coordinating conjunction as the head [PS] because this has fewer ambiguities than [MS]. [sent-79, score-0.54]
25 We conclude that the influence of the choice of coordination style is a well-known problem in dependency syntax. [sent-82, score-0.461]
26 Nevertheless, published works usually focus only on a narrow ad-hoc selection of few coordination styles, without giving any systematic perspective. [sent-83, score-0.211]
27 The primitive format used for CoNLL shared tasks is widely used in dependency parsing, but its weaknesses have already been pointed out (cf. [sent-86, score-0.255]
28 Moreover, particular treebanks vary in their contents even more than in their format, i. [sent-88, score-0.223]
29 3 Variations in representing coordination structures Our analysis of variations in representing coordination structures is based on observations from a set of dependency treebanks for 26 languages. [sent-91, score-0.856]
30 Footnote 5: We use the already established MS-PS-SS distinction to facilitate literature overview; as shown in Section 3, the space of possible coordination styles is much richer. [sent-92, score-0.406]
31 , 2008), Basque: Basque Dependency Treebank (larger version than CoNLL 2007 generously pro… In accordance with the usual conventions, we assume that each sentence is represented by one dependency tree, in which each node corresponds to one token (word or punctuation mark). [sent-96, score-0.261]
32 Further, we expect that the set of possible variations can be structured along several dimensions, each of which corresponds to a certain simple characteristic (such as choosing the leftmost conjunct as the CS head, or attaching shared modifiers below the nearest conjunct). [sent-101, score-0.631]
33 Even if it does not make sense to create the full Cartesian product of all dimensions because some values cannot be combined, it allows us to explore the space of possible CS styles systematically. [sent-102, score-0.222]
34 1 Topological variations We distinguish the following dimensions of topological variations of CS styles (see Figure 1): Family configuration of conjuncts. [sent-104, score-0.343]
35 A third op… Footnote 11: Note that for CSs with just two conjuncts, fM and fS may look exactly the same (depending on the attachment of conjunctions and punctuation as described below). [sent-136, score-0.209]
36 For the experiments in Section 5, we choose the head which is closer to the parent of the whole CS, with the motivation to make the edge between CS head and its parent shorter, which may improve parser training. [sent-141, score-0.208]
37 Shared modifiers may appear before the first conjunct or after the last one. [sent-143, score-0.454]
38 Therefore, it seems reasonable to attach shared modifiers either to the CS head (sH), or to the nearest (i. [sent-144, score-0.389]
39 In the Moscow family, conjunctions may be either part of the chain of conjuncts (cB), or they may be put outside of the chain and attached to the previous (cP) or following (cF) conjunct. [sent-148, score-0.649]
40 In the Stanford family, conjunctions may be either attached to the CS head (and therefore between conjuncts) (cB), or they may be attached to the previous (cP) or the following (cF) conjunct. [sent-149, score-0.365]
41 The cB option in both the Moscow and Stanford families treats conjunctions in the same way as conjuncts (with respect to topology only). [sent-150, score-0.532]
42 However, in most treebanks it is treated differently, so we consider it as well. [sent-155, score-0.223]
43 The values pP, pF and pB are analogous to cP, cF and cB except that punctuation may be also attached to the conjunction in case of pP and pF (otherwise, a comma before the conjunction would be non-projectively attached to the member following the conjunction). [sent-156, score-0.523]
44 The three established styles mentioned in Section 2 can be defined in terms of the newly introduced abbreviations: PS = fPhRsHcHpB, MS = fMhLsNcBp? [sent-157, score-0.195]
45 To fully capture CSs, we need more than one label, because there are several aspects involved (see the initial assumptions in Section 3). Footnote 13: The question marks indicate that the original Mel’čuk and Stanford parser styles ignore punctuation. [sent-163, score-0.195]
46 We need to identify the coordinating conjunction (its POS tag might not be enough), conjuncts, shared modifiers, and punctuation that separates conjuncts. [sent-164, score-0.435]
47 The dependency relation of the whole CS to its parent is represented by the label of the conjunction, while the conjuncts are marked with a special label for conjuncts (e. [sent-169, score-1.049]
48 The CS is represented by a coordinating conjunction (or punctuation if there is no conjunction) with a special label (e. [sent-173, score-0.361]
49 Subsequently, each conjunct has its own label that reflects the dependency relation towards the parent of the whole CS; therefore, conjuncts of the same CS can have different labels, e. [sent-176, score-0.857]
50 shared modifiers are attached to the head (coordinating conjunction). [sent-181, score-0.49]
51 Each child of the head has to belong to one of three sets: conjuncts, shared modifiers, and punctuation or additional conjunctions. [sent-182, score-0.255]
52 In the Stanford and Moscow families, one of the conjuncts is the head. [sent-185, score-0.4]
53 In practice, it is never labeled as a conjunct explicitly, because the fact that it is a conjunct can be deduced from the presence of conjuncts among its children. [sent-186, score-0.882]
54 Usually, the other conjuncts are labeled as conjuncts; conjunctions and punctuation also have a special label. [sent-187, score-0.573]
55 Alternatively (as found in the Turkish treebank, dL), all conjuncts in the Moscow chain have their own dependency labels and the fact that they are conjuncts follows from the COORDINATION labels of the conjunction and punctuation nodes between them. [sent-189, score-1.175]
56 To represent shared modifiers in the Stanford and Moscow families, an additional label is needed again to distinguish between private and shared modifiers since they cannot be distinguished topologically. [sent-190, score-0.732]
57 “shared” versus “private”) because it also has to indicate which conjuncts the shared modifier belongs to. [sent-193, score-0.537]
58 We use the following binary flag codes for capturing which CS participants are distinguished in the annotation: m01 = shared modifiers annotated; m10 = conjuncts annotated; m11 = both annotated; m00 = neither annotated. [sent-194, score-0.756]
59 2, are based on the normalized shapes of the treebanks as contained in the HamleDT 1. [sent-196, score-0.223]
60 15 Some of the treebanks were downloaded individually from the web, but most of them came from previously published collections for dependency parsing campaigns: six languages from CoNLL-2006 (Buchholz and Marsi, 2006), seven languages from CoNLL-2007 (Nivre et al. [sent-199, score-0.401]
61 Obviously, there is a certain risk that the CS-related information contained in the source treebanks was slightly biased by the properties of the CoNLL format upon conversion. [sent-202, score-0.223]
62 In addition, many of the treebanks were natively dependency-based (cf. [sent-203, score-0.223]
63 Again, … Footnote 14: This is not needed in the Prague family, where shared modifiers are attached to the conjunction provided that each shared modifier is shared by conjuncts that form a full subtree together with their coordinating conjunctions; no exceptions were found during the annotation process of the PDT. [sent-206, score-1.433]
64 Footnote 15: A subset of the treebanks whose license terms permit redistribution is available directly at http://ufal. [sent-207, score-0.223]
65 Figure 2: Annotation styles of a few treebanks do not fit well into the multidimensional space defined in Section 3. (Example phrases in the figure: Danish “hunde, katte og rotter”; Romanian “câini pisici şi şobolani”; Hungarian “kutyák, macskák és patkányok”.) [sent-211, score-0.418]
66 there is some risk that the CS-related information contained in treebanks resulting from such conversions is slightly different from what was intended in the very primary annotation. [sent-213, score-0.223]
67 Estonian or Chinese) which are not included in our study, despite the fact that constituency treebanks do exist for them. [sent-216, score-0.223]
68 We also know about several more dependency treebanks that we have not processed yet. [sent-218, score-0.371]
69 Table 1 shows 26 languages whose treebanks we have studied from the viewpoint of their CS styles. [sent-219, score-0.223]
70 The reader can return to Figure 1 to see the basic statistics on the “popularity” of individual design decisions among the developers of dependency treebanks or constituency treebank converters. [sent-221, score-0.525]
71 CS styles of most treebanks are easily classifiable using the codes introduced in Section 3, plus a few additional codes: • p0 = punctuation was removed from the treebank. [sent-222, score-0.533]
72 Footnote 16: All non-Prague family treebanks are marked sN and m00 or m10 (i. [sent-223, score-0.342]
73 shared modifiers not marked in the original annotation, but attached to the head conjunct) because we found no counterexamples (modifiers attached to a conjunct, but not the nearest one). [sent-225, score-0.591]
74 I10 = ICON 2010; SM = shared modifier; CJ = conjunct; Nested CS = portion of CSs participating in nested CSs (both as the inner and outer CS); RT UAS = unlabeled attachment score of the roundtrip experiment described in Section 5. [sent-230, score-0.289]
75 • fM* = Persian treebank uses a mix of fM and fS: fS for coordination of verbs and fM otherwise. [sent-232, score-0.201]
76 Figure 2 shows three other anomalies: • fS* = Danish treebank employs a mixture of fS and fM, where the last conjunct is attached indirectly via the conjunction. [sent-233, score-0.395]
77 • fP* = Romanian treebank omits punctuation tokens and multi-conjunct coordinations get split. [sent-234, score-0.233]
78 • fT = Hungarian Szeged treebank uses “Tesnière family” disconnected graphs for CSs where conjuncts (and conjunction and punctuation) are attached directly to the parent of the CS, and so the other style dimensions are not applicable (hX, cX, pX). [sent-235, score-0.94]
79 5 Empirical Observations on Convertibility of Coordination Styles The various styles cannot represent the CS-related information to the same extent. [sent-236, score-0.195]
80 The dL style (which is most easily applicable to the Prague family) can represent coordination of different dependency relations. [sent-238, score-0.461]
81 This is again not possible in the other styles without adding e. [sent-239, score-0.195]
82 We can see that the Prague family has a greater expressive power than the other two families: it can represent complex CSs using just one additional binary label, distinguishing between shared modifiers and conjuncts. [sent-242, score-0.439]
83 A similar additional label is needed in the other styles to distinguish between shared and private modifiers. [sent-243, score-0.394]
84 there is no way of representing shared modifiers in the Moscow family without an additional attribute, converting a CS with shared modifiers from Prague to Moscow family makes the modifiers private. [sent-249, score-1.091]
85 When converting back, one can use certain heuristics to handle the most obvious cases, but sometimes the modifiers will stay private (very often, the nature of a modifier depends on context or is debatable even for humans, e. [sent-250, score-0.302]
86 Obviously, the individual CSs cannot be transformed independently because of coordination nesting. [sent-256, score-0.243]
87 For instance, when transforming a nested coordination from the Prague style to the Moscow style (e. [sent-257, score-0.464]
88 to fMhL), the leftmost conjunct in the inner (lower) coordination must climb up to become the head of the inner CS, but then it must climb up once again to become the head of the outer (upper) CS too. [sent-259, score-0.681]
89 The following four types of CS participants are distinguished: coordinating conjunctions, conjuncts, shared modifiers, and punctuation that separates conjuncts. [sent-263, score-0.235]
90 Coordinating conjunctions can be usually identified with the help of dependency labels and POS tags. [sent-270, score-0.242]
91 Punctuation separating conjuncts can be detected with high accuracy using simple rules. [sent-271, score-0.4]
92 If shared modifiers are not annotated (code m00 or m10), one can imagine rule-based heuristics or special classifiers trained to distinguish shared modifiers. [sent-272, score-0.427]
93 So far, we limited ourselves only to conversions from/to the style of the HamleDT treebank collection, which contains all the treebanks under our study already converted into a common scheme. [sent-277, score-0.479]
94 We selected nine styles (3 families times 3 head choices) and transformed all the HamleDT scheme treebanks to these nine styles and back, which we call a roundtrip. [sent-279, score-0.771]
95 Second, we also encountered inconsistencies in the original treebanks (which we were not trying to fix in HamleDT for now). [sent-285, score-0.223]
96 Footnote 20: Table 1 shows that Latin and Ancient Greek treebanks have on average more than 6 CSs per 100 tokens, more than 2 conjuncts per CS, and Latin has also the highest number of shared modifiers per CS. [sent-290, score-0.943]
97 For each value of each dimension in Figure 1, we found at least one treebank where the value is used; even so, several treebanks take their own unique path that cannot be clearly classified under the taxonomy (the taxonomy could indeed be extended, for the price of being less clearly arranged). [sent-294, score-0.377]
98 We discussed the convertibility between the various styles and implemented a universal tool that transforms between any two styles of the taxonomy. [sent-295, score-0.446]
99 This is important because it opens the door to easily switching coordination styles for parsing experiments, phrase-to-dependency conversion etc. [sent-297, score-0.436]
100 Hybrid combination of constituency and dependency trees into an ensemble dependency parser. [sent-362, score-0.296]
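The loss of shared-modifier information in a Prague-to-Stanford roundtrip, described in the sentences above, can be illustrated with a minimal Python sketch. It is not the authors' conversion tool: the Node class, the role names ("conj", "coord", "punct", "dep") and both conversion functions are assumptions made for illustration, and only flat, non-nested CSs with a single conjunction are handled. The example sentence “Mary came and cried” is taken from the text; treating “Mary” as a shared modifier attached to the conjunction follows the Prague convention described there.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Node:
    id: int
    form: str
    head: int           # 0 = artificial root
    role: str = "dep"   # hypothetical roles: "conj", "coord", "punct", "dep"

def prague_to_stanford(nodes):
    """Make the leftmost conjunct the CS head (flat CSs only)."""
    out = {n.id: n for n in nodes}
    for coord in (n for n in nodes if n.role == "coord"):
        conjuncts = sorted((n for n in nodes
                            if n.head == coord.id and n.role == "conj"),
                           key=lambda n: n.id)
        if not conjuncts:
            continue
        first = conjuncts[0]
        # Every other child of the conjunction (conjuncts, punctuation,
        # shared modifiers) is re-attached under the first conjunct.
        for n in nodes:
            if n.head == coord.id and n.id != first.id:
                out[n.id] = replace(n, head=first.id)
        out[first.id] = replace(first, head=coord.head)
        out[coord.id] = replace(coord, head=first.id)
    return [out[i] for i in sorted(out)]

def stanford_to_prague(nodes):
    """Make the conjunction the CS head again (flat CSs only)."""
    by_id = {n.id: n for n in nodes}
    out = dict(by_id)
    for coord in (n for n in nodes if n.role == "coord"):
        first = by_id.get(coord.head)
        if first is None or first.role != "conj":
            continue
        # Only conjuncts and punctuation can be recognised and moved back;
        # plain "dep" children stay under the first conjunct.
        for n in nodes:
            if n.head == first.id and n.role in ("conj", "punct"):
                out[n.id] = replace(n, head=coord.id)
        out[first.id] = replace(first, head=coord.id)
        out[coord.id] = replace(coord, head=first.head)
    return [out[i] for i in sorted(out)]

def uas(gold, test):
    """Unlabeled attachment score: share of nodes with an identical head."""
    return sum(g.head == t.head for g, t in zip(gold, test)) / len(gold)

# "Mary came and cried": "Mary" is a shared modifier in the Prague tree.
prague = [Node(1, "Mary", 3), Node(2, "came", 3, "conj"),
          Node(3, "and", 0, "coord"), Node(4, "cried", 3, "conj")]
stanford = prague_to_stanford(prague)
back = stanford_to_prague(stanford)
print([n.head for n in stanford])  # [2, 0, 2, 2]
print(uas(prague, back))           # 0.75
```

In this sketch the shared modifier “Mary” ends up as a private modifier of “came” after the roundtrip, because nothing in the Stanford-style tree marks it as shared. An additional attribute on the modifier (as the m01 and m11 codes suggest) would be needed to restore it.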
wordName wordTfidf (topN-words)
[('conjuncts', 0.4), ('css', 0.344), ('conjunct', 0.241), ('treebanks', 0.223), ('modifiers', 0.213), ('coordination', 0.211), ('cs', 0.204), ('styles', 0.195), ('moscow', 0.172), ('treebank', 0.154), ('dependency', 0.148), ('coordinating', 0.128), ('conjunction', 0.121), ('family', 0.119), ('hamledt', 0.112), ('shared', 0.107), ('prague', 0.107), ('style', 0.102), ('attached', 0.101), ('abokrtsk', 0.095), ('conjunctions', 0.094), ('punctuation', 0.079), ('head', 0.069), ('haji', 0.069), ('zden', 0.069), ('roundtrip', 0.069), ('nek', 0.062), ('private', 0.059), ('greek', 0.059), ('jan', 0.058), ('families', 0.057), ('convertibility', 0.056), ('pdt', 0.056), ('tei', 0.056), ('persian', 0.055), ('topological', 0.051), ('paratactic', 0.05), ('conventions', 0.049), ('nested', 0.049), ('latin', 0.048), ('ms', 0.047), ('mel', 0.047), ('fm', 0.044), ('converters', 0.042), ('husain', 0.042), ('zeman', 0.041), ('cb', 0.041), ('danish', 0.041), ('romanian', 0.039), ('hl', 0.039), ('stanford', 0.039), ('popel', 0.038), ('topology', 0.038), ('hypotactic', 0.037), ('ramasamy', 0.037), ('szeged', 0.037), ('ancient', 0.037), ('petr', 0.037), ('attachment', 0.036), ('codes', 0.036), ('leftmost', 0.035), ('parent', 0.035), ('variations', 0.035), ('lombardo', 0.034), ('tesni', 0.034), ('node', 0.034), ('conll', 0.034), ('label', 0.033), ('bamman', 0.032), ('mare', 0.032), ('transformed', 0.032), ('turkish', 0.032), ('mary', 0.031), ('slovene', 0.031), ('hr', 0.031), ('modifier', 0.03), ('parsing', 0.03), ('fs', 0.03), ('basque', 0.028), ('aduriz', 0.028), ('alpino', 0.028), ('ancora', 0.028), ('atalay', 0.028), ('beek', 0.028), ('bultreebank', 0.028), ('civit', 0.028), ('coordinations', 0.028), ('floresta', 0.028), ('lesmo', 0.028), ('synaf', 0.028), ('taul', 0.028), ('tica', 0.028), ('zeroski', 0.028), ('inner', 0.028), ('observations', 0.028), ('tiger', 0.027), ('arabic', 0.027), ('chain', 0.027), ('dimensions', 0.027), ('obviously', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
2 0.2297578 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
Author: Ryan McDonald ; Joakim Nivre ; Yvonne Quirmbach-Brundage ; Yoav Goldberg ; Dipanjan Das ; Kuzman Ganchev ; Keith Hall ; Slav Petrov ; Hao Zhang ; Oscar Tackstrom ; Claudia Bedini ; Nuria Bertomeu Castello ; Jungmee Lee
Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.
3 0.15092263 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
4 0.12948813 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint inference can bring significant improvements to all state-of-the-art dependency parsers.
5 0.11040635 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
Author: David Marecek ; Milan Straka
Abstract: Even though the quality of unsupervised dependency parsers grows, they often fail in recognition of very basic dependencies. In this paper, we exploit a prior knowledge of STOP-probabilities (whether a given word has any children in a given direction), which is obtained from a large raw corpus using the reducibility principle. By incorporating this knowledge into Dependency Model with Valence, we managed to considerably outperform the state-of-the-art results in terms of average attachment score over 20 treebanks from CoNLL 2006 and 2007 shared tasks.
6 0.10234646 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
7 0.095585749 288 acl-2013-Punctuation Prediction with Transition-based Parsing
8 0.095197141 372 acl-2013-Using CCG categories to improve Hindi dependency parsing
9 0.09453138 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
10 0.093317971 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
11 0.089258827 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
12 0.086394623 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
13 0.085446626 335 acl-2013-Survey on parsing three dependency representations for English
14 0.083878629 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
15 0.081701905 80 acl-2013-Chinese Parsing Exploiting Characters
16 0.076745786 57 acl-2013-Arguments and Modifiers from the Learner's Perspective
17 0.071357921 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
18 0.070526876 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
19 0.070248194 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
20 0.069721602 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
topicId topicWeight
[(0, 0.151), (1, -0.066), (2, -0.157), (3, 0.0), (4, -0.108), (5, -0.032), (6, 0.022), (7, 0.006), (8, 0.094), (9, -0.117), (10, 0.002), (11, 0.013), (12, -0.028), (13, 0.089), (14, -0.062), (15, -0.005), (16, -0.056), (17, -0.018), (18, -0.029), (19, 0.028), (20, -0.016), (21, 0.022), (22, -0.04), (23, -0.017), (24, -0.028), (25, 0.048), (26, -0.042), (27, -0.012), (28, -0.0), (29, -0.046), (30, -0.029), (31, -0.035), (32, -0.037), (33, 0.03), (34, 0.083), (35, -0.04), (36, 0.001), (37, -0.106), (38, 0.096), (39, -0.078), (40, 0.083), (41, -0.057), (42, -0.057), (43, -0.123), (44, -0.012), (45, -0.178), (46, 0.032), (47, 0.019), (48, -0.1), (49, 0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.96005338 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
2 0.81261539 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
Author: Ryan McDonald ; Joakim Nivre ; Yvonne Quirmbach-Brundage ; Yoav Goldberg ; Dipanjan Das ; Kuzman Ganchev ; Keith Hall ; Slav Petrov ; Hao Zhang ; Oscar Tackstrom ; Claudia Bedini ; Nuria Bertomeu Castello ; Jungmee Lee
Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.
3 0.75764185 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.
4 0.70988512 335 acl-2013-Survey on parsing three dependency representations for English
Author: Angelina Ivanova ; Stephan Oepen ; Lilja vrelid
Abstract: In this paper we focus on practical issues of data representation for dependency parsing. We carry out an experimental comparison of (a) three syntactic dependency schemes; (b) three data-driven dependency parsers; and (c) the influence of two different approaches to lexical category disambiguation (aka tagging) prior to parsing. Comparing parsing accuracies in various setups, we study the interactions of these three aspects and analyze which configurations are easier to learn for a dependency parser.
5 0.69620764 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
Author: Reut Tsarfaty
Abstract: Stanford Dependencies (SD) provide a functional characterization of the grammatical relations in syntactic parse-trees. The SD representation is useful for parser evaluation, for downstream applications, and, ultimately, for natural language understanding, however, the design of SD focuses on structurally-marked relations and under-represents morphosyntactic realization patterns observed in Morphologically Rich Languages (MRLs). We present a novel extension of SD, called Unified-SD (U-SD), which unifies the annotation of structurally- and morphologically-marked relations via an inheritance hierarchy. We create a new resource composed of U-SD-annotated constituency and dependency treebanks for the MRL Modern Hebrew, and present two systems that can automatically predict U-SD annotations, for gold segmented input as well as raw texts, with high baseline accuracy.
6 0.69336057 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
7 0.68675727 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
8 0.67391515 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
9 0.58952761 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
10 0.55029392 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
11 0.54328728 372 acl-2013-Using CCG categories to improve Hindi dependency parsing
12 0.53305089 362 acl-2013-Turning on the Turbo: Fast Third-Order Non-Projective Turbo Parsers
13 0.51676786 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
14 0.50601614 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
15 0.47947085 158 acl-2013-Feature-Based Selection of Dependency Paths in Ad Hoc Information Retrieval
16 0.46808326 280 acl-2013-Plurality, Negation, and Quantification: Towards Comprehensive Quantifier Scope Disambiguation
17 0.46710861 13 acl-2013-A New Syntactic Metric for Evaluation of Machine Translation
18 0.44314024 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
19 0.43861622 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
20 0.43825781 288 acl-2013-Punctuation Prediction with Transition-based Parsing
topicId topicWeight
[(0, 0.056), (6, 0.022), (11, 0.061), (15, 0.02), (24, 0.049), (26, 0.07), (35, 0.058), (36, 0.011), (40, 0.258), (42, 0.082), (48, 0.034), (61, 0.027), (67, 0.013), (70, 0.032), (88, 0.033), (90, 0.019), (95, 0.048)]
simIndex simValue paperId paperTitle
1 0.82657641 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
Author: Kenneth Heafield ; Ivan Pouzyrevsky ; Jonathan H. Clark ; Philipp Koehn
Abstract: We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed amount of RAM and variable amount of disk. Using one machine with 140 GB RAM for 2.8 days, we built an unpruned model on 126 billion tokens. Machine translation experiments with this model show improvement of 0.8 BLEU point over constrained systems for the 2013 Workshop on Machine Translation task in three language pairs. Our algorithm is also faster for small models: we estimated a model on 302 million tokens using 7.7% of the RAM and 14.0% of the wall time taken by SRILM. The code is open source as part of KenLM.
same-paper 2 0.82345217 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
3 0.79117417 163 acl-2013-From Natural Language Specifications to Program Input Parsers
Author: Tao Lei ; Fan Long ; Regina Barzilay ; Martin Rinard
Abstract: We present a method for automatically generating input parsers from English specifications of input file formats. We use a Bayesian generative model to capture relevant natural language phenomena and translate the English specification into a specification tree, which is then translated into a C++ input parser. We model the problem as a joint dependency parsing and semantic role labeling task. Our method is based on two sources of information: (1) the correlation between the text and the specification tree and (2) noisy supervision as determined by the success of the generated C++ parser in reading input examples. Our results show that our approach achieves 80.0% F-Score accuracy compared to an F-Score of 66.7% produced by a state-of-the-art semantic parser on a dataset of input format specifications from the ACM International Collegiate Programming Contest (which were written in English for humans with no intention of providing support for automated processing).
4 0.75541008 260 acl-2013-Nonconvex Global Optimization for Latent-Variable Models
Author: Matthew R. Gormley ; Jason Eisner
Abstract: Many models in NLP involve latent variables, such as unknown parses, tags, or alignments. Finding the optimal model parameters is then usually a difficult nonconvex optimization problem. The usual practice is to settle for local optimization methods such as EM or gradient ascent. We explore how one might instead search for a global optimum in parameter space, using branch-and-bound. Our method would eventually find the global maximum (up to a user-specified ε) if run for long enough, but at any point can return a suboptimal solution together with an upper bound on the global maximum. As an illustrative case, we study a generative model for dependency parsing. We search for the maximum-likelihood model parameters and corpus parse, subject to posterior constraints. We show how to formulate this as a mixed integer quadratic programming problem with nonlinear constraints. We use the Reformulation Linearization Technique to produce convex relaxations during branch-and-bound. Although these techniques do not yet provide a practical solution to our instance of this NP-hard problem, they sometimes find better solutions than Viterbi EM with random restarts, in the same time.
5 0.67956161 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
Author: Yuki Arase ; Ming Zhou
Abstract: We propose a method for automatically detecting low-quality Web-text translated by statistical machine translation (SMT) systems. We focus on the phrase salad phenomenon that is observed in existing SMT results and propose a set of computationally inexpensive features to effectively detect such machine-translated sentences from a large-scale Web-mined text. Unlike previous approaches that require bilingual data, our method uses only monolingual text as input; therefore it is applicable for refining data produced by a variety of Web-mining activities. Evaluation results show that the proposed method achieves an accuracy of 95.8% for sentences and 80.6% for text in noisy Web pages.
6 0.6541211 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
7 0.57420671 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
8 0.5552386 225 acl-2013-Learning to Order Natural Language Texts
9 0.53984362 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
10 0.53527629 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
11 0.53164124 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
12 0.52990335 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
13 0.52987301 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
14 0.52955085 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
15 0.52945554 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
16 0.52891785 318 acl-2013-Sentiment Relevance
17 0.52639508 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
18 0.52547258 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
19 0.52454805 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
20 0.52307808 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching