acl acl2013 acl2013-204 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
Reference: text
sentIndex sentText sentNum sentScore
1 Iterative Transformation of Annotation Guidelines for Constituency Parsing Xiang Li 1,2 Wenbin Jiang 1 Yajuan Lü 1 Qun Liu 1,3 1Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences {lixiang, jiangwenbin, lvyajuan}@i. [sent-1, score-0.048]
2 Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only. [sent-7, score-0.189]
3 On one hand, the amount of existing labeled data is not sufficient; on the other hand, there exist multiple annotated datasets with incompatible annotation guidelines for the same NLP task. [sent-9, score-0.256]
4 An available treebank is a major resource for syntactic parsing. [sent-13, score-0.434]
5 Various treebanks have been constructed based on different annotation guidelines. [sent-15, score-0.444]
6 In addition to the most popular CTB, Tsinghua Chinese Treebank (TCT) (Zhou, 2004) is another real large-scale treebank for Chinese constituent parsing. [sent-16, score-0.527]
7 Unfortunately, these heterogeneous treebanks cannot be directly merged for training a parsing model. [sent-18, score-0.45]
8 Therefore, it is highly desirable to transform a treebank into one compatible with a different annotation guideline. [sent-20, score-0.779]
9 In this paper, we focus on harmonizing heterogeneous treebanks to improve parsing performance. [sent-21, score-0.423]
10 We first propose an effective approach to automatic treebank transformation from one annotation guideline to another. [sent-22, score-1.231]
11 For convenience of reference, a treebank with our desired annotation guideline is named the target treebank, and a treebank with a different annotation guideline is named the source treebank. [sent-23, score-1.857]
12 A parser trained on the source treebank is used to relabel the raw sentences of the target treebank, acquiring parallel training data with two heterogeneous annotation guidelines. [sent-26, score-0.585]
13 Then, an annotation transformer is trained on the parallel training data to model the annotation inconsistencies. [sent-27, score-0.808]
14 In the last step, a parser trained on the target treebank is used to generate k-best parse trees with target annotation for the source sentences. [sent-28, score-1.217]
15 Then the optimal parse trees are selected by the annotation transformer. [sent-29, score-0.456]
16 In this way, the source treebank is transformed to another with our desired annotation guideline. [sent-30, score-0.849]
17 Then we propose an optimization strategy of iterative training to further improve the transformation performance. [sent-31, score-0.614]
18 At each iteration, the source-to-target and target-to-source annotation transformations are both performed. [sent-32, score-0.595]
19 The transformed treebank is used to provide a better annotation guideline for the parallel training data of the next iteration. [sent-33, score-1.066]
20 As a result, the better parallel training data will bring an improved annotation transformer at the next iteration. [sent-34, score-0.516]
21 We perform treebank transformation from TCT to CTB, in order to obtain an additional treebank to improve a parser. [sent-35, score-0.773]
22 Figure 1: Example heterogeneous trees with TCT (left) and CTB (right) annotation guidelines. [sent-45, score-0.438]
25 Experiments on Chinese constituent parsing show that the iterative training strategy outperforms the basic annotation transformation baseline. [sent-47, score-1.048]
26 With the additional transformed treebank, the improved parser achieves a 0.95% absolute improvement in F-measure over the baseline parser trained on CTB only. [sent-48, score-0.226]
28 This parallel data is used to train a source-to-target tree transformer. [sent-53, score-0.117]
29 In the transformation procedure, the k-best parse trees of the source sentences are first generated by a parser trained on the target treebank. [sent-54, score-0.796]
30 Then the optimal source parse trees with target annotation are selected by the annotation transformer with the help of gold source parse trees. [sent-55, score-1.17]
31 By combining the target treebank with the transformed source treebank, parsing accuracy can be improved using a parser trained on the enlarged treebank. [sent-56, score-0.909]
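As a rough illustration of the selection step just described, the following sketch scores each candidate target-annotation tree against the gold source tree with a linear model; `extract_features` and the weight dictionary `w` are hypothetical stand-ins for the paper's quasi-synchronous features and learned perceptron weights, not code from the authors.

    # Pick the best target-annotation tree from a k-best list, guided by the
    # gold source-annotation tree (illustrative sketch; helper names invented).
    def select_best(kbest_target_trees, gold_source_tree, w, extract_features):
        def score(tree):
            feats = extract_features(tree, gold_source_tree)  # sparse dict: feature -> count
            return sum(w.get(f, 0.0) * v for f, v in feats.items())
        return max(kbest_target_trees, key=score)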
32 Algorithm 1 shows the training procedure of treebank annotation transformation. [sent-57, score-0.768]
33 treebank_s and treebank_t denote the source and target treebank, respectively. [sent-58, score-1.017]
34 treebank_m^n denotes treebank m re-labeled with annotation guideline n. [sent-61, score-0.69]
35 Function TRAIN invokes the Berkeley parser (Petrov et al., 2006; Petrov and Klein, 2007) to train the constituent parsing models. [sent-62, score-0.168]
37 Function TRANSFORMTRAIN invokes the perceptron algorithm (Collins, 2002) to train a discriminative annotation transformer. [sent-65, score-0.43]
38 Function TRANSFORM selects the optimal transformed parse trees with the target annotation. [sent-66, score-0.372]
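A schematic Python rendering of Algorithm 1, as it is described above, may help; TRAIN, TRANSFORMTRAIN and TRANSFORM are passed in as stubs, and the treebank interface (.sentences, .trees, parse, parse_kbest) is an assumption made only for illustration, not the authors' code.

    # Basic treebank annotation transformation (schematic; not the authors' code).
    def basic_transformation(treebank_s, treebank_t, TRAIN, TRANSFORMTRAIN, TRANSFORM):
        parser_s = TRAIN(treebank_s)   # parser with the source annotation guideline
        parser_t = TRAIN(treebank_t)   # parser with the target annotation guideline
        # Relabel the raw sentences of the target treebank with the source parser,
        # yielding parallel trees under the two guidelines (treebank_t^s).
        treebank_t_s = [parser_s.parse(s) for s in treebank_t.sentences]
        transformer = TRANSFORMTRAIN(list(zip(treebank_t_s, treebank_t.trees)))
        # Generate k-best target-annotation parses for the source sentences and let
        # the transformer select the best one, guided by the gold source trees.
        transformed = []
        for sentence, gold_source_tree in zip(treebank_s.sentences, treebank_s.trees):
            kbest = parser_t.parse_kbest(sentence, k=100)
            transformed.append(TRANSFORM(transformer, kbest, gold_source_tree))
        return transformed   # the source treebank re-labeled with the target guideline

A parser for the target guideline would then be retrained on treebank_t together with the returned trees.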
39 In this paper, the averaged perceptron algorithm is used to train the treebank transformation model. [sent-69, score-0.862]
40 It is an online training algorithm and has been successfully used in many NLP tasks, such as parsing (Collins and Roark, 2004) and word segmentation (Zhang and Clark, 2007; Zhang and Clark, 2010). [sent-70, score-0.183]
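A compact sketch of the averaged perceptron used as a k-best selector might look as follows; the `feats` function and the notion of an oracle tree (the candidate closest to the gold target tree) are assumptions for illustration, and the averaging simply accumulates the weight vector after every training instance.

    from collections import defaultdict

    # Averaged perceptron for k-best selection (sketch). Each training instance is
    # (kbest, gold_source_tree, oracle), where oracle is one element of kbest.
    def train_averaged_perceptron(instances, feats, epochs=10):
        w, w_sum, steps = defaultdict(float), defaultdict(float), 0
        for _ in range(epochs):
            for kbest, gold_source_tree, oracle in instances:
                steps += 1
                pred = max(kbest, key=lambda t: sum(
                    w.get(f, 0.0) * v for f, v in feats(t, gold_source_tree).items()))
                if pred is not oracle:   # standard perceptron update toward the oracle
                    for f, v in feats(oracle, gold_source_tree).items():
                        w[f] += v
                    for f, v in feats(pred, gold_source_tree).items():
                        w[f] -= v
                for f, v in w.items():   # accumulate current weights for averaging
                    w_sum[f] += v
        return {f: v / steps for f, v in w_sum.items()}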
41 In addition to the target features, which closely follow Sun et al., we design the following quasi-synchronous features to model the annotation inconsistencies. [sent-71, score-0.07]
43 • Bigram constituent relation: For two consecutive fundamental constituents si and sj in the target parse tree, we find the minimum categories Ni and Nj of the spans of si and sj in the source parse tree, respectively. [sent-74, score-0.606]
44 Algorithm 1: Basic treebank annotation transformation. [sent-75, score-0.69]
45 If Ni is a sibling of Nj, or the two are identical, we regard the relation between si and sj as a positive feature. [sent-78, score-0.062]
46 • Consistent relation: If the span of a target constituent can also be parsed as a constituent by the source parser, the combination of the target rule and the source category is used. [sent-79, score-0.254]
47 • Inconsistent relation: If the span of a target constituent cannot be analysed as a constituent by the source parser, the combination of the target rule and the corresponding treelet in the source parse tree is used. [sent-80, score-0.496]
48 • POS tag: The combination of the POS tags of the same words in the parallel data is used. [sent-81, score-0.072]
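The feature templates above could be realized along the following lines; every helper on the tree objects (constituents, spans, minimum category, siblings, treelet, POS tags) is a placeholder for machinery the paper does not spell out, and the feature-name strings are invented.

    from collections import defaultdict

    # Illustrative quasi-synchronous feature extraction (all helpers assumed).
    def qs_features(target_tree, source_tree):
        feats = defaultdict(float)
        consts = target_tree.constituents()
        for si, sj in zip(consts, consts[1:]):            # bigram constituent relation
            ni = source_tree.min_category(si.span)
            nj = source_tree.min_category(sj.span)
            if ni is nj or source_tree.are_siblings(ni, nj):
                feats["bigram_pos=%s_%s" % (si.label, sj.label)] += 1
        for c in consts:
            src_cat = source_tree.category_of_span(c.span)
            if src_cat is not None:                       # consistent relation
                feats["consistent=%s|%s" % (c.rule, src_cat)] += 1
            else:                                         # inconsistent relation
                feats["inconsistent=%s|%s" % (c.rule, source_tree.treelet(c.span))] += 1
        for pt, ps in zip(target_tree.pos_tags(), source_tree.pos_tags()):
            feats["pos_pair=%s_%s" % (pt, ps)] += 1       # POS-tag combination
        return feats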
49 3 Iterative Training for Annotation Transformation Treebank annotation transformation relies on the parallel training data. [sent-83, score-0.694]
50 Consequently, the accuracy of the source parser determines the accuracy of the annotation transformer. [sent-84, score-0.437]
51 We propose an iterative training method to improve the transformation accuracy by iteratively optimizing the parallel parse trees. [sent-85, score-0.718]
52 At each training iteration, the source-to-target and target-to-source treebank transformations are both performed, and the transformed treebank provides more appropriate annotation for the subsequent iteration. [sent-86, score-1.611]
53 In turn, the annotation transformer can be improved gradually along with optimization of the parallel parse trees until convergence. [sent-87, score-0.689]
54 Algorithm 2 shows the overall procedure of iterative training, which terminates when the performance of a parser trained on the target treebank and the transformed treebank converges. [sent-88, score-1.394]
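A high-level sketch of the iterative procedure described here is given below; how the previously transformed treebanks feed back into the next round's parallel data is abstracted behind the `previous` argument, and the development-set convergence test is one plausible reading of "terminates when the performance converges" (the whole interface is assumed, not taken from the paper).

    # Iterative training of the annotation transformation (schematic sketch;
    # treebanks are treated as plain lists of trees here).
    def iterative_training(treebank_s, treebank_t, dev_set, TRAIN, build_and_transform,
                           max_iter=10):
        best_score, best_parser = float("-inf"), None
        s_as_t, t_as_s = None, None   # transformed treebanks from the previous round
        for it in range(max_iter):
            # Both directions are transformed in each round; the previous round's
            # output refines the parallel training data of the transformers.
            s_as_t = build_and_transform(treebank_s, treebank_t, previous=t_as_s)
            t_as_s = build_and_transform(treebank_t, treebank_s, previous=s_as_t)
            parser = TRAIN(treebank_t + s_as_t)   # target treebank + transformed source
            score = parser.evaluate(dev_set)
            if score <= best_score:               # no further improvement: converged
                break
            best_score, best_parser = score, parser
        return best_parser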
55 1 Experimental Setup We conduct the experiments of treebank transformation from TCT to CTB. [sent-90, score-0.773]
56 0 is taken as the source treebank for training the annotation transformer. [sent-100, score-0.798]
57 The Berkeley parsing model is trained with 5 split-merge iterations. [sent-101, score-0.122]
58 We run the Berkeley parser in 100-best mode and construct the 20-fold cross-validation training as described in Charniak and Johnson (2005). [sent-102, score-0.175]
59 In this way, we acquire the parallel parse trees for training the annotation transformer. [sent-103, score-0.56]
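A simplified sketch of the 20-fold cross-validation ("jackknifing") step used to obtain automatic k-best parses of training sentences without letting the parser see its own training data; the fold split and the `parse_kbest` call are illustrative assumptions (the paper uses 20 folds and 100-best output).

    # Jackknife parsing: each fold is parsed by a model trained on the other folds,
    # so every training sentence receives automatically produced k-best parses.
    def jackknife_parse(treebank, TRAIN, n_folds=20, k=100):
        folds = [treebank[i::n_folds] for i in range(n_folds)]
        parallel = []
        for i, held_out in enumerate(folds):
            rest = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
            parser = TRAIN(rest)
            for sentence, gold_tree in held_out:
                parallel.append((gold_tree, parser.parse_kbest(sentence, k)))
        return parallel   # (gold tree, k-best auto parses) pairs for transformer training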
60 Table 1: The performance of treebank annotation transformation using iterative training. [sent-110, score-1.196]
61 Figure 2: Parsing accuracy with different amounts of CTB training data (x-axis: size of CTB training data). [sent-114, score-0.102]
62 2 Basic Transformation We conduct experiments to evaluate the effect of the amount of target training data on transformation accuracy, and how much constituent parsers can benefit from our approach. [sent-116, score-0.553]
63 An enhanced parser is trained on the CTB training data with the addition of transformed TCT by our annotation transformer. [sent-117, score-0.569]
64 As a comparison, we build a baseline system (direct parsing) using the Berkeley parser trained only on the CTB training data. [sent-118, score-0.211]
65 The enhanced parser achieves a 0.69% absolute improvement on the CTB test data over the direct parsing baseline when the whole CTB training data is used for training. [sent-124, score-0.162]
66 We can also see in Figure 2 that our approach further extends the advantage over the two baseline systems as the amount of CTB training data decreases. [sent-125, score-0.051]
67 The figure confirms that our approach is effective for improving parser performance, especially in the scenario where the target treebank is scarce. [sent-126, score-0.655]
68 3 Iterative Transformation We use the iterative training method for annotation transformation. [sent-128, score-0.218]
69 The CTB development set is used to determine the optimal training iteration. [sent-129, score-0.081]
70 After each iteration, we test the performance of a parser trained on the combined treebank. [sent-130, score-0.16]
71 Figure 3: Learning curve of iterative transformation training (x-axis: training iterations). [sent-132, score-0.555]
72 Figure 3 shows the performance curve with the iteration ranging from 1 to 10. [sent-133, score-0.095]
73 The performance of the basic annotation transformation is also included in the curve, at iteration 1. [sent-134, score-0.714]
74 The curve shows that the maximum performance is achieved at iteration 5. [sent-135, score-0.095]
75 Compared to the basic annotation transformation, the iterative training strategy leads to a better parser with higher accuracy. [sent-136, score-0.654]
76 Table 1 reports that the final optimized parsing results on the CTB test set contribute a 0. [sent-137, score-0.086]
77 4 Related Work Treebank transformation is an effective strategy to reuse existing annotated data. [sent-139, score-0.398]
78 (1994) proposed an approach to transform a treebank into another with a different grammar using their matching metric based on the bracket information of original treebank. [sent-141, score-0.499]
79 (2009) proposed annotation adaptation in Chinese word segmentation; subsequently, some work was done in parsing (Sun et al. [sent-143, score-0.374]
80 (2012) proposed an advanced annotation transformation in Chinese word segmentation, and we extended it to the more complicated treebank annotation transformation used for Chinese constituent parsing. [sent-147, score-1.717]
81 Other related work has focused on semi-supervised parsing methods, which use labeled data to annotate unlabeled data and then use the additional annotated data to improve the original model (McClosky et al. [sent-148, score-0.086]
82 The self-training methodology inspires us to obtain an annotated treebank compatible with another annotation guideline. [sent-151, score-0.69]
83 Our approach places extra emphasis on improving the transformation performance with the help of source annotation knowledge. [sent-152, score-0.652]
84 Apart from constituency-to-constituency treebank transformation, there also exists some research on dependency-to-constituency treebank transformation. [sent-153, score-0.868]
85 (1999) used a constituency treebank transformed from the Prague Dependency Treebank for constituent parsing of Czech. [sent-155, score-0.8]
86 Xia and Palmer (2001) explored different algorithms that transform dependency structure to phrase structure. [sent-156, score-0.092]
87 (2009) proposed to convert a dependency treebank to a constituency one by using a parser trained on a constituency treebank to generate k-best lists for sentences in the dependency treebank. [sent-158, score-1.252]
88 Optimal conversion results are selected from the k-best lists. [sent-159, score-0.048]
89 (2012) generated rich quasisynchronous grammar features to improve parsing performance. [sent-161, score-0.122]
90 5 Conclusion This paper proposes an effective approach to transform one treebank into another with a different annotation guideline. [sent-164, score-0.782]
91 Experiments show that our approach can effectively utilize the heterogeneous treebanks and significantly improve the state-of-the-art Chinese constituency parsing performance. [sent-165, score-0.484]
92 How to exploit more heterogeneous knowledge to improve the transformation performance is an interesting future issue. [sent-166, score-0.464]
93 Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. [sent-202, score-0.107]
94 Automatic adaptation of annotation standards: Chinese word segmentation and POS tagging: a case study. [sent-222, score-0.334]
95 Iterative annotation transformation with predict-self reestimation for Chinese word segmentation. [sent-226, score-0.732]
96 Tregex and Tsurgeon: tools for querying and manipulating tree data structures. Proceedings of the Fifth International Conference on Language Resources and Evaluation, pages 2231–2234. [sent-239, score-0.036]
97 Reducing approximation and estimation errors for Chinese lexical processing with heterogeneous annotations. [sent-290, score-0.262]
98 Discriminative parse reranking for Chinese with homogeneous and heterogeneous annotations. [sent-297, score-0.424]
99 The Penn Chinese Treebank: Phrase structure annotation of a large corpus. [sent-322, score-0.417]
100 A fast decoder for joint word segmentation and POS-tagging using a single discriminative model. [sent-347, score-0.087]
wordName wordTfidf (topN-words)
[('treebank', 0.434), ('transformation', 0.339), ('ctb', 0.282), ('treebankt', 0.268), ('annotation', 0.256), ('treebanks', 0.188), ('tct', 0.188), ('guideline', 0.175), ('iterative', 0.167), ('transformer', 0.161), ('chinese', 0.137), ('treebankst', 0.134), ('heterogeneous', 0.125), ('parser', 0.124), ('parse', 0.113), ('transformers', 0.107), ('transformed', 0.102), ('constituent', 0.093), ('mcclosky', 0.09), ('parsing', 0.086), ('constituency', 0.085), ('parsert', 0.081), ('transformtrain', 0.081), ('target', 0.07), ('transform', 0.065), ('collins', 0.061), ('charniak', 0.059), ('trees', 0.057), ('source', 0.057), ('perceptron', 0.056), ('hhh', 0.054), ('torm', 0.054), ('transformert', 0.054), ('treebankts', 0.054), ('sun', 0.053), ('training', 0.051), ('reranking', 0.049), ('curve', 0.049), ('parallel', 0.048), ('conversion', 0.048), ('tsurgeon', 0.047), ('zz', 0.047), ('jiang', 0.047), ('berkeley', 0.047), ('iteration', 0.046), ('segmentation', 0.046), ('invokes', 0.044), ('parsers', 0.044), ('discriminative', 0.041), ('petrov', 0.041), ('xia', 0.038), ('tsinghua', 0.038), ('transforms', 0.038), ('sj', 0.037), ('tree', 0.036), ('quasisynchronous', 0.036), ('qun', 0.036), ('trained', 0.036), ('zhu', 0.036), ('acquire', 0.035), ('tran', 0.035), ('train', 0.033), ('niu', 0.033), ('levy', 0.033), ('strategy', 0.032), ('dj', 0.032), ('adaptation', 0.032), ('vv', 0.031), ('dublin', 0.031), ('wenbin', 0.031), ('optimal', 0.03), ('sf', 0.029), ('johansson', 0.029), ('yajuan', 0.029), ('gradually', 0.029), ('effective', 0.027), ('dependency', 0.027), ('procedure', 0.027), ('wang', 0.027), ('acl', 0.026), ('daum', 0.026), ('nn', 0.026), ('absolute', 0.025), ('si', 0.025), ('vp', 0.025), ('optimization', 0.025), ('compatible', 0.024), ('emnlp', 0.024), ('basic', 0.024), ('np', 0.024), ('penn', 0.024), ('foth', 0.024), ('tregex', 0.024), ('hine', 0.024), ('iangwenbin', 0.024), ('lvya', 0.024), ('ftosr', 0.024), ('eval', 0.024), ('harmonizing', 0.024), ('ts', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000008 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
2 0.21498004 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.
3 0.21002883 80 acl-2013-Chinese Parsing Exploiting Characters
Author: Meishan Zhang ; Yue Zhang ; Wanxiang Che ; Ting Liu
Abstract: Characters play an important role in the Chinese language, yet computational processing of Chinese has been dominated by word-based approaches, with leaves in syntax trees being words. We investigate Chinese parsing from the character-level, extending the notion of phrase-structure trees by annotating internal structures of words. We demonstrate the importance of character-level information to Chinese processing by building a joint segmentation, part-of-speech (POS) tagging and phrase-structure parsing system that integrates character-structure features. Our joint system significantly outperforms a state-of-the-art word-based baseline on the standard CTB5 test, and gives the best published results for Chinese parsing.
4 0.20665424 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein
Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.
5 0.20444795 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
Author: Ryan McDonald ; Joakim Nivre ; Yvonne Quirmbach-Brundage ; Yoav Goldberg ; Dipanjan Das ; Kuzman Ganchev ; Keith Hall ; Slav Petrov ; Hao Zhang ; Oscar Tackstrom ; Claudia Bedini ; Nuria Bertomeu Castello ; Jungmee Lee
Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.1
6 0.19958046 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
7 0.18517165 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
8 0.17873912 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
9 0.15092263 94 acl-2013-Coordination Structures in Dependency Treebanks
10 0.14047758 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
11 0.1363392 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
12 0.13412137 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
13 0.12609653 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
14 0.12311864 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
15 0.11368058 357 acl-2013-Transfer Learning for Constituency-Based Grammars
16 0.11310245 372 acl-2013-Using CCG categories to improve Hindi dependency parsing
17 0.10938255 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
18 0.10163268 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
19 0.095967382 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
20 0.093554795 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
topicId topicWeight
[(0, 0.199), (1, -0.169), (2, -0.273), (3, 0.059), (4, 0.003), (5, -0.044), (6, -0.017), (7, -0.004), (8, 0.097), (9, -0.055), (10, 0.008), (11, 0.044), (12, 0.042), (13, 0.095), (14, -0.024), (15, 0.005), (16, -0.004), (17, 0.016), (18, -0.039), (19, 0.022), (20, -0.002), (21, -0.043), (22, -0.109), (23, -0.025), (24, -0.058), (25, -0.007), (26, 0.028), (27, -0.041), (28, -0.013), (29, -0.0), (30, -0.001), (31, 0.004), (32, 0.036), (33, -0.046), (34, 0.029), (35, 0.062), (36, 0.037), (37, -0.089), (38, 0.184), (39, -0.12), (40, 0.057), (41, 0.024), (42, -0.037), (43, -0.081), (44, 0.027), (45, -0.127), (46, -0.006), (47, 0.012), (48, 0.019), (49, 0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.96752143 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
2 0.76802009 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein
Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.
3 0.76730376 94 acl-2013-Coordination Structures in Dependency Treebanks
Author: Martin Popel ; David Mareček ; Jan Štěpánek ; Daniel Zeman ; Zdeněk Žabokrtský
Abstract: Paratactic syntactic structures are notoriously difficult to represent in dependency formalisms. This has painful consequences such as high frequency of parsing errors related to coordination. In other words, coordination is a pending problem in dependency analysis of natural languages. This paper tries to shed some light on this area by bringing a systematizing view of various formal means developed for encoding coordination structures. We introduce a novel taxonomy of such approaches and apply it to treebanks across a typologically diverse range of 26 languages. In addition, empirical observations on convertibility between selected styles of representations are shown too.
4 0.76237923 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
Author: Guangyou Zhou ; Jun Zhao
Abstract: This paper is concerned with the problem of heterogeneous dependency parsing. In this paper, we present a novel joint inference scheme, which is able to leverage the consensus information between heterogeneous treebanks in the parsing phase. Different from stacked learning methods (Nivre and McDonald, 2008; Martins et al., 2008), which process the dependency parsing in a pipelined way (e.g., a second level uses the first level outputs), in our method, multiple dependency parsing models are coordinated to exchange consensus information. We conduct experiments on Chinese Dependency Treebank (CDT) and Penn Chinese Treebank (CTB), experimental results show that joint infer- ence can bring significant improvements to all state-of-the-art dependency parsers.
5 0.75445527 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
Author: Ryan McDonald ; Joakim Nivre ; Yvonne Quirmbach-Brundage ; Yoav Goldberg ; Dipanjan Das ; Kuzman Ganchev ; Keith Hall ; Slav Petrov ; Hao Zhang ; Oscar Tackstrom ; Claudia Bedini ; Nuria Bertomeu Castello ; Jungmee Lee
Abstract: We present a new collection of treebanks with homogeneous syntactic dependency annotation for six languages: German, English, Swedish, Spanish, French and Korean. To show the usefulness of such a resource, we present a case study of crosslingual transfer parsing with more reliable evaluation than has been possible before. This ‘universal’ treebank is made freely available in order to facilitate research on multilingual dependency parsing.1
6 0.69824111 335 acl-2013-Survey on parsing three dependency representations for English
7 0.65634799 80 acl-2013-Chinese Parsing Exploiting Characters
8 0.64386356 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
9 0.63775343 28 acl-2013-A Unified Morpho-Syntactic Scheme of Stanford Dependencies
10 0.61912912 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
11 0.61052763 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
12 0.59848666 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
13 0.58359283 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
14 0.56593931 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
15 0.54876757 112 acl-2013-Dependency Parser Adaptation with Subtrees from Auto-Parsed Target Domain Data
16 0.54676986 357 acl-2013-Transfer Learning for Constituency-Based Grammars
17 0.53661144 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
18 0.51341558 367 acl-2013-Universal Conceptual Cognitive Annotation (UCCA)
19 0.5132885 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
20 0.51263154 372 acl-2013-Using CCG categories to improve Hindi dependency parsing
topicId topicWeight
[(0, 0.046), (6, 0.066), (11, 0.069), (14, 0.022), (24, 0.035), (26, 0.055), (28, 0.013), (35, 0.038), (42, 0.101), (48, 0.022), (64, 0.015), (70, 0.044), (71, 0.242), (88, 0.034), (90, 0.022), (95, 0.092)]
simIndex simValue paperId paperTitle
1 0.8845219 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali
Author: Apurbalal Senapati ; Utpal Garain
Abstract: This paper attempts to use an off-the-shelf anaphora resolution (AR) system for Bengali. The language specific preprocessing modules of GuiTAR (v3.0.3) are identified and suitably designed for Bengali. Anaphora resolution module is also modified or replaced in order to realize different configurations of GuiTAR. Performance of each configuration is evaluated and experiment shows that the off-the-shelf AR system can be effectively used for Indic languages. 1
2 0.79721045 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
Author: Mohamed Amir Yosef ; Sandro Bauer ; Johannes Hoffart ; Marc Spaniol ; Gerhard Weikum
Abstract: Recent research has shown progress in achieving high-quality, very fine-grained type classification in hierarchical taxonomies. Within such a multi-level type hierarchy with several hundreds of types at different levels, many entities naturally belong to multiple types. In order to achieve high-precision in type classification, current approaches are either limited to certain domains or require time consuming multistage computations. As a consequence, existing systems are incapable of performing ad-hoc type classification on arbitrary input texts. In this demo, we present a novel Webbased tool that is able to perform domain independent entity type classification under real time conditions. Thanks to its efficient implementation and compacted feature representation, the system is able to process text inputs on-the-fly while still achieving equally high precision as leading state-ofthe-art implementations. Our system offers an online interface where natural-language text can be inserted, which returns semantic type labels for entity mentions. Further more, the user interface allows users to explore the assigned types by visualizing and navigating along the type-hierarchy.
same-paper 3 0.79604894 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
4 0.75042737 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
Author: Beata Beigman Klebanov ; Michael Flor
Abstract: We describe a new representation of the content vocabulary of a text we call word association profile that captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distirbution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.
5 0.75023234 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein
Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.
6 0.6130203 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
7 0.6070829 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
8 0.59890765 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
9 0.59691483 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
10 0.5965724 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
11 0.5957669 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
12 0.59544492 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
13 0.5944562 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
14 0.59365791 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
15 0.59255832 80 acl-2013-Chinese Parsing Exploiting Characters
16 0.59010053 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
17 0.58845335 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization
18 0.58737528 333 acl-2013-Summarization Through Submodularity and Dispersion
19 0.58705515 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation
20 0.58586138 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching