acl acl2013 acl2013-44 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein
Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Aspects of Chinese syntax result in a distinctive mix of parsing challenges. [sent-5, score-0.192]
2 However, the contribution of individual sources of error to overall difficulty is not well understood. [sent-6, score-0.265]
3 We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. [sent-7, score-1.095]
4 We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges. [sent-8, score-0.856]
5 1 Introduction A decade of Chinese parsing research, enabled by the Penn Chinese Treebank (PCTB; Xue et al. [sent-9, score-0.1]
6 , 2005), has seen Chinese parsing performance improve from 76. [sent-10, score-0.1]
7 While recent advances have focused on understanding and reducing the errors that occur in segmentation and part-of-speech tagging (Qian and Liu, 2012; Jiang et al. [sent-13, score-0.322]
8 , 2009; Forst and Fang, 2009), a range of substantial issues remain that are purely syntactic. [sent-14, score-0.097]
9 Early work by Levy and Manning (2003) presented modifications to a parser motivated by a manual investigation of parsing errors. [sent-15, score-0.304]
10 They noted substantial differences between Chinese and English parsing, attributing some of the differences to treebank annotation decisions and others to meaningful differences in syntax. [sent-16, score-0.252]
11 Based on this analysis they considered how to modify their parser to capture the information necessary to model the syntax within the PCTB . [sent-17, score-0.287]
12 However, their manual analysis was limited in scope, covering only part of the parser output, and was unable to characterize the relative impact of the issues they uncovered. [sent-18, score-0.345]
13 This paper presents a more comprehensive analysis of errors in Chinese parsing, building on the technique presented in Kummerfeld et al. [sent-22, score-0.26]
14 (2012), which characterized the error behavior of English parsers by quantifying how often they make errors such as PP attachment and coordination scope. [sent-23, score-0.802]
15 To accommodate error classes that are absent in English, we augment the system to recognize Chinese-specific parse errors. [sent-24, score-0.304]
16 1 We use the modified system to show the relative impact of different error types across a range of Chinese parsers. [sent-25, score-0.391]
17 To understand the impact of tagging errors on different error types, we performed a part-of-speech ablation experiment, in which particular confusions are introduced in isolation. [sent-26, score-0.737]
18 By analyzing the distribution of errors in the system output with and without gold part-of-speech tags, we are able to isolate and quantify the error types that can be resolved by improvements in tagging accuracy. [sent-27, score-0.762]
19 Our analysis shows that improvements in tagging accuracy can only address a subset of the challenges of Chinese syntax. [sent-28, score-0.174]
20 Further improvement in Chinese parsing performance will require research addressing other challenges, in particular, determining coordination scope. [sent-29, score-0.206]
21 While their focus was on issues faced by their factored PCFG parser (Klein and Manning, 2003b), the error types they identified are general issues presented by Chinese syntax in the PCTB . [sent-31, score-0.756]
22 However, as noted in their final section, their manual analysis of parse errors in 100 sentences only covered a portion of a single parser’s output, limiting the conclusions they could reach regarding the distribution of errors in Chinese parsing. [sent-37, score-0.418]
23 (2012), which presented a system that automatically classifies English parse errors using a two stage process. [sent-40, score-0.194]
24 First, the system finds the shortest path from the system output to the gold annotations, where each step in the path is a tree transformation, fixing at least one bracket error. [sent-41, score-0.274]
25 Second, each transformation step is classified into one of several error types. [sent-42, score-0.358]
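A minimal sketch of the quantity the first stage works with. It assumes a hypothetical nested-tuple tree encoding (not the format used by Kummerfeld et al. (2012)): a phrasal node is `(label, [children])` and a preterminal is `(POS, word)`. The helpers `brackets` and `bracket_errors` are illustrative names; they compute the missing and extra brackets that a sequence of tree transformations would have to repair, and do not implement the shortest-path search itself. Following the usual EVALB convention, preterminal (POS) brackets are not counted.

```python
from collections import Counter

# Hypothetical tree encoding: a phrasal node is (label, [children]);
# a preterminal is (POS, word), i.e. its second element is a string.

def brackets(tree, start=0):
    """Return (Counter of (label, start, end) phrasal brackets, end offset)."""
    label, children = tree
    if isinstance(children, str):  # preterminal: covers one word, no bracket
        return Counter(), start + 1
    found = Counter()
    pos = start
    for child in children:
        sub, pos = brackets(child, pos)
        found.update(sub)
    found[(label, start, pos)] += 1
    return found, pos

def bracket_errors(gold, test):
    """Return (missing, extra): brackets only in gold, and only in test."""
    gold_b, _ = brackets(gold)
    test_b, _ = brackets(test)
    # Counter subtraction keeps only positive counts.
    return gold_b - test_b, test_b - gold_b

# Toy trees over three tokens a b c (not taken from the paper):
# the parser tags b as a noun and builds an NP where gold has a VP.
gold = ("IP", [("NP", [("NR", "a")]),
               ("VP", [("VV", "b"), ("NP", [("NN", "c")])])])
test = ("IP", [("NP", [("NR", "a"), ("NN", "b")]),
               ("NP", [("NN", "c")])])

missing, extra = bracket_errors(gold, test)
print(sorted(missing))  # [('NP', 0, 1), ('VP', 1, 3)]
print(sorted(extra))    # [('NP', 0, 2)]
```

Here a single tagging decision costs three bracket errors (two missing, one extra), which is the unit the error categories below are counted in.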
26 When directly applied to Chinese parser output, the system placed over 27% of the errors in the catch-all ‘Other’ type. [sent-43, score-0.443]
27 Many of these errors clearly fall into one of a small set of error types, motivating an adaptation to Chinese syntax. [sent-44, score-0.459]
28 3 Adapting error analysis to Chinese To adapt the Kummerfeld et al. [sent-45, score-0.295]
29 (2012) system to Chinese, we developed a new version of the second stage of the system, which assigns an error category to each tree transformation step. [sent-46, score-0.356]
30 To characterize the errors the original system placed in the ‘Other’ category, we looked through one hundred sentences, identifying error types generated by Chinese syntax that the existing system did not account for. [sent-47, score-0.667]
31 To ensure the accuracy of our classifications, we alternated between refining the classification code and looking at affected classifications to identify issues. [sent-49, score-0.071]
32 For example, we use the structure of the final gold standard tree when classifying errors that are a byproduct of sense disambiguation errors. [sent-52, score-0.324]
33 4 Chinese parsing errors Table 1 presents the errors made by the Berkeley parser. [sent-53, score-0.488]
34 Below we describe the error types that are [...] [Table 1 columns: Error Type, Brackets, % of total; caption: [...] number of bracket errors attributed to that error type.] [sent-54, score-1.006]
35 * indicates error types that were added or substantially changed as part of this work. [sent-56, score-0.384]
36 (2012), presenting the number of bracket errors (missing or extra) attributed to each error type. [sent-59, score-0.663]
37 Bracket counts are more informative than a direct count of each error type, because the impact on EVALB F-score varies between errors, e.g., [sent-60, score-0.313]
38 a single attachment error can cause 20 bracket errors, while a unary error causes only one. [sent-62, score-0.866]
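To make the bracket-level view concrete, here is a small helper that converts bracket counts into an EVALB-style labeled F1. This is a sketch under assumptions: the function name `evalb_f1` and the 40-bracket sentence are hypothetical, and splitting the 20 bracket errors of the attachment case into 10 missing plus 10 extra is an illustrative choice.

```python
def evalb_f1(n_gold, n_test, n_missing, n_extra):
    """Labeled bracket F1 from counts, EVALB-style.

    n_missing: gold brackets absent from the parse.
    n_extra:   parse brackets absent from the gold tree.
    """
    matched = n_gold - n_missing
    if matched != n_test - n_extra:
        raise ValueError("inconsistent bracket counts")
    precision = matched / n_test if n_test else 0.0
    recall = matched / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical sentence with 40 gold brackets.
unary = evalb_f1(n_gold=40, n_test=41, n_missing=0, n_extra=1)
attachment = evalb_f1(n_gold=40, n_test=40, n_missing=10, n_extra=10)
print(f"unary error:      F1 = {unary:.3f}")       # 0.988
print(f"attachment error: F1 = {attachment:.3f}")  # 0.750
```

Both parses contain "one error", but their F-score costs differ by more than twenty points, which is why errors are weighted by brackets rather than counted.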
39 We assign this error type when a transformation involves words whose parts of speech in the gold tree are one of: CC, CD, DEG, ETC, JJ, NN, NR, NT and OD. [sent-67, score-0.486]
40 We investigated the errors that fall into the NP-internal category and found that 49% of the errors involved the creation or deletion of a single preterminal phrasal bracket. [sent-68, score-0.388]
41 These errors arise when a parser proposes a tree in which POS tags (for instance, JJ or NN) occur as siblings of phrasal tags (such as NP), a configuration used by the PCTB bracketing guidelines to indicate complementation as opposed to adjunction (Xue et al. [sent-69, score-0.654]
42 Footnote 2: For an explanation of the English error types, see Kummerfeld et al. [sent-71, score-0.265]
43 Note that this also covers some of the errors that Kummerfeld et al. [sent-78, score-0.194]
44 For mis-application of unary rules we separate out instances in which the two brackets in the production have the same label (A-over-A). [sent-81, score-0.051]
45 These cases arise when traces are eliminated, a standard step in evaluation. [sent-82, score-0.096]
46 More than a third of unary errors made by the Berkeley parser are of the A-over-A type. [sent-83, score-0.449]
47 This can be attributed to two factors: (i) the PCTB annotates non-local dependencies using traces, and (ii) Chinese syntax generates more traces than English syntax (Guo et al. [sent-84, score-0.304]
48 However, for parsers that do not return traces, these are benign errors. [sent-86, score-0.192]
49 Incorrect modifier scope caused by modifier phrase attachment level. [sent-89, score-0.354]
50 This applies when the head word of a phrase receives the wrong POS, leading to an attachment error. [sent-93, score-0.187]
51 This error type is common in Chinese because of POS fluidity, e.g., [sent-94, score-0.304]
52 the well-known Chinese verb/noun ambiguity often causes mis-attachments that are classified as this error type. [sent-96, score-0.35]
53 In Figure 1d, the word 投资 invest has both noun and verb senses. [sent-97, score-0.071]
54 While the gold standard interpretation is the relative clause firms that Macau invests in, the parser returned an NP interpretation Macau investment firms. [sent-98, score-0.324]
55 In this error type, a span is moved to a position where the POS tags of its new siblings all belong to the list of NP-internal structure tags which we identified above, reflecting the inclusion of additional material into an NP. [sent-100, score-0.482]
56 The PCTB annotations recognize several Chinese verb compounding strategies, such as the serial verb construction (规划建设 plan [and] build) and the resultative construction (煮熟 cook [until] done), which join a bare verb to another lexical item. [sent-102, score-0.213]
57 We introduce an error type specific to Chinese, in which such verb compounds are split, with the two halves of the compound placed in different phrases. [sent-103, score-0.42]
58 Figure 1: Prominent error types in Chinese parsing. [sent-152, score-0.343]
59 The left tree is the gold structure; the right is the parser hypothesis. [sent-153, score-0.334]
60 These are cases in which a new span must be added to more closely bind a modifier phrase (ADVP, ADJP, and PP). [sent-155, score-0.1]
61 This error type is rare in Chinese, as adjunct PPs are pre-verbal. [sent-157, score-0.342]
62 It does occur near coordinated VPs, where ambiguity arises about which of the conjuncts the PP has scope over. [sent-158, score-0.147]
63 Whether this particular case is PP attachment or coordination is debatable; we follow Kummerfeld et al. [sent-159, score-0.247]
64 1 Chinese-English comparison It is difficult to directly compare error analysis results for Chinese and English parsing because of substantial changes in the classification method, and differences in treebank annotations. [sent-162, score-0.577]
65 As described in the previous section, the set of error categories considered for Chinese is very different to the set of categories for English. [sent-163, score-0.345]
66 Even for some of the categories that were not substantially changed, errors may be classified differently because of cross-over between categories between [...] [Table 2 columns (partial): System, F1, NP Int.] [sent-164, score-0.315]
67 [...] number of bracket errors per sentence attributed to that error type, where an empty bar is no errors and a full bar has the value indicated in the bottom row. [sent-180, score-0.945]
68 Differences in treebank annotations also present a challenge for cross-language error comparison. [sent-185, score-0.346]
69 The most common error type in Chinese, NP-internal structure, is rare in the results of Kummerfeld et al. [sent-186, score-0.342]
70 Further characterization of the impact of annotation differences on errors is beyond the scope of this paper. [sent-188, score-0.348]
71 Three conclusions can be drawn: (i) coordination is a major issue in both languages, (ii) PP attachment is a much greater problem in English, and (iii) the higher frequency of trace-generating syntax in Chinese compared to English poses substantial challenges. [sent-189, score-0.366]
72 5 Cross-parser analysis The previous section described the error types and their distribution for a single Chinese parser. [sent-190, score-0.373]
73 Here we confirm that these are general trends by showing that the same pattern is observed for several different parsers on the PCTB 6 dev set. We include results for a transition-based parser (ZPAR; Zhang and Clark, 2009), a split-merge PCFG parser (Petrov et al. [sent-191, score-0.534]
74 , 2006; Petrov and Klein, 2007; Petrov, 2010), a lexicalized parser (Bikel and Chiang, 2000), and a factored PCFG and dependency parser (Levy and Manning, 2003; Klein and Manning, 2003a,b). [sent-192, score-0.502]
75 Comparing the two Stanford parsers in Table 2, the factored model provides clear improvements [...] [sent-193, score-0.19]
76 All analysis is on the dev set, to avoid revealing specific information about the test set. [sent-196, score-0.06]
77 Footnote 4: These parsers represent a variety of parsing methods, though they exclude some recently developed parsers that are not publicly available (Qian and Liu, 2012; Xiong et al. [sent-197, score-0.292]
78 The Berkeley product parser we include uses only two grammars because we found, in contrast to the English results (Petrov, 2010), that further grammars provided limited benefits. [sent-200, score-0.288]
79 Compared with the standard Berkeley parser, the diversity in the grammars appears to assist only certain error types: most of the improvement occurs in four of the categories, while five categories show no improvement or a slight decrease. [sent-201, score-0.54]
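The per-category comparison described here can be sketched as a small helper. It splits error categories by whether a second system reduces their bracket-error counts relative to a baseline. The function name is hypothetical, and the category names and counts in the example are invented for illustration; they are not the values in Table 2.

```python
def split_by_change(baseline, other):
    """Split categories into (improved, not_improved) by comparing the
    bracket-error counts of `other` against `baseline` (same keys assumed)."""
    improved, not_improved = [], []
    for category in sorted(baseline):
        if other[category] < baseline[category]:
            improved.append(category)
        else:  # same count or more errors
            not_improved.append(category)
    return improved, not_improved

# Invented counts, for illustration only.
baseline = {"Coordination": 100, "NP-internal": 200, "Unary": 50}
product = {"Coordination": 101, "NP-internal": 170, "Unary": 45}
print(split_by_change(baseline, product))
# (['NP-internal', 'Unary'], ['Coordination'])
```

The same comparison applies to any pair of rows, for example a parser run with and without gold POS tags.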
80 6 Tagging Error Impact The challenge of accurate POS tagging in Chinese has been a major part of several recent papers (Qian and Liu, 2012; Jiang et al. [sent-202, score-0.091]
81 The Berk-G row of Table 2 shows the performance of the Berkeley parser when given gold POS tags. [sent-204, score-0.295]
82 5 While the F1 improvement is unsurprising, for the first time we can clearly show that the gains are only in a subset of the error types. [sent-205, score-0.265]
83 In particular, tagging improvement will not help for two of the most significant challenges: coordination scope errors and verb argument selection. [sent-206, score-0.233]
84 To see which tagging confusions contribute to which error reductions, we adapt the POS ablation approach of Tse and Curran (2012). [sent-207, score-0.495]
85 To isolate the effects of each confusion we start from the gold tags and introduce the output of the Stanford tagger whenever it returns one of the two tags being considered. [sent-209, score-0.392]
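The construction of the "semi-gold" tags can be sketched as shown below. The sketch assumes that tags are plain lists of strings aligned with the gold segmentation. The optional `strict` flag is a hypothetical addition that is not described in the text. When set, it also requires the gold tag to be in the pair, so that only confusions between the two tags are introduced. The default follows the text literally.

```python
def semi_gold_tags(gold, auto, pair, strict=False):
    """Start from gold tags; take the automatic tag wherever it is one of `pair`.

    With strict=True, the automatic tag is used only if the gold tag is
    also in `pair`, so no confusions outside the pair are introduced.
    """
    if len(gold) != len(auto):
        raise ValueError("gold and automatic tag sequences differ in length")
    pair = set(pair)
    result = []
    for g, a in zip(gold, auto):
        use_auto = a in pair and (g in pair or not strict)
        result.append(a if use_auto else g)
    return result

gold = ["NR", "VV", "DEC", "JJ"]
auto = ["NR", "NN", "DEG", "NN"]  # hypothetical tagger output
print(semi_gold_tags(gold, auto, ("VV", "NN")))               # ['NR', 'NN', 'DEC', 'NN']
print(semi_gold_tags(gold, auto, ("VV", "NN"), strict=True))  # ['NR', 'NN', 'DEC', 'JJ']
```

The DEC/DEG disagreement is left at its gold value because neither tag is in the VV/NN pair being studied.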
86 6 We then feed these “semi-gold” tags (Footnote 5: We used the Berkeley parser as it was the best of the parsers we considered.) [sent-210, score-0.391]
87 Note that the Berkeley parser occasionally prunes all of the parses that use the gold POS tags, and so returns the best available alternative. [sent-211, score-0.295]
88 Footnote 6: We introduce errors to gold tags, rather than removing errors from automatic tags, isolating the effect of a single confusion by eliminating interaction between tagging decisions. [Table columns: Confused tags, Errors, ∆F1] [sent-215, score-0.285]
89 to the Berkeley parser, and run the fine-grained error analysis on its output. [sent-221, score-0.295]
90 This confusion has been consistently shown to be a major contributor to parsing errors (Levy and Manning, 2003; Tse and Curran, 2012; Qian and Liu, 2012), and we find a drop of over 2. [sent-223, score-0.37]
91 We found that while most error types have contributions from a range of POS confusions, verb/noun confusion was responsible for virtually all of the noun boundary errors corrected by using gold tags. [sent-225, score-0.704]
92 This confusion between the relativizer and subordinator senses of the particle 的 de is the primary source of improvements on modifier attachment when using gold tags. [sent-227, score-0.379]
93 Despite their frequency, these confusions have little effect on parsing performance. [sent-229, score-0.201]
94 Even within the NP-internal error type, their impact is limited, and almost all of the errors do not change the logical form. [sent-230, score-0.546]
95 7 Conclusion We have quantified the relative impacts of a comprehensive set of error types in Chinese parsing. [sent-231, score-0.379]
96 Our analysis has also shown that while improvements in Chinese POS tagging can make a substantial difference for some error types, it will not address two high-frequency error types: incorrect verb argument attachment and coordination scope. [sent-232, score-1.064]
97 The frequency of these two error types is also unimproved by the use of products of latent variable grammars. [sent-233, score-0.343]
98 These observations suggest that resolving the core challenges of Chinese parsing will require new developments that suit the distinctive properties of Chinese syntax. [sent-234, score-0.192]
99 This research was supported by a General Sir John Monash Fellowship to the first [...] [sent-237, score-0.167]
100 TBL-improved non-deterministic segmentation and POS tagging for a Chinese parser. [sent-247, score-0.128]
wordName wordTfidf (topN-words)
[('chinese', 0.396), ('pctb', 0.296), ('error', 0.265), ('kummerfeld', 0.253), ('parser', 0.204), ('berkeley', 0.201), ('errors', 0.194), ('bracket', 0.144), ('attachment', 0.141), ('qian', 0.133), ('bikel', 0.12), ('pos', 0.113), ('coordination', 0.106), ('tse', 0.101), ('confusions', 0.101), ('levy', 0.101), ('parsing', 0.1), ('parsers', 0.096), ('traces', 0.096), ('factored', 0.094), ('gold', 0.091), ('tagging', 0.091), ('tags', 0.091), ('forst', 0.087), ('pp', 0.083), ('treebank', 0.081), ('zpar', 0.081), ('types', 0.078), ('pcfg', 0.077), ('confusion', 0.076), ('petrov', 0.075), ('scope', 0.071), ('modifier', 0.071), ('verb', 0.071), ('curran', 0.07), ('substantial', 0.066), ('npinternal', 0.066), ('attributed', 0.06), ('klein', 0.059), ('macau', 0.054), ('syntax', 0.053), ('challenges', 0.053), ('transformation', 0.052), ('unary', 0.051), ('penn', 0.049), ('manning', 0.048), ('impact', 0.048), ('wrong', 0.046), ('placed', 0.045), ('unlexicalized', 0.045), ('bar', 0.044), ('ambiguity', 0.044), ('xue', 0.043), ('isolate', 0.043), ('grammars', 0.042), ('fang', 0.042), ('annotates', 0.042), ('classifications', 0.042), ('changed', 0.041), ('slav', 0.041), ('classified', 0.041), ('sydney', 0.04), ('liu', 0.04), ('categories', 0.04), ('np', 0.04), ('xiong', 0.039), ('absent', 0.039), ('confused', 0.039), ('distinctive', 0.039), ('tree', 0.039), ('type', 0.039), ('rare', 0.038), ('ablation', 0.038), ('segmentation', 0.037), ('sapporo', 0.036), ('guo', 0.036), ('comprehensive', 0.036), ('siblings', 0.035), ('differences', 0.035), ('jiang', 0.035), ('island', 0.033), ('dan', 0.033), ('arises', 0.032), ('jj', 0.032), ('characterize', 0.032), ('english', 0.031), ('issues', 0.031), ('stanford', 0.03), ('yue', 0.03), ('dev', 0.03), ('jeju', 0.03), ('analysis', 0.03), ('qun', 0.029), ('incorrect', 0.029), ('aastiso', 0.029), ('alternated', 0.029), ('assists', 0.029), ('bind', 0.029), ('cuotamtipounta', 0.029), ('firms', 0.029)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
2 0.33339605 80 acl-2013-Chinese Parsing Exploiting Characters
Author: Meishan Zhang ; Yue Zhang ; Wanxiang Che ; Ting Liu
Abstract: Characters play an important role in the Chinese language, yet computational processing of Chinese has been dominated by word-based approaches, with leaves in syntax trees being words. We investigate Chinese parsing from the character-level, extending the notion of phrase-structure trees by annotating internal structures of words. We demonstrate the importance of character-level information to Chinese processing by building a joint segmentation, part-of-speech (POS) tagging and phrase-structure parsing system that integrates character-structure features. Our joint system significantly outperforms a state-of-the-art word-based baseline on the standard CTB5 test, and gives the best published results for Chinese parsing.
3 0.25992429 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
Author: Zhiguo Wang ; Chengqing Zong ; Nianwen Xue
Abstract: For the cascaded task of Chinese word segmentation, POS tagging and parsing, the pipeline approach suffers from error propagation while the joint learning approach suffers from inefficient decoding due to the large combined search space. In this paper, we present a novel lattice-based framework in which a Chinese sentence is first segmented into a word lattice, and then a lattice-based POS tagger and a lattice-based parser are used to process the lattice from two different viewpoints: sequential POS tagging and hierarchical tree building. A strategy is designed to exploit the complementary strengths of the tagger and parser, and encourage them to predict agreed structures. Experimental results on Chinese Treebank show that our lattice-based framework significantly improves the accuracy of the three sub-tasks.
4 0.25476289 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
Author: Xipeng Qiu ; Qi Zhang ; Xuanjing Huang
Abstract: There is a growing need for Chinese natural language processing (NLP) in a range of research and commercial applications. However, most current Chinese NLP tools and components still have a wide range of issues that need further improvement and development. FudanNLP is an open source toolkit for Chinese natural language processing (NLP), which uses statistics-based and rule-based methods to deal with Chinese NLP tasks, such as word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, time phrase recognition, anaphora resolution and so on.
5 0.21983767 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
Author: Muhua Zhu ; Yue Zhang ; Wenliang Chen ; Min Zhang ; Jingbo Zhu
Abstract: Shift-reduce dependency parsers give comparable accuracies to their chart-based counterparts, yet the best shift-reduce constituent parsers still lag behind the state-of-the-art. One important reason is the existence of unary nodes in phrase structure trees, which leads to different numbers of shift-reduce actions between different outputs for the same input. This turns out to have a large empirical impact on the framework of global training and beam search. We propose a simple yet effective extension to the shift-reduce process, which eliminates size differences between action sequences in beam-search. Our parser gives comparable accuracies to the state-of-the-art chart parsers. With linear run-time complexity, our parser is over an order of magnitude faster than the fastest chart parser.
6 0.20665424 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
7 0.17987782 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
8 0.1774701 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
9 0.14633545 275 acl-2013-Parsing with Compositional Vector Grammars
10 0.13862215 173 acl-2013-Graph-based Semi-Supervised Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging
11 0.13372858 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
12 0.12301054 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
13 0.1226777 26 acl-2013-A Transition-Based Dependency Parser Using a Dynamic Parsing Strategy
14 0.11999511 368 acl-2013-Universal Dependency Annotation for Multilingual Parsing
15 0.11852951 335 acl-2013-Survey on parsing three dependency representations for English
16 0.11596055 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
17 0.1080342 97 acl-2013-Cross-lingual Projections between Languages from Different Families
18 0.10344636 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
19 0.10258967 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
20 0.10163282 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
topicId topicWeight
[(0, 0.248), (1, -0.167), (2, -0.322), (3, 0.047), (4, 0.086), (5, -0.032), (6, -0.082), (7, 0.037), (8, 0.055), (9, 0.03), (10, -0.013), (11, 0.046), (12, 0.043), (13, 0.016), (14, 0.01), (15, 0.003), (16, 0.023), (17, 0.012), (18, -0.039), (19, 0.016), (20, 0.008), (21, -0.048), (22, -0.004), (23, -0.056), (24, 0.028), (25, 0.017), (26, 0.008), (27, -0.082), (28, 0.061), (29, -0.0), (30, -0.0), (31, -0.077), (32, 0.043), (33, -0.142), (34, -0.046), (35, 0.037), (36, -0.041), (37, -0.048), (38, 0.123), (39, -0.121), (40, -0.022), (41, -0.011), (42, -0.028), (43, -0.141), (44, 0.018), (45, 0.018), (46, -0.053), (47, 0.022), (48, 0.018), (49, 0.054)]
simIndex simValue paperId paperTitle
same-paper 1 0.98334473 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
2 0.8532576 80 acl-2013-Chinese Parsing Exploiting Characters
3 0.8343069 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
4 0.81614298 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
5 0.78896463 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
6 0.69621754 243 acl-2013-Mining Informal Language from Chinese Microtext: Joint Word Recognition and Segmentation
7 0.6905582 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
8 0.66141212 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
9 0.64404917 208 acl-2013-Joint Inference for Heterogeneous Dependency Parsing
10 0.64219415 335 acl-2013-Survey on parsing three dependency representations for English
11 0.64025217 193 acl-2013-Improving Chinese Word Segmentation on Micro-blog Using Rich Punctuations
12 0.59325498 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
13 0.59196705 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection
14 0.56038094 331 acl-2013-Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing
15 0.5494501 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
16 0.548742 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation
17 0.54604918 288 acl-2013-Punctuation Prediction with Transition-based Parsing
18 0.53565574 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
19 0.53315824 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
20 0.52965897 94 acl-2013-Coordination Structures in Dependency Treebanks
topicId topicWeight
[(0, 0.043), (6, 0.026), (11, 0.117), (14, 0.017), (15, 0.023), (24, 0.047), (26, 0.069), (35, 0.081), (42, 0.062), (48, 0.035), (70, 0.048), (71, 0.23), (88, 0.043), (90, 0.015), (95, 0.083)]
simIndex simValue paperId paperTitle
1 0.94417864 177 acl-2013-GuiTAR-based Pronominal Anaphora Resolution in Bengali
Author: Apurbalal Senapati ; Utpal Garain
Abstract: This paper attempts to use an off-the-shelf anaphora resolution (AR) system for Bengali. The language specific preprocessing modules of GuiTAR (v3.0.3) are identified and suitably designed for Bengali. Anaphora resolution module is also modified or replaced in order to realize different configurations of GuiTAR. Performance of each configuration is evaluated and experiment shows that the off-the-shelf AR system can be effectively used for Indic languages.
2 0.86282146 179 acl-2013-HYENA-live: Fine-Grained Online Entity Type Classification from Natural-language Text
Author: Mohamed Amir Yosef ; Sandro Bauer ; Johannes Hoffart ; Marc Spaniol ; Gerhard Weikum
Abstract: Recent research has shown progress in achieving high-quality, very fine-grained type classification in hierarchical taxonomies. Within such a multi-level type hierarchy with several hundreds of types at different levels, many entities naturally belong to multiple types. In order to achieve high-precision in type classification, current approaches are either limited to certain domains or require time consuming multistage computations. As a consequence, existing systems are incapable of performing ad-hoc type classification on arbitrary input texts. In this demo, we present a novel Web-based tool that is able to perform domain independent entity type classification under real time conditions. Thanks to its efficient implementation and compacted feature representation, the system is able to process text inputs on-the-fly while still achieving equally high precision as leading state-of-the-art implementations. Our system offers an online interface where natural-language text can be inserted, which returns semantic type labels for entity mentions. Furthermore, the user interface allows users to explore the assigned types by visualizing and navigating along the type-hierarchy.
same-paper 3 0.82746762 44 acl-2013-An Empirical Examination of Challenges in Chinese Parsing
Author: Jonathan K. Kummerfeld ; Daniel Tse ; James R. Curran ; Dan Klein
Abstract: Aspects of Chinese syntax result in a distinctive mix of parsing challenges. However, the contribution of individual sources of error to overall difficulty is not well understood. We conduct a comprehensive automatic analysis of error types made by Chinese parsers, covering a broad range of error types for large sets of sentences, enabling the first empirical ranking of Chinese error types by their performance impact. We also investigate which error types are resolved by using gold part-of-speech tags, showing that improving Chinese tagging only addresses certain error types, leaving substantial outstanding challenges.
4 0.80270213 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
Author: Beata Beigman Klebanov ; Michael Flor
Abstract: We describe a new representation of the content vocabulary of a text we call word association profile that captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distribution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.
5 0.79595947 204 acl-2013-Iterative Transformation of Annotation Guidelines for Constituency Parsing
Author: Xiang Li ; Wenbin Jiang ; Yajuan Lu ; Qun Liu
Abstract: This paper presents an effective algorithm of annotation adaptation for constituency treebanks, which transforms a treebank from one annotation guideline to another with an iterative optimization procedure, thus to build a much larger treebank to train an enhanced parser without increasing model complexity. Experiments show that the transformed Tsinghua Chinese Treebank as additional training data brings significant improvement over the baseline trained on Penn Chinese Treebank only.
6 0.66800463 164 acl-2013-FudanNLP: A Toolkit for Chinese Natural Language Processing
7 0.66111666 205 acl-2013-Joint Apposition Extraction with Syntactic and Semantic Constraints
8 0.65562993 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
9 0.65561086 358 acl-2013-Transition-based Dependency Parsing with Selectional Branching
10 0.65141255 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
11 0.6503371 318 acl-2013-Sentiment Relevance
12 0.65000409 169 acl-2013-Generating Synthetic Comparable Questions for News Articles
13 0.64960206 156 acl-2013-Fast and Adaptive Online Training of Feature-Rich Translation Models
14 0.64600003 154 acl-2013-Extracting bilingual terminologies from comparable corpora
15 0.64493215 245 acl-2013-Modeling Human Inference Process for Textual Entailment Recognition
16 0.64083409 333 acl-2013-Summarization Through Submodularity and Dispersion
17 0.64037395 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing
18 0.64027739 61 acl-2013-Automatic Interpretation of the English Possessive
19 0.64025605 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
20 0.63970125 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation