acl acl2012 acl2012-5 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
Reference: text
sentIndex sentText sentNum sentScore
1 In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. [sent-11, score-0.522]
2 Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. [sent-12, score-0.859]
3 We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. [sent-13, score-0.369]
4 Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice. [sent-14, score-0.347]
5 1 Introduction Stanford dependencies (de Marneffe and Manning, 2008) provide a simple description of relations between pairs of words in a sentence. [sent-15, score-0.211]
6 Consequently, Stanford dependencies are widely used: in biomedical text mining (Kim et al. [sent-17, score-0.214]
7 In addition to English, there is a Chinese version of Stanford dependencies (Chang et al. [sent-20, score-0.176]
8 , 2009), ‡Computer Science Department Stanford University Stanford, CA, 94305 (a) A constituent parse tree. [sent-21, score-0.268]
9 Figure 1: A sample Chinese constituent parse tree and its corresponding Stanford dependencies for the sentence China (中国) encourages (鼓励) private (民营) entrepreneurs (企业家) to invest (投资) in national (国家) infrastructure (基础) construction (建设). [sent-23, score-0.477]
10 which is also useful for many applications, such as Chinese sentiment analysis (Wu et al. [sent-24, score-0.044]
11 Figure 1 shows a sample constituent parse tree and the corresponding Stanford dependencies for a sentence in Chinese. [sent-29, score-0.477]
12 Although there are several variants of Stanford dependencies for English,1 so far only a basic version (i. [sent-30, score-0.176]
13 Stanford dependencies were originally obtained from constituent trees, using rules (de Marneffe et al. [sent-32, score-0.444]
14 But as dependency parsing technologies mature (Kübler et al. [sent-34, score-0.347]
15 (2010) reported that Stanford’s implementation (Klein and Manning, 2003) underperforms other constituent 1nlp . [sent-37, score-0.268]
16 Table 1: Basic information for the seven parsers included in our experiments. [sent-50, score-0.286]
17 Their thorough investigation also showed that constituent parsers systematically outperform parsing directly to Stanford dependencies. [sent-52, score-0.58]
18 Nevertheless, relative standings could have changed in recent years: dependency parsers are now significantly more accurate, thanks to advances like the high-order maximum spanning tree (MST) model (Koo and Collins, 2010) for graph-based dependency parsing (McDonald and Pereira, 2006). [sent-53, score-0.769]
19 Therefore, we deemed it important to re-evaluate the performance of constituent and dependency parsers. [sent-54, score-0.48]
20 But the main purpose of our work is to apply the more sophisticated dependency parsing algorithms specifically to Chinese. [sent-55, score-0.294]
21 2 Methodology We compared seven popular open source constituent and dependency parsers, focusing on both accuracy and parsing speed. [sent-58, score-0.655]
22 We hope that our analysis will help end-users select a suitable method for parsing to Stanford dependencies in their own applications. [sent-59, score-0.258]
23 , 2006), Bikel (2004), Charniak (2000) and Stanford (Klein and Manning, 2003) chineseFactored, which is also the default used by Stanford dependencies. [sent-63, score-0.037]
24 The three dependency parsers are: MaltParser (Nivre et al. [sent-64, score-0.442]
25 2A second-order MST parser (with the speed optimization). [sent-67, score-0.15]
26 3 Settings Every parser was run with its own default options. [sent-77, score-0.187]
27 However, since the default classifier used by MaltParser is libsvm (Chang and Lin, 2011) with a polynomial kernel, it may be too slow for training models on all of CTB 7. [sent-78, score-0.116]
28 Therefore, we also tested this particular parser with the faster liblinear (Fan et al. [sent-80, score-0.245]
29 4 Features Unlike constituent parsers, dependency models require exogenous part-of-speech (POS) tags, both in training and in inference. [sent-85, score-0.48]
30 5 Word lemmas, which are generalizations of words, are another feature known to be useful for dependency parsing. [sent-89, score-0.212]
31 Table columns: Type, Parser, Dev UAS/LAS, Test UAS/LAS, Parsing Time. UAS and LAS are for both development and test sets; parsing times (minutes:seconds) are for the test data only and exclude generation of basic Stanford dependencies (for constituent parsers) and part-of-speech tagging (for dependency parsers). [sent-102, score-0.773]
32 They can be computed via a CoNLL-X shared task dependency parsing evaluation tool (without scoring punctuation). [sent-104, score-0.363]
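The attachment scores mentioned above can be illustrated with a minimal sketch. It is not the CoNLL-X evaluation tool itself: the function names, the (form, head, label) token layout, and the rule that a token counts as punctuation when all of its characters are Unicode punctuation are assumptions made for this example.

```python
import unicodedata


def is_punctuation(token):
    """Assumed rule: a non-empty token whose characters are all Unicode punctuation."""
    return bool(token) and all(
        unicodedata.category(ch).startswith("P") for ch in token
    )


def attachment_scores(gold, pred):
    """Return (UAS, LAS) as percentages for one aligned token sequence.

    gold and pred are lists of (form, head_index, relation_label) tuples.
    Punctuation tokens (judged by the gold form) are not scored.
    """
    total = correct_head = correct_both = 0
    for (form, g_head, g_label), (_, p_head, p_label) in zip(gold, pred):
        if is_punctuation(form):
            continue
        total += 1
        if g_head == p_head:
            correct_head += 1
            if g_label == p_label:
                correct_both += 1
    if total == 0:
        return 0.0, 0.0
    return 100.0 * correct_head / total, 100.0 * correct_both / total


# Hypothetical toy sentence: heads are 1-based token indices, 0 is the root.
gold = [("中国", 2, "nsubj"), ("鼓励", 0, "root"), ("企业家", 2, "dobj"), ("。", 2, "punct")]
pred = [("中国", 2, "dep"), ("鼓励", 0, "root"), ("企业家", 1, "dobj"), ("。", 1, "punct")]
uas, las = attachment_scores(gold, pred)
```

In the toy example the final full stop is skipped, so three tokens are scored: two have the correct head (UAS) and one of those also has the correct label (LAS).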
33 1 Chinese Mate scored highest, and Berkeley was the most accurate of the constituent parsers, slightly behind Mate, using half of the time. [sent-106, score-0.399]
34 MaltParser (liblinear) was by far the most efficient but also the least performant; it scored higher with libsvm but took much more time. [sent-107, score-0.13]
35 The 1st-order MSTParser was more accurate than MaltParser (libsvm), a result that differs from that of Cer et al. [sent-108, score-0.08]
36 The Stanford parser (the default for Stanford dependencies) was only slightly more accurate than MaltParser (liblinear). [sent-111, score-0.23]
37 Bikel’s parser was too slow to be used in practice; and Charniak’s parser, which performs best for English, did not work well for Chinese. [sent-112, score-0.3]
38 1%) and hence the better dependency parser for English, consistent with our results for Chinese (see Table 3). [sent-119, score-0.362]
39 (2010), however, since the constituent parser of Charniak and Johnson (2005) still scores substantially higher (89. [sent-124, score-0.418]
40 7 In a separate experiment (parsing web data),8 we found Mate to be less accurate than Charniak-Johnson and improvement from jackknifing smaller on English. [sent-126, score-0.187]
41 4 Analysis To further compare the constituent and dependency approaches to generating Stanford dependencies, we focused on Mate and Berkeley parsers, the best of each type. [sent-127, score-0.71]
42 Mate does better on most relations, noun compound modifiers (nn) and adjectival modifiers (amod) in particular; and the Berkeley parser is better at root and dep. [sent-131, score-0.218]
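A per-relation comparison of this kind can be sketched as follows. The source does not spell out how its relation-level F1 scores were computed, so this is one plausible definition, stated as an assumption: a predicted relation counts as correct only when both the head and the label match the gold token. The function name and input layout are hypothetical.

```python
from collections import Counter


def relation_f1(gold, pred):
    """Return {relation_label: F1} for aligned lists of (head, label) pairs."""
    gold_count, pred_count, correct = Counter(), Counter(), Counter()
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        gold_count[g_label] += 1
        pred_count[p_label] += 1
        # Assumed matching criterion: same head and same label.
        if g_head == p_head and g_label == p_label:
            correct[g_label] += 1
    scores = {}
    for label in set(gold_count) | set(pred_count):
        precision = correct[label] / pred_count[label] if pred_count[label] else 0.0
        recall = correct[label] / gold_count[label] if gold_count[label] else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Hypothetical toy data: the third token's dobj is mislabeled as nn.
gold = [(2, "nn"), (0, "root"), (2, "dobj"), (3, "nn")]
pred = [(2, "nn"), (0, "root"), (2, "nn"), (3, "nn")]
scores = relation_f1(gold, pred)
```

Here nn has precision 2/3 and recall 1, root is perfect, and dobj is never predicted, so its F1 is zero.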
43 Since POS-tags are especially informative of Chinese dependencies (Li et al. [sent-133, score-0.176]
44 , 2011), we harmonized training and test data, using 10-way jackknifing (see §2. [sent-134, score-0.107]
45 This m 7 One (small) factor contributing to the difference between the two languages is that in the Chinese setup we stop with basic Stanford dependencies, so there is no penalty for further conversion; another is not using discriminative reranking for Chinese. [sent-138, score-0.176]
46 Table 4: Performance (F1 scores) for the fifteen most frequent dependency relations in the CTB 7. [sent-150, score-0.247]
47 parser with gold tags because it improves consistency, particularly for Chinese, where tagging accuracies are lower than in English. [sent-152, score-0.235]
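The n-way jackknifing idea can be sketched as below: the training set is split into n folds, and each fold is tagged by a tagger trained on the other n-1 folds, so every training sentence receives automatic tags from a model that never saw it. The most-frequent-tag tagger is a toy stand-in, not the tagger used in the paper, and all function names and the "NN" fallback are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_tagger(tagged_sentences):
    """Toy most-frequent-tag tagger (hypothetical stand-in for a real POS tagger)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in tagged_sentences:
        for word, tag in sentence:
            counts[word][tag] += 1
            all_tags[tag] += 1
    # Unknown words get the overall most frequent tag ("NN" if there is no data).
    fallback = all_tags.most_common(1)[0][0] if all_tags else "NN"
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    return lambda words: [lexicon.get(w, fallback) for w in words]


def jackknife(tagged_sentences, n=10):
    """Re-tag every sentence with a tagger trained on the other n-1 folds."""
    folds = [tagged_sentences[i::n] for i in range(n)]
    auto = [None] * len(tagged_sentences)
    for k in range(n):
        train = [s for j, fold in enumerate(folds) if j != k for s in fold]
        tagger = train_tagger(train)
        for offset, sentence in enumerate(folds[k]):
            words = [w for w, _ in sentence]
            # Fold k holds original sentences k, k + n, k + 2n, ...
            auto[k + offset * n] = list(zip(words, tagger(words)))
    return auto


# Hypothetical toy corpus of (word, gold_tag) sentences.
corpus = [[("a", "X")], [("a", "X")], [("b", "Y")], [("a", "X")]]
auto_tagged = jackknife(corpus, n=2)
```

In the toy corpus the word "b" occurs in only one fold, so its jackknifed tag comes from the fallback and differs from the gold tag. That is the kind of realistic tagging noise a parser would also face at test time, which is the consistency argument made in the surrounding text.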
48 On development data, Mate scored worse given gold tags (75. [sent-153, score-0.101]
49 4 versus Lemmatization offered additional useful cues for overcoming data sparseness (77. [sent-154, score-0.04]
50 5 Discussion Our results suggest that if accuracy is of primary concern, then Mate should be preferred;12 however, the Berkeley parser offers a trade-off between accuracy and speed. [sent-161, score-0.15]
51 If neither parser satisfies the demands of a practical application (e. [sent-162, score-0.15]
52 Stanford dependencies are not the only popular dependency representation. [sent-166, score-0.425]
53 We also considered the 11Berkeley’s performance suffered with jackknifed tags (76. [sent-167, score-0.112]
54 14 conversion scheme of the Penn2Malt tool,13 used in a series of CoNLL shared tasks (Buchholz and Marsi, 2006; Nivre et al. [sent-171, score-0.069]
55 However, this tool relies on function tag information from the CTB in determining dependency relations. [sent-175, score-0.212]
56 Since these tags usually cannot be produced by constituent parsers, we could not, in turn, obtain CoNLL-style dependency trees from their output. [sent-176, score-0.53]
57 This points to another advantage of dependency parsers: they need only the dependency tree corpus to train and can conveniently make use of native (unconverted) corpora, such as the Chinese Dependency Treebank (Liu et al. [sent-177, score-0.457]
58 Lastly, we must note that although the Berkeley parser is on par with Charniak’s (2000) system for English (Cer et al. [sent-179, score-0.15]
59 The Berkeley parser appears more general, without quite as many parameters or idiosyncratic design decisions, as evidenced by a recent application to French (Candito et al. [sent-184, score-0.15]
60 6 Conclusion We compared seven popular open source parsers (four constituent and three dependency) for generating Stanford dependencies in Chinese. [sent-186, score-0.979]
61 Mate, a high-order MST dependency parser, with lemmatization and jackknifed POS-tags, appears most accurate; but Berkeley’s faster constituent parser, with jointly-inferred tags, is statistically no worse. [sent-187, score-0.581]
62 This outcome is different from English, where constituent parsers systematically outperform direct methods. [sent-188, score-0.498]
63 Though Mate scored higher overall, Berkeley’s parser was better at recovering longer-distance relations, suggesting that a combined approach could perhaps work better still (Rush et al. [sent-189, score-0.201]
64 Acknowledgments We thank Daniel Cer, for helping us replicate the English experimental setup and for suggesting that we explore jackknifing methods, and the anonymous reviewers, for valuable comments. [sent-192, score-0.107]
65 In Proceedings of the 20th international joint conference on Artificial intelligence, IJCAI’07, pages 2670–2676, San Francisco, CA, USA. [sent-209, score-0.107]
66 A distributional analysis of a lexicalized statistical parsing model. [sent-214, score-0.082]
67 In Dekang Lin and Dekai Wu, editors, Proceedings of EMNLP 2004, pages 182–189, Barcelona, Spain, July. [sent-215, score-0.069]
68 Top accuracy and fast dependency parsing is not a contradiction. [sent-219, score-0.212]
69 In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 89–97, Beijing, China, August. [sent-220, score-0.069]
70 In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X), pages 149–164, New York City, June. [sent-225, score-0.069]
71 In Coling 2010: Posters, pages 108–116, Beijing, China, August. [sent-230, score-0.069]
72 In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), pages 173–180, Ann Arbor, Michigan, June. [sent-249, score-0.069]
73 In Proceedings of the 1st North American chapter of the Association for Computational Linguistics conference, NAACL 2000, pages 132–139, Stroudsburg, PA, USA. [sent-254, score-0.069]
74 The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. [sent-272, score-0.245]
75 In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, pages 1–18, Boulder, Colorado, June. [sent-273, score-0.069]
76 In Proceedings of the 4th Asia information retrieval conference on Information retrieval technology, AIRS’08, pages 598– 604, Berlin, Heidelberg. [sent-278, score-0.107]
77 In Proceedings of the Workshop on Current Trends in Biomedical Natural Language Processing: Shared Task, BioNLP ’09, pages 1–9, Stroudsburg, PA, USA. [sent-283, score-0.069]
78 In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1, ACL ’03, pages 423–430, Stroudsburg, PA, USA. [sent-289, score-0.069]
79 In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 1–11, Stroudsburg, PA, USA. [sent-294, score-0.069]
80 In Proceedings of ACL-08: HLT, pages 595–603, Columbus, Ohio, June. [sent-299, score-0.069]
81 Joint models for Chinese POS tagging and dependency parsing. [sent-309, score-0.247]
82 In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1180–1191, Edinburgh, Scotland, UK. [sent-310, score-0.069]
83 In Proceedings of the 11th Conference of the European Chapter of the ACL (EACL 2006), pages 81–88. [sent-320, score-0.069]
84 Sentence level sentiment analysis in the presence of conjuncts using linguistic analysis. [sent-325, score-0.044]
85 In Proceedings of the 29th European conference on IR research, ECIR’07, pages 573–580, Berlin, Heidelberg. [sent-326, score-0.107]
86 In Proceedings of ACL-08: HLT, pages 950–958, Columbus, Ohio, June. [sent-331, score-0.069]
87 In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), pages 2216– 2219. [sent-336, score-0.069]
88 In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932, Prague, Czech Republic, June. [sent-340, score-0.069]
89 In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440, Sydney, Australia, July. [sent-345, score-0.069]
90 In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1–11, Cambridge, MA, October. [sent-351, score-0.069]
91 In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, HLT ’10, pages 649–652, Stroudsburg, PA, USA. [sent-357, score-0.069]
92 The CoNLL 2008 shared task on joint parsing of syntactic and semantic dependencies. [sent-361, score-0.151]
93 In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, pages 159–177, Manchester, England, August. [sent-362, score-0.069]
94 In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL ’03, pages 173–180, Stroudsburg, PA, USA. [sent-368, score-0.069]
95 Morphological features help POS tagging of unknown words across language varieties. [sent-372, score-0.035]
96 In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, ACL ’10, pages 118–127, Stroudsburg, PA, USA. [sent-378, score-0.069]
97 In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 3 - Volume 3, EMNLP ’09, pages 1533–1541, Stroudsburg, PA, USA. [sent-383, score-0.069]
98 In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’11, pages 1332–1341, Stroudsburg, PA, USA. [sent-388, score-0.069]
99 In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers - Volume 2, HLT ’11, pages 188–193, Stroudsburg, PA, USA. [sent-393, score-0.069]
100 In Proceedings of the 15th ACM international conference on Information and knowledge management, CIKM ’06, pages 43–50, New York, NY, USA. [sent-398, score-0.107]
wordName wordTfidf (topN-words)
[('mate', 0.4), ('constituent', 0.268), ('stanford', 0.266), ('parsers', 0.23), ('dependency', 0.212), ('dependencies', 0.176), ('maltparser', 0.164), ('parser', 0.15), ('chinese', 0.138), ('cer', 0.138), ('berkeley', 0.134), ('joakim', 0.129), ('nivre', 0.116), ('jackknifing', 0.107), ('ctb', 0.106), ('stroudsburg', 0.105), ('surdeanu', 0.098), ('liblinear', 0.095), ('mstparser', 0.091), ('marneffe', 0.085), ('parsing', 0.082), ('accurate', 0.08), ('libsvm', 0.079), ('charniak', 0.077), ('pa', 0.077), ('association', 0.077), ('mcdonald', 0.074), ('koo', 0.071), ('pages', 0.069), ('shared', 0.069), ('conll', 0.069), ('wu', 0.064), ('mst', 0.062), ('mihai', 0.062), ('tseng', 0.062), ('jackknifed', 0.062), ('lide', 0.062), ('meena', 0.062), ('xuanjing', 0.062), ('yuanbin', 0.062), ('christopher', 0.059), ('chang', 0.057), ('seven', 0.056), ('ubler', 0.053), ('coling', 0.052), ('china', 0.051), ('scored', 0.051), ('tags', 0.05), ('ting', 0.05), ('ryan', 0.05), ('androutsopoulos', 0.049), ('organizing', 0.048), ('bionlp', 0.046), ('candito', 0.046), ('huihsin', 0.046), ('meyers', 0.046), ('rush', 0.046), ('wanxiang', 0.046), ('zhuang', 0.046), ('manning', 0.045), ('sentiment', 0.044), ('buchholz', 0.043), ('bikel', 0.043), ('jurafsky', 0.042), ('computational', 0.041), ('che', 0.041), ('harbin', 0.041), ('ir', 0.041), ('versus', 0.04), ('afrl', 0.039), ('banko', 0.039), ('las', 0.039), ('lemmatization', 0.039), ('uas', 0.039), ('conference', 0.038), ('biomedical', 0.038), ('terry', 0.038), ('english', 0.037), ('default', 0.037), ('popular', 0.037), ('proceedings', 0.037), ('johan', 0.036), ('car', 0.036), ('haji', 0.036), ('jens', 0.036), ('daniel', 0.035), ('relations', 0.035), ('arquez', 0.035), ('llu', 0.035), ('sandra', 0.035), ('huang', 0.035), ('tagging', 0.035), ('treebank', 0.034), ('modifiers', 0.034), ('tree', 0.033), ('johansson', 0.033), ('ensemble', 0.033), ('zhang', 0.033), ('klein', 0.033), ('fan', 0.032)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
2 0.25400159 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other high-performance parsers.
3 0.23077855 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
4 0.2029167 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Author: Zhenghua Li ; Ting Liu ; Wanxiang Che
Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
5 0.19869421 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
Author: Katsuhiko Hayashi ; Taro Watanabe ; Masayuki Asahara ; Yuji Matsumoto
Abstract: This paper presents a novel top-down head-driven parsing algorithm for data-driven projective dependency analysis. This algorithm handles global structures, such as clause and coordination, better than shift-reduce or other bottom-up algorithms. Experiments on the English Penn Treebank data and the Chinese CoNLL-06 data show that the proposed algorithm achieves comparable results with other data-driven dependency parsing algorithms.
6 0.19704446 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
7 0.15615121 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
8 0.13995771 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
9 0.1327873 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
10 0.13197054 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
11 0.13052225 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
12 0.12598087 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
13 0.12556444 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
14 0.11655024 115 acl-2012-Identifying High-Impact Sub-Structures for Convolution Kernels in Document-level Sentiment Classification
15 0.11479067 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
16 0.11312117 71 acl-2012-Dependency Hashing for n-best CCG Parsing
17 0.10636681 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
18 0.10037006 168 acl-2012-Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
19 0.097894534 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
20 0.095684454 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
topicId topicWeight
[(0, -0.274), (1, 0.013), (2, -0.24), (3, -0.23), (4, -0.069), (5, -0.114), (6, 0.006), (7, -0.103), (8, 0.018), (9, 0.001), (10, 0.16), (11, 0.139), (12, 0.014), (13, -0.036), (14, 0.021), (15, 0.06), (16, -0.027), (17, 0.013), (18, -0.07), (19, 0.072), (20, -0.003), (21, -0.014), (22, 0.044), (23, -0.048), (24, 0.033), (25, 0.031), (26, -0.075), (27, 0.107), (28, 0.11), (29, -0.0), (30, -0.008), (31, -0.073), (32, 0.022), (33, 0.03), (34, -0.065), (35, 0.008), (36, 0.012), (37, -0.028), (38, 0.122), (39, 0.04), (40, 0.03), (41, -0.017), (42, 0.004), (43, -0.084), (44, -0.02), (45, 0.085), (46, 0.029), (47, 0.001), (48, 0.025), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.97495025 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
2 0.91463453 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Author: Zhenghua Li ; Ting Liu ; Wanxiang Che
Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
3 0.85169941 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
4 0.83678985 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
Author: Emily Pitler
Abstract: Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.
5 0.79095167 109 acl-2012-Higher-order Constituent Parsing and Parser Combination
Author: Xiao Chen ; Chunyu Kit
Abstract: This paper presents a higher-order model for constituent parsing aimed at utilizing more local structural context to decide the score of a grammar rule instance in a parse tree. Experiments on English and Chinese treebanks confirm its advantage over its first-order version. It achieves its best F1 scores of 91.86% and 85.58% on the two languages, respectively, and further pushes them to 92.80% and 85.60% via combination with other high-performance parsers.
6 0.77167964 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
7 0.71033299 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
8 0.67502648 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
9 0.66665667 95 acl-2012-Fast Syntactic Analysis for Statistical Language Modeling via Substructure Sharing and Uptraining
10 0.66303128 122 acl-2012-Joint Evaluation of Morphological Segmentation and Syntactic Parsing
11 0.6256178 119 acl-2012-Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese
12 0.60481536 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
13 0.59760398 75 acl-2012-Discriminative Strategies to Integrate Multiword Expression Recognition and Parsing
14 0.58981961 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
15 0.58588272 11 acl-2012-A Feature-Rich Constituent Context Model for Grammar Induction
16 0.52864128 90 acl-2012-Extracting Narrative Timelines as Temporal Dependency Structures
17 0.5273034 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
18 0.52582312 189 acl-2012-Syntactic Annotations for the Google Books NGram Corpus
19 0.51970196 71 acl-2012-Dependency Hashing for n-best CCG Parsing
20 0.49605015 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets
topicId topicWeight
[(25, 0.027), (26, 0.051), (28, 0.044), (30, 0.035), (37, 0.052), (39, 0.044), (59, 0.025), (71, 0.285), (74, 0.026), (82, 0.023), (84, 0.026), (85, 0.068), (90, 0.113), (92, 0.049), (94, 0.019), (99, 0.033)]
simIndex simValue paperId paperTitle
same-paper 1 0.79009491 5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies
Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu
Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.
2 0.75609291 4 acl-2012-A Comparative Study of Target Dependency Structures for Statistical Machine Translation
Author: Xianchao Wu ; Katsuhito Sudoh ; Kevin Duh ; Hajime Tsukada ; Masaaki Nagata
Abstract: This paper presents a comparative study of target dependency structures yielded by several state-of-the-art linguistic parsers. Our approach is to measure the impact of these non-isomorphic dependency structures to be used for string-to-dependency translation. Besides using traditional dependency parsers, we also use the dependency structures transformed from PCFG trees and predicate-argument structures (PASs) which are generated by an HPSG parser and a CCG parser. The experiments on Chinese-to-English translation show that the HPSG parser’s PASs achieved the best dependency and translation accuracies.
3 0.54592288 213 acl-2012-Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Author: Wenliang Chen ; Min Zhang ; Haizhou Li
Abstract: Most previous graph-based parsing models increase decoding complexity when they use high-order features due to exact-inference decoding. In this paper, we present an approach to enriching high-order feature representations for graph-based dependency parsing models using a dependency language model and beam search. The dependency language model is built on a large amount of additional auto-parsed data that is processed by a baseline parser. Based on the dependency language model, we represent a set of features for the parsing model. Finally, the features are efficiently integrated into the parsing model during decoding using beam search. Our approach has two advantages. Firstly, we utilize rich high-order features defined over a view of large scope and additional large raw corpus. Secondly, our approach does not increase the decoding complexity. We evaluate the proposed approach on English and Chinese data. The experimental results show that our new parser achieves the best accuracy on the Chinese data and comparable accuracy with the best known systems on the English data.
4 0.52863872 30 acl-2012-Attacking Parsing Bottlenecks with Unlabeled Data and Relevant Factorizations
Author: Emily Pitler
Abstract: Prepositions and conjunctions are two of the largest remaining bottlenecks in parsing. Across various existing parsers, these two categories have the lowest accuracies, and mistakes made have consequences for downstream applications. Prepositions and conjunctions are often assumed to depend on lexical dependencies for correct resolution. As lexical statistics based on the training set only are sparse, unlabeled data can help ameliorate this sparsity problem. By including unlabeled data features into a factorization of the problem which matches the representation of prepositions and conjunctions, we achieve a new state-of-the-art for English dependencies with 93.55% correct attachments on the current standard. Furthermore, conjunctions are attached with an accuracy of 90.8%, and prepositions with an accuracy of 87.4%.
5 0.52823937 87 acl-2012-Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars
Author: Zhenghua Li ; Ting Liu ; Wanxiang Che
Abstract: We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasi-synchronous grammar features to augment the baseline parsing models. Our approach can significantly advance the state-of-the-art parsing accuracy on two widely used target treebanks (Penn Chinese Treebank 5.1 and 6.0) using the Chinese Dependency Treebank as the source treebank. The improvements are respectively 1.37% and 1.10% with automatic part-of-speech tags. Moreover, an indirect comparison indicates that our approach also outperforms previous work based on treebank conversion.
6 0.52079576 106 acl-2012-Head-driven Transition-based Parsing with Top-down Prediction
7 0.51169717 45 acl-2012-Capturing Paradigmatic and Syntagmatic Lexical Relations: Towards Accurate Chinese Part-of-Speech Tagging
8 0.50623727 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
9 0.49868292 172 acl-2012-Selective Sharing for Multilingual Dependency Parsing
10 0.49822345 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
11 0.49547896 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation
12 0.49419916 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
13 0.49382386 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
14 0.49332768 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
15 0.49272677 136 acl-2012-Learning to Translate with Multiple Objectives
16 0.49221766 175 acl-2012-Semi-supervised Dependency Parsing using Lexical Affinities
17 0.49189958 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
18 0.49187118 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation
19 0.49166429 64 acl-2012-Crosslingual Induction of Semantic Roles
20 0.49140105 109 acl-2012-Higher-order Constituent Parsing and Parser Combination