5 acl-2012-A Comparison of Chinese Parsers for Stanford Dependencies

Author: Wanxiang Che ; Valentin Spitkovsky ; Ting Liu

Abstract: Stanford dependencies are widely used in natural language processing as a semantically-oriented representation, commonly generated either by (i) converting the output of a constituent parser, or (ii) predicting dependencies directly. Previous comparisons of the two approaches for English suggest that starting from constituents yields higher accuracies. In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice.

1 In this paper, we re-evaluate both methods for Chinese, using more accurate dependency parsers than in previous work. [sent-11, score-0.522]

2 Our comparison of performance and efficiency across seven popular open source parsers (four constituent and three dependency) shows, by contrast, that recent higher-order graph-based techniques can be more accurate, though somewhat slower, than constituent parsers. [sent-12, score-0.859]

3 We demonstrate also that n-way jackknifing is a useful technique for producing automatic (rather than gold) part-of-speech tags to train Chinese dependency parsers. [sent-13, score-0.369]

4 Finally, we analyze the relations produced by both kinds of parsing and suggest which specific parsers to use in practice. [sent-14, score-0.347]

5 1 Introduction Stanford dependencies (de Marneffe and Manning, 2008) provide a simple description of relations between pairs of words in a sentence. [sent-15, score-0.211]

6 Consequently, Stanford dependencies are widely used: in biomedical text mining (Kim et al. [sent-17, score-0.214]

7 In addition to English, there is a Chinese version of Stanford dependencies (Chang et al. [sent-20, score-0.176]

8 , 2009), 11 ‡Computer Science Department Stanford University Stanford, CA, 94305 (a) A constituent parse tree. [sent-21, score-0.268]

9 Figure 1: A sample Chinese constituent parse tree and its corresponding Stanford dependencies for the sentence China (中 国) encourages (鼓励) private (民营) entrepreneurs (企业家) to invest (投资) in national (国家) infrastructure (基础) construction (建设). [sent-23, score-0.477]

10 which is also useful for many applications, such as Chinese sentiment analysis (Wu et al. [sent-24, score-0.044]

11 Figure 1 shows a sample constituent parse tree and the corresponding Stanford dependencies for a sentence in Chinese. [sent-29, score-0.477]

12 Although there are several variants of Stanford dependencies for English,1 so far only a basic version (i. [sent-30, score-0.176]

13 Stanford dependencies were originally obtained from constituent trees, using rules (de Marneffe et al. [sent-32, score-0.444]

14 But as dependency parsing technologies mature (Kübler et al. [sent-34, score-0.347]

15 (2010) reported that Stanford’s implementation (Klein and Manning, 2003) underperforms other constituent 1nlp . [sent-37, score-0.268]

16 Table 1: Basic information for the seven parsers included in our experiments. [sent-50, score-0.286]

17 Their thorough investigation also showed that constituent parsers systematically outperform parsing directly to Stanford dependencies. [sent-52, score-0.58]

18 Nevertheless, relative standings could have changed in recent years: dependency parsers are now significantly more accurate, thanks to advances like the high-order maximum spanning tree (MST) model (Koo and Collins, 2010) for graph-based dependency parsing (McDonald and Pereira, 2006). [sent-53, score-0.769]

19 Therefore, we deemed it important to re-evaluate the performance of constituent and dependency parsers. [sent-54, score-0.48]

20 But the main purpose of our work is to apply the more sophisticated dependency parsing algorithms specifically to Chinese. [sent-55, score-0.294]

21 2 Methodology We compared seven popular open source constituent and dependency parsers, focusing on both accuracy and parsing speed. [sent-58, score-0.655]

22 We hope that our analysis will help end-users select a suitable method for parsing to Stanford dependencies in their own applications. [sent-59, score-0.258]

23 , 2006), Bikel (2004), Charniak (2000) and Stanford (Klein and Manning, 2003) chineseFactored, which is also the default used by Stanford dependencies. [sent-63, score-0.037]

24 The three dependency parsers are: MaltParser (Nivre et al. [sent-64, score-0.442]

25 2A second-order MST parser (with the speed optimization). [sent-67, score-0.15]

26 3 Settings Every parser was run with its own default options. [sent-77, score-0.187]

27 However, since the default classifier used by MaltParser is libsvm (Chang and Lin, 2011) with a polynomial kernel, it may be too slow for training models on all of CTB 7. [sent-78, score-0.116]

28 Therefore, we also tested this particular parser with the faster liblinear (Fan et al. [sent-80, score-0.245]

29 4 Features Unlike constituent parsers, dependency models require exogenous part-of-speech (POS) tags, both in training and in inference. [sent-85, score-0.48]
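The n-way jackknifing mentioned in the abstract can be illustrated with a minimal sketch. It assumes a round-robin split into folds and uses a toy most-frequent-tag tagger as a stand-in for a real POS tagger; the names `train_tagger` and `jackknife_tags` are hypothetical and are not taken from the paper or from any of the parsers it compares.

```python
from collections import Counter, defaultdict


def train_tagger(sentences):
    """Toy most-frequent-tag tagger (an assumed stand-in for a real tagger)."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in sentences:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    fallback = all_tags.most_common(1)[0][0]  # tag used for unseen words
    lexicon = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda words: [lexicon.get(w, fallback) for w in words]


def jackknife_tags(sentences, n=10):
    """Return automatic tags for every training sentence.

    Each sentence is tagged by a model trained on the other n-1 folds,
    so the training tags contain errors similar to those seen at test time.
    """
    if n < 2 or len(sentences) < 2:
        raise ValueError("need at least two folds and two sentences")
    auto = [None] * len(sentences)
    for fold in range(n):
        held_out = [i for i in range(len(sentences)) if i % n == fold]
        if not held_out:
            continue
        rest = [s for i, s in enumerate(sentences) if i % n != fold]
        tagger = train_tagger(rest)
        for i in held_out:
            words = [w for w, _ in sentences[i]]
            auto[i] = list(zip(words, tagger(words)))
    return auto
```

The dependency parser would then be trained on the output of `jackknife_tags` rather than on gold tags, while test sentences are tagged by a model trained on the full training set.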

30 5 Word lemmas which are generalizations of words are another feature known to be useful for dependency parsing. [sent-89, score-0.212]

31 Table 3 (columns: Type, Parser, Dev UAS, Dev LAS, Test UAS, Test LAS, Parsing Time): scores are for both development and test data sets; parsing times (minutes:seconds) are for the test data only and exclude generation of basic Stanford dependencies (for constituent parsers) and part-of-speech tagging (for dependency parsers). [sent-102, score-0.773]

32 They can be computed via a CoNLL-X shared task dependency parsing evaluation tool (without scoring punctuation). [sent-104, score-0.363]
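As a rough illustration of attachment scoring without punctuation, the sketch below computes unlabeled and labeled attachment scores (UAS/LAS) over aligned gold and predicted tokens. It is not the CoNLL-X evaluation tool itself: the punctuation test (every character in a Unicode punctuation category) and the function names are assumptions made for this example.

```python
import unicodedata


def is_punct(token):
    """Assumed rule: a token is punctuation if every character is Unicode P*."""
    return all(unicodedata.category(ch).startswith("P") for ch in token)


def attachment_scores(gold, pred):
    """Compute (UAS, LAS) in percent.

    gold and pred are aligned lists of (form, head_index, relation_label).
    Punctuation tokens are skipped entirely.
    """
    total = uas_hits = las_hits = 0
    for (form, g_head, g_label), (_, p_head, p_label) in zip(gold, pred):
        if is_punct(form):
            continue
        total += 1
        if g_head == p_head:
            uas_hits += 1  # correct head
            if g_label == p_label:
                las_hits += 1  # correct head and correct label
    if total == 0:
        return 0.0, 0.0
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


# Illustrative tokens only; heads and labels here are made up for the example.
gold = [("中国", 2, "nsubj"), ("鼓励", 0, "root"), ("企业家", 2, "dobj"), ("。", 2, "punct")]
pred = [("中国", 2, "nsubj"), ("鼓励", 0, "root"), ("企业家", 2, "nsubj"), ("。", 1, "punct")]
uas, las = attachment_scores(gold, pred)
```

In this toy input the wrongly attached full stop does not affect either score, and the mislabeled third token lowers LAS but not UAS.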

33 1 Chinese Mate scored highest, and Berkeley was the most accurate of the constituent parsers, slightly behind Mate, using half of the time. [sent-106, score-0.399]

34 MaltParser (liblinear) was by far the most efficient but also the least performant; it scored higher with libsvm but took much more time. [sent-107, score-0.13]

35 The 1st-order MSTParser was more accurate than MaltParser (libsvm), a result that differs from that of Cer et al. [sent-108, score-0.08]

36 The Stanford parser (the default for Stanford dependencies) was only slightly more accurate than MaltParser (liblinear). [sent-111, score-0.23]

37 Bikel’s parser was too slow to be used in practice; and Charniak’s parser, which performs best for English, did not work well for Chinese. [sent-112, score-0.3]

38 1%) and hence the better dependency parser for English, consistent with our results for Chinese (see Table 3). [sent-119, score-0.362]

39 (2010), however, since the constituent parser of Charniak and Johnson (2005) still scores substantially higher (89. [sent-124, score-0.418]

40 7 In a separate experiment (parsing web data),8 we found Mate to be less accurate than Charniak-Johnson and the improvement from jackknifing to be smaller on English. [sent-126, score-0.187]

41 4 Analysis To further compare the constituent and dependency approaches to generating Stanford dependencies, we focused on the Mate and Berkeley parsers, the best of each type. [sent-127, score-0.71]

42 Mate does better on most relations, noun compound modifiers (nn) and adjectival modifiers (amod) in particular; and the Berkeley parser is better at root and dep. [sent-131, score-0.218]

43 Since POS-tags are especially informative of Chinese dependencies (Li et al. [sent-133, score-0.176]

44 , 2011), we harmonized training and test data, using 10-way jackknifing (see §2. [sent-134, score-0.107]

45 Footnote 7: One (small) factor contributing to the difference between the two languages is that in the Chinese setup we stop with basic Stanford dependencies, so there is no penalty for further conversion; another is not using discriminative reranking for Chinese. [sent-138, score-0.176]

46 Table 4: Performance (F1 scores) for the fifteen most frequent dependency relations in the CTB 7. [sent-150, score-0.247]
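Per-relation F1 scores of the kind reported in Table 4 can be computed along the following lines. This is a minimal sketch under one assumed definition: a predicted relation counts as correct only when both the head and the label match the gold token. The function name `relation_f1` is hypothetical, and the paper's exact scoring procedure may differ.

```python
from collections import Counter


def relation_f1(gold, pred):
    """Return {label: F1} for aligned lists of (head_index, relation_label)."""
    correct, gold_n, pred_n = Counter(), Counter(), Counter()
    for (g_head, g_label), (p_head, p_label) in zip(gold, pred):
        gold_n[g_label] += 1
        pred_n[p_label] += 1
        if g_head == p_head and g_label == p_label:
            correct[g_label] += 1
    scores = {}
    for label in set(gold_n) | set(pred_n):
        precision = correct[label] / pred_n[label] if pred_n[label] else 0.0
        recall = correct[label] / gold_n[label] if gold_n[label] else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Made-up toy data: the last token is predicted as nn instead of dobj.
gold = [(2, "nn"), (3, "nsubj"), (0, "root"), (3, "dobj")]
pred = [(2, "nn"), (3, "nsubj"), (0, "root"), (3, "nn")]
f1 = relation_f1(gold, pred)
```

Here the spurious `nn` prediction lowers precision for `nn` and recall for `dobj`, which is how a single labeling error shows up in two rows of such a table.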

47 parser with gold tags because it improves consistency, particularly for Chinese, where tagging accuracies are lower than in English. [sent-152, score-0.235]

48 On development data, Mate scored worse given gold tags (75. [sent-153, score-0.101]

49 4 versus Lemmatization offered additional useful cues for overcoming data sparseness (77. [sent-154, score-0.04]

50 5 Discussion Our results suggest that if accuracy is of primary concern, then Mate should be preferred;12 however, the Berkeley parser offers a trade-off between accuracy and speed. [sent-161, score-0.15]

51 If neither parser satisfies the demands of a practical application (e. [sent-162, score-0.15]

52 Stanford dependencies are not the only popular dependency representation. [sent-166, score-0.425]

53 We also considered the 11Berkeley’s performance suffered with jackknifed tags (76. [sent-167, score-0.112]

54 14 conversion scheme of the Penn2Malt tool,13 used in a series of CoNLL shared tasks (Buchholz and Marsi, 2006; Nivre et al. [sent-171, score-0.069]

55 However, this tool relies on function tag information from the CTB in determining dependency relations. [sent-175, score-0.212]

56 Since these tags usually cannot be produced by constituent parsers, we could not, in turn, obtain CoNLL-style dependency trees from their output. [sent-176, score-0.53]

57 This points to another advantage of dependency parsers: they need only the dependency tree corpus to train and can conveniently make use of native (unconverted) corpora, such as the Chinese Dependency Treebank (Liu et al. [sent-177, score-0.457]

58 Lastly, we must note that although the Berkeley parser is on par with Charniak’s (2000) system for English (Cer et al. [sent-179, score-0.15]

59 The Berkeley parser appears more general, without quite as many parameters or idiosyncratic design decisions, as evidenced by a recent application to French (Candito et al. [sent-184, score-0.15]

60 6 Conclusion We compared seven popular open source parsers (four constituent and three dependency) for generating Stanford dependencies in Chinese. [sent-186, score-0.979]

61 Mate, a high-order MST dependency parser, with lemmatization and jackknifed POS-tags, appears most accurate; but Berkeley’s faster constituent parser, with jointly-inferred tags, is statistically no worse. [sent-187, score-0.581]

62 This outcome is different from English, where constituent parsers systematically outperform direct methods. [sent-188, score-0.498]

63 Though Mate scored higher overall, Berkeley’s parser was better at recovering longer-distance relations, suggesting that a combined approach could perhaps work better still (Rush et al. [sent-189, score-0.201]

