acl acl2011 acl2011-111 knowledge-graph by maker-knowledge-mining

111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation


Source: pdf

Author: Nathan Green

Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and, in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43% (0.23 Bleu score) improvement for English to Czech machine translation.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. [sent-2, score-1.051]

2 Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. [sent-3, score-0.17]

3 It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. [sent-4, score-1.713]

4 This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2. [sent-5, score-0.91]

5 0.23 Bleu score improvement for English to Czech machine translation. [sent-7, score-0.038]

6 1 Introduction Noun phrase structure in the Penn Treebank has, until recently, been considered a flat structure due to underspecification. [sent-9, score-0.366]

7 Due to the annotation and work of Vadas and Curran (2007a; 2007b; 2008), we are now able to create Natural Language Processing (NLP) systems that take advantage of the internal structure of noun phrases in the Penn Treebank. [sent-10, score-0.657]

8 This extra internal structure introduces additional complications in NLP applications such as parsing. [sent-11, score-0.285]

9 Dependency parsing has been shown to improve NLP systems in certain languages and in many cases is considered the state of the art in the field. [sent-16, score-0.151]

10 Dependency parsing saw many improvements due to the CoNLL-X shared task (Buchholz and Marsi, 2006). [sent-17, score-0.189]

11 However, in most cases, these systems were trained with a flat noun phrase structure in the Penn Treebank. [sent-18, score-0.721]

12 Vadas’ internal noun phrase structure has been used in previous work on constituent parsing using Collins’ parser (Vadas and Curran, 2007c), but has yet to be analyzed for its effects on dependency parsing. [sent-19, score-1.267]

13 Therefore, improvements in parsing output could, in many cases, yield improvements in other areas of NLP, such as Machine Translation. [sent-21, score-0.151]

14 At the same time, any errors in parsing will tend to propagate down the NLP pipeline. [sent-22, score-0.24]

15 One would expect parsing accuracy to be reduced when the complexity of the parse is increased, such as by adding noun phrase structure. [sent-23, score-0.73]

16 But, for a machine translation system that is reliant on parsing, the new noun phrase structure, even with reduced parser accuracy, may yield improvements due to a more detailed grammatical structure. [sent-24, score-0.8]

17 This is particularly of interest for dependency relations, as it may aid in finding the correct head of a term in a complex noun phrase. [sent-25, score-0.491]

18 This paper examines the parsing and machine translation results and errors of dependency parsers trained with annotated noun phrase structure, against those trained with a flat noun phrase structure. [sent-26, score-1.715]

19 gold standard internal noun phrase structure annotation. [sent-29, score-0.875]

20 Additionally, we analyze the effect of these improvements and errors in parsing down the NLP pipeline on the TectoMT machine translation system (Žabokrtský et al. [sent-30, score-0.437]

21 1 Dependency Parsing Dependency parsing is an alternative to the common phrase or constituent parsing techniques used with the Penn Treebank. [sent-37, score-0.526]

22 Dependency relations can be used in many applications and have been shown to be quite useful in languages with a free word order. [sent-38, score-0.061]

23 With the influx of many data-driven techniques, the need for annotated dependency relations is apparent. [sent-39, score-0.229]

24 Since there are many data sets with constituent relations annotated, this paper uses free conversion software provided by the CoNLL 2008 shared task to create dependency relations (Johansson and Nugues, 2007; Surdeanu et al. [sent-40, score-0.367]

25 2 Dependency Parsers Dependency parsing comes in two main forms: Graph algorithms and Greedy algorithms. [sent-43, score-0.151]

26 Each parser has its advantages and disadvantages, but the accuracy overall is approximately the same. [sent-46, score-0.148]

27 The types of errors made by each parser, however, are very different. [sent-47, score-0.049]

28 MSTParser is globally trained for an optimal solution, which has led it to achieve the best results on longer sentences. [sent-48, score-0.067]

29 MaltParser, on the other hand, is a greedy algorithm. [sent-49, score-0.031]

30 This allows it to perform extremely well on shorter sentences, as the errors tend to propagate and cause more egregious errors in longer sentences with longer dependencies (McDonald and Nivre, 2007). [sent-50, score-0.138]

31 We expect each parser to make different errors when handling internal noun phrase structure, but for this paper we will only be examining the globally trained MSTParser. [sent-51, score-0.898]
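
To make the global-versus-greedy contrast concrete, the sketch below decodes a toy arc-score matrix both ways. It is my own illustration, not MSTParser or MaltParser: the scores are invented, the greedy strategy is a deliberately simplistic per-word head choice (MaltParser is in fact transition-based), and it assumes the networkx package and its Chu-Liu/Edmonds maximum_spanning_arborescence routine.

```python
import networkx as nx

# scores[h][d]: invented score for an arc from head h to dependent d
# (0 is the artificial root; 1 = "crude", 2 = "oil", 3 = "prices").
scores = {
    0: {3: 10.0},
    1: {2: 4.0, 3: 3.0},
    2: {3: 11.0},
    3: {1: 5.0, 2: 7.0},
}

def as_graph(scores):
    g = nx.DiGraph()
    for h, deps in scores.items():
        for d, s in deps.items():
            g.add_edge(h, d, weight=s)
    return g

def mst_decode(scores):
    """Global decoding: highest-weight spanning arborescence (Chu-Liu/Edmonds),
    the strategy used by graph-based parsers such as MSTParser."""
    tree = nx.maximum_spanning_arborescence(as_graph(scores))
    return sorted(tree.edges())

def greedy_decode(scores):
    """Purely local decoding: every word takes its best-scoring head in
    isolation, so the result need not be a tree (here it has a 2-3 cycle
    and never attaches anything to the root)."""
    best = {}
    for h, deps in scores.items():
        for d, s in deps.items():
            if d not in best or s > best[d][0]:
                best[d] = (s, h)
    return sorted((h, d) for d, (s, h) in best.items())

print("MST decoding:   ", mst_decode(scores))     # [(0, 3), (3, 1), (3, 2)]
print("Greedy decoding:", greedy_decode(scores))  # [(2, 3), (3, 1), (3, 2)]
```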

32 3 TectoMT TectoMT is a machine translation framework based on Praguian tectogrammatics (Sgall, 1967) which represents four main layers: word layer, morphological layer, analytical layer, and tectogrammatical layer (Popel et al. [sent-53, score-0.469]

33 This framework is primarily focused on the translation from English into Czech. [sent-55, score-0.133]

34 Since much dependency parsing work has focused on Czech, this choice of machine translation framework follows logically, as TectoMT makes direct use of the dependency relationships. [sent-56, score-0.683]

35 The work in this paper primarily addresses the noun phrase structure in the analytical layer (SEnglishA in Figure 1). [sent-57, score-0.781]

36 Figure 1: Translation Process in TectoMT, in which the tectogrammatical layer is transferred from English to Czech. [sent-58, score-0.172]

37 This makes it easy to add the two different parsers to the framework, since each experiment can be run as a separate “Scenario” composed of different parsing “Blocks”. [sent-60, score-0.247]

38 This allows a simple comparison of two machine translation systems in which everything remains constant except the dependency parser. [sent-61, score-0.309]
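
The "everything constant except the parser" design can be pictured with a tiny pipeline sketch. This is a hypothetical illustration only: TectoMT itself is a Perl/Treex framework, and every name below (scenario, tokenize, baseline_parser, gold_np_parser, transfer_and_generate) is invented for the example.

```python
from typing import Callable, Dict, List

Block = Callable[[Dict], Dict]  # a "Block" transforms an analysis dictionary

def scenario(blocks: List[Block]) -> Callable[[str], Dict]:
    """Compose Blocks into a single pipeline ("Scenario")."""
    def run(sentence: str) -> Dict:
        state = {"sentence": sentence}
        for block in blocks:
            state = block(state)
        return state
    return run

def tokenize(state):                # shared analysis block
    state["tokens"] = state["sentence"].split()
    return state

def baseline_parser(state):        # stand-in for MSTParser trained on flat NPs
    state["parse"] = ("baseline", list(state["tokens"]))
    return state

def gold_np_parser(state):         # stand-in for MSTParser trained with NP bracketing
    state["parse"] = ("gold_np", list(state["tokens"]))
    return state

def transfer_and_generate(state):  # shared downstream transfer/generation blocks
    state["translation"] = " ".join(state["tokens"])  # dummy placeholder "MT"
    return state

# Two scenarios that are identical except for the parsing block:
baseline_mt = scenario([tokenize, baseline_parser, transfer_and_generate])
gold_np_mt  = scenario([tokenize, gold_np_parser, transfer_and_generate])
```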

39 (Marcus et al., 1993), consisting of annotated portions of the Wall Street Journal. [sent-64, score-0.03]

40 Much of the annotation task is painstakingly done by annotators in great detail. [sent-65, score-0.033]

41 Some structures are not dealt with in detail, such as noun phrase structure. [sent-66, score-0.514]

42 Not having this information makes it difficult to determine the dependencies within phrases such as “crude oil prices” (Vadas and Curran, 2007c). [sent-67, score-0.194]

43 Without internal annotation it is ambiguous whether the phrase is stating “crude prices” (crude (oil prices)) or “crude oil” ((crude oil) prices). [sent-68, score-0.411]

44 Figure 2: Ambiguous dependency caused by internal noun phrase structure. [sent-69, score-1.874]
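
To make Figure 2 concrete, the sketch below percolates heads through a flat versus a bracketed version of “crude oil prices” and prints the resulting dependency arcs. It is my own illustration, not the CoNLL-2008 conversion software: the tree encoding and the "rightmost nominal child is the head" rule are simplifying assumptions.

```python
# Constituents are (label, children) pairs; leaves are (POS, word) pairs.

def head(tree):
    """Lexical head of a constituent: the rightmost nominal child (simplified)."""
    label, children = tree
    if isinstance(children, str):          # leaf: (POS, word)
        return children
    for child in reversed(children):
        if child[0].startswith("N"):       # NN, NNS, NML, NP, ...
            return head(child)
    return head(children[-1])

def dependencies(tree, deps=None):
    """Collect (head_word, dependent_word) arcs by head percolation."""
    if deps is None:
        deps = []
    label, children = tree
    if isinstance(children, str):
        return deps
    h = head(tree)
    for child in children:
        ch = head(child)
        if ch != h:
            deps.append((h, ch))
        dependencies(child, deps)
    return deps

flat   = ("NP", [("JJ", "crude"), ("NN", "oil"), ("NNS", "prices")])
nested = ("NP", [("NML", [("JJ", "crude"), ("NN", "oil")]), ("NNS", "prices")])

print(dependencies(flat))    # [('prices', 'crude'), ('prices', 'oil')]
print(dependencies(nested))  # [('prices', 'oil'), ('oil', 'crude')]
```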

45 Manual annotation of these phrases would be quite time consuming and, as seen in the example above, sometimes ambiguous and therefore prone to poor inter-annotator agreement. [sent-70, score-0.147]

46 Vadas and Curran have constructed a gold standard version of the Penn Treebank with these structures. [sent-71, score-0.078]

47 The additional complexity of noun phrase structure has been shown to reduce parser accuracy in Collins’ parser, but no similar evaluation has been conducted for dependency parsers. [sent-74, score-1.045]

48 The internal noun phrase structure has been used in prior experiments, but without evaluation with respect to the noun phrases (Galley and Manning, 2009). [sent-75, score-1.138]

49 The Baseline system is McDonald’s MSTParser trained on the Penn Treebank in English without any extra noun phrase bracketing. [sent-78, score-0.572]

50 The Gold NP Parser is McDonald’s MSTParser trained on the Penn Treebank in English with gold standard noun phrase structure annotations (Vadas and Curran, 2007a). [sent-80, score-0.751]

51 1 Data Sets To maintain a consistent dataset to compare to previous work, we use the Wall Street Journal (WSJ) section of the Penn Treebank, since it was used in the CoNLL-X shared task on dependency parsing (Buchholz and Marsi, 2006). [sent-82, score-0.355]

52 To test the effects of the noun phrase structure on machine translation, data from the ACL 2008 Workshop on Statistical Machine Translation (WMT) are used. [sent-84, score-0.685]

53 The Penn Treebank with no internal noun phrase structure (PTB w/o NP structure). [sent-88, score-0.771]

54 The Penn Treebank with gold standard noun phrase annotations provided by Vadas and Curran (PTB w/ gold standard NP structure). [sent-90, score-0.722]

55 These parsers are trained using McDonald’s Maximum Spanning Tree Algorithm (MSTParser) (McDonald et al. [sent-92, score-0.098]

56 Both parsers are then tested on a subset of the WSJ corpus (Section 22) of the Penn Treebank, and the UAS and LAS scores are generated. [sent-94, score-0.1]

57 Errors generated by each of these systems are then compared to discover where the internal noun phrase structure affects the output. [sent-95, score-0.771]

58 Parser accuracy is not necessarily the most important aspect of this work. [sent-96, score-0.034]

59 The effect of this noun phrase structure down the NLP pipeline is also crucial. [sent-97, score-0.711]

60 For this, the parsers are inserted into the TectoMT system. [sent-98, score-0.068]

61 3 Metrics Labeled Accuracy Score (LAS) and Unlabeled Accuracy Score (UAS) are the primary ways to evaluate dependency parsers. [sent-100, score-0.166]

62 LAS is the percentage of words that are connected to their correct heads and have the correct dependency label. [sent-102, score-0.166]
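
As a concrete reference, here is a minimal sketch of how UAS and LAS can be computed from per-word (head, label) pairs. It is my own illustration rather than the official CoNLL evaluation script, and the toy gold and predicted analyses are invented.

```python
def uas_las(gold, pred):
    """gold, pred: one (head_index, dependency_label) pair per word."""
    assert len(gold) == len(pred)
    n = len(gold)
    correct_heads = sum(g[0] == p[0] for g, p in zip(gold, pred))  # head only
    correct_both  = sum(g == p for g, p in zip(gold, pred))        # head + label
    return correct_heads / n, correct_both / n                     # (UAS, LAS)

# "crude oil prices" with 0 as the artificial root.
gold = [(3, "NMOD"), (3, "NMOD"), (0, "ROOT")]  # flat analysis: both modify "prices"
pred = [(2, "NMOD"), (3, "NMOD"), (0, "ROOT")]  # parser attaches "crude" to "oil"
print(uas_las(gold, pred))                       # (0.666..., 0.666...)
```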

63 The Bleu (BiLingual Evaluation Understudy) score is an automatic scoring mechanism for machine translation that is quick and can be reused as a benchmark across machine translation tasks. [sent-104, score-0.315]

64 Bleu is calculated as the geometric mean of n-gram precisions comparing a machine translation and a reference text (Papineni et al. [sent-105, score-0.143]
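
The sketch below shows the idea of clipped n-gram precisions combined by a geometric mean with a brevity penalty. It is a toy sentence-level illustration with crude smoothing of zero counts, not the standard corpus-level Bleu implementation of Papineni et al. (2002).

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def toy_bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = ngrams(candidate, n), ngrams(reference, n)
        clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = max(sum(cand.values()), 1)
        precisions.append(max(clipped, 1e-9) / total)   # smooth zero counts
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    brevity = min(1.0, math.exp(1 - len(reference) / len(candidate)))
    return brevity * geo_mean

print(toy_bleu("crude oil prices rose sharply".split(),
               "crude oil prices increased sharply".split()))
```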

65 This experiment compares the two parsing systems against each other using the above metrics. [sent-107, score-0.151]

66 In both cases the test set data is sampled 1,000 times without replacement to calculate statistical significance using a pairwise comparison. [sent-108, score-0.06]

67 4 Results and Discussion When applied, the gold standard annotations changed approximately 1.5% of the edges. [sent-109, score-0.104]

68 Once trained, both parsers were tested against section 22 of their respective annotated corpora. [sent-111, score-0.13]

69 This was expected given the additional complexity of predicting the noun phrase structure and the previous work on noun phrase bracketing’s effect on Collins’ parser. [sent-113, score-1.159]

70 Each is trained on Sections 02-21 of the WSJ and tested on Section 22. While possibly more error prone, the 1. [sent-117, score-0.062]

71 5% change in edges in the training data did appear to add more useful syntactic structure to the resulting parses as can be seen in Table 2. [sent-118, score-0.103]

72 With the additional noun phrase bracketing, the resulting Bleu score increased 0.23. [sent-119, score-0.514]

73 The improvement is statistically significant with 95% confidence using pairwise bootstrapping of 1,000 test sets randomly sampled with replacement (Koehn, 2004; Zhang et al. [sent-122, score-0.06]
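
For reference, here is a minimal paired-bootstrap sketch in the spirit of Koehn (2004). The per-sentence scores are invented, and summing them is a stand-in for re-scoring each resampled test set; in the paper's setting, each resample would be scored with Bleu.

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=1000, seed=0):
    """Fraction of resampled test sets on which system B outscores system A."""
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        if sum(scores_b[i] for i in idx) > sum(scores_a[i] for i in idx):
            wins += 1
    return wins / samples

# Invented per-sentence scores for the Baseline and Gold NP systems.
baseline = [0.21, 0.18, 0.25, 0.30, 0.22, 0.19, 0.27, 0.24]
gold_np  = [0.23, 0.19, 0.26, 0.29, 0.24, 0.21, 0.28, 0.25]
print(paired_bootstrap(baseline, gold_np))  # > 0.95 would suggest 95% significance
```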

74 The samples were sorted by the difference in Bleu score. [sent-131, score-0.084]

75 Visually, changes can be seen in the English-side parse that affect the overall translation quality. [sent-132, score-0.136]

76 Sentences that contained incorrect noun phrase structure, such as “The second vice-president and Economy minister, Pedro Solbes” as seen in Figure 5 and Figure 6, were more correctly parsed by the Gold NP Parser. [sent-133, score-0.617]

77 In Figure 5, “and” is incorrectly assigned to the bottom of a noun phrase and does not connect any segments in the output of the Baseline Parser, while it connects two phrases in Figure 6, which is the output of the Gold NP Parser. [sent-134, score-0.632]

78 This shift in bracketing also allows the proper noun, which is shaded, to be assigned to the correct head, the rightmost noun in the phrase. [sent-135, score-0.422]

79 Figure 5: The parse created from the data with flat structures does not appear to handle noun phrases with more depth; in this case the ’and’ does not properly connect the two components. [sent-136, score-0.506]

80 Figure 6: With the addition of noun phrase structure in the parser, the complicated noun phrase appears to be better structured. [sent-137, score-1.131]

81 The ’and’ connects two components instead of improperly being a leaf node. [sent-138, score-0.042]

82 5 Conclusion This paper has demonstrated the benefit of additional noun phrase bracketing in training data for use in dependency parsing and machine translation. [sent-139, score-0.966]

83 Using the additional structure, the dependency parser’s accuracy was minimally reduced. [sent-140, score-0.2]

84 Despite this reduction, machine translation, much further down the NLP pipeline, obtained a 2.43% improvement. [sent-141, score-0.038]

85 Future work should examine similar experiments with MaltParser and other machine translation systems. [sent-143, score-0.143]

86 Building a large annotated corpus of English: the Penn Treebank. [sent-169, score-0.186]

87 Bleu: a method for automatic evaluation of machine translation. [sent-188, score-0.038]

88 The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. [sent-201, score-0.189]

89 TectoMT: highly modular MT system with tectogrammatics used as transfer layer. [sent-227, score-0.144]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('vadas', 0.379), ('tectomt', 0.333), ('noun', 0.325), ('crude', 0.206), ('phrase', 0.189), ('curran', 0.168), ('dependency', 0.166), ('prices', 0.162), ('penn', 0.156), ('internal', 0.154), ('oil', 0.152), ('parsing', 0.151), ('uas', 0.118), ('abokrtsk', 0.117), ('parser', 0.114), ('las', 0.112), ('layer', 0.105), ('translation', 0.105), ('gold', 0.104), ('mstparser', 0.104), ('structure', 0.103), ('mcdonald', 0.101), ('np', 0.101), ('zden', 0.1), ('bracketing', 0.097), ('buchholz', 0.092), ('bleu', 0.084), ('treebank', 0.078), ('flat', 0.074), ('marsi', 0.069), ('parsers', 0.068), ('nlp', 0.067), ('petr', 0.067), ('popel', 0.067), ('tectogrammatical', 0.067), ('tectogrammatics', 0.067), ('pipeline', 0.066), ('collin', 0.064), ('analytical', 0.059), ('ptb', 0.057), ('conll', 0.057), ('maltparser', 0.056), ('wsj', 0.053), ('johansson', 0.051), ('modular', 0.05), ('jan', 0.049), ('errors', 0.049), ('morristown', 0.046), ('joakim', 0.046), ('examines', 0.044), ('spanning', 0.044), ('melbourne', 0.043), ('nivre', 0.043), ('connects', 0.042), ('phrases', 0.042), ('nj', 0.041), ('propagate', 0.04), ('wall', 0.04), ('surdeanu', 0.038), ('shared', 0.038), ('czech', 0.038), ('machine', 0.038), ('prone', 0.037), ('globally', 0.037), ('street', 0.036), ('green', 0.035), ('james', 0.035), ('constituent', 0.035), ('ambiguous', 0.035), ('conversion', 0.034), ('connect', 0.034), ('accuracy', 0.034), ('relations', 0.033), ('annotation', 0.033), ('replacement', 0.033), ('tested', 0.032), ('association', 0.032), ('parse', 0.031), ('greedy', 0.031), ('effects', 0.03), ('annotated', 0.03), ('trained', 0.03), ('logically', 0.029), ('msm', 0.029), ('academia', 0.029), ('icetal', 0.029), ('nodalida', 0.029), ('reliant', 0.029), ('reused', 0.029), ('tartu', 0.029), ('underspecification', 0.029), ('prague', 0.029), ('framework', 0.028), ('free', 0.028), ('extra', 0.028), ('effect', 0.028), ('galley', 0.028), ('australia', 0.028), ('sampled', 0.027), ('transfer', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Author: Nathan Green

Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and, in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43% (0.23 Bleu score) improvement for English to Czech machine translation.

2 0.19808505 167 acl-2011-Improving Dependency Parsing with Semantic Classes

Author: Eneko Agirre ; Kepa Bengoetxea ; Koldo Gojenola ; Joakim Nivre

Abstract: This paper presents the introduction of WordNet semantic classes in a dependency parser, obtaining improvements on the full Penn Treebank for the first time. We tried different combinations of some basic semantic classes and word sense disambiguation algorithms. Our experiments show that selecting the adequate combination of semantic features on development data is key for success. Given the basic nature of the semantic classes and word sense disambiguation algorithms used, we think there is ample room for future improvements. 1

3 0.19430453 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

Author: Alexander Volokh ; Gunter Neumann

Abstract: Annotated corpora are essential for almost all NLP applications. Whereas they are expected to be of a very high quality because of their importance for the followup developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the amount of errors and propose a method for their automatic correction. Whereas our approach is able to find only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of the process of its creation, it has a very high precision, and thus is in any case beneficial for the quality of the corpus it is applied to. At last, we compare it to a different method for error detection in treebanks and find out that the errors that we are able to detect are mostly different and that our approaches are complementary. 1

4 0.19418043 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

Author: Yue Zhang ; Joakim Nivre

Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.

5 0.18156883 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

Author: Jinho D. Choi ; Martha Palmer

Abstract: This paper suggests two ways of improving transition-based, non-projective dependency parsing. First, we add a transition to an existing non-projective parsing algorithm, so it can perform either projective or non-projective parsing as needed. Second, we present a bootstrapping technique that narrows down discrepancies between gold-standard and automatic parses used as features. The new addition to the algorithm shows a clear advantage in parsing speed. The bootstrapping technique gives a significant improvement to parsing accuracy, showing near state-of-the-art performance with respect to other parsing approaches evaluated on the same data set.

6 0.18097512 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

7 0.17410769 333 acl-2011-Web-Scale Features for Full-Scale Parsing

8 0.16644499 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

9 0.16406831 282 acl-2011-Shift-Reduce CCG Parsing

10 0.154477 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

11 0.14143004 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

12 0.13751481 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

13 0.11865391 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

14 0.11656663 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

15 0.1119624 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

16 0.11000509 44 acl-2011-An exponential translation model for target language morphology

17 0.10680102 10 acl-2011-A Discriminative Model for Joint Morphological Disambiguation and Dependency Parsing

18 0.10586884 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

19 0.099769257 263 acl-2011-Reordering Constraint Based on Document-Level Context

20 0.093637012 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.241), (1, -0.124), (2, -0.036), (3, -0.249), (4, -0.003), (5, -0.039), (6, 0.091), (7, 0.061), (8, 0.097), (9, -0.044), (10, 0.032), (11, -0.008), (12, 0.028), (13, -0.209), (14, 0.003), (15, 0.034), (16, 0.036), (17, 0.03), (18, -0.029), (19, 0.004), (20, -0.115), (21, 0.047), (22, -0.008), (23, 0.014), (24, 0.067), (25, 0.008), (26, 0.066), (27, -0.032), (28, -0.043), (29, -0.005), (30, 0.043), (31, -0.024), (32, 0.031), (33, 0.01), (34, 0.001), (35, -0.012), (36, -0.001), (37, -0.021), (38, 0.052), (39, 0.077), (40, -0.063), (41, 0.088), (42, 0.009), (43, -0.046), (44, 0.011), (45, 0.004), (46, 0.036), (47, 0.084), (48, 0.058), (49, -0.052)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97718668 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Author: Nathan Green

Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and, in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43% (0.23 Bleu score) improvement for English to Czech machine translation.

2 0.84407634 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

Author: Yue Zhang ; Joakim Nivre

Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.

3 0.83843786 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

Author: Guangyou Zhou ; Jun Zhao ; Kang Liu ; Li Cai

Abstract: In this paper, we present a novel approach which incorporates the web-derived selectional preferences to improve statistical dependency parsing. Conventional selectional preference learning methods have usually focused on word-to-class relations, e.g., a verb selects as its subject a given nominal class. This paper extends previous work to wordto-word selectional preferences by using webscale data. Experiments show that web-scale data improves statistical dependency parsing, particularly for long dependency relationships. There is no data like more data, performance improves log-linearly with the number of parameters (unique N-grams). More importantly, when operating on new domains, we show that using web-derived selectional preferences is essential for achieving robust performance.

4 0.79184693 333 acl-2011-Web-Scale Features for Full-Scale Parsing

Author: Mohit Bansal ; Dan Klein

Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of 7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.

5 0.78296316 48 acl-2011-Automatic Detection and Correction of Errors in Dependency Treebanks

Author: Alexander Volokh ; Gunter Neumann

Abstract: Annotated corpora are essential for almost all NLP applications. Whereas they are expected to be of a very high quality because of their importance for the followup developments, they still contain a considerable number of errors. With this work we want to draw attention to this fact. Additionally, we try to estimate the amount of errors and propose a method for their automatic correction. Whereas our approach is able to find only a portion of the errors that we suppose are contained in almost any annotated corpus due to the nature of the process of its creation, it has a very high precision, and thus is in any case beneficial for the quality of the corpus it is applied to. At last, we compare it to a different method for error detection in treebanks and find out that the errors that we are able to detect are mostly different and that our approaches are complementary. 1

6 0.78218293 143 acl-2011-Getting the Most out of Transition-based Dependency Parsing

7 0.77703458 39 acl-2011-An Ensemble Model that Combines Syntactic and Semantic Clustering for Discriminative Dependency Parsing

8 0.7768206 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation

9 0.76022947 167 acl-2011-Improving Dependency Parsing with Semantic Classes

10 0.74128217 59 acl-2011-Better Automatic Treebank Conversion Using A Feature-Based Approach

11 0.70108575 282 acl-2011-Shift-Reduce CCG Parsing

12 0.69420761 243 acl-2011-Partial Parsing from Bitext Projections

13 0.63041604 236 acl-2011-Optimistic Backtracking - A Backtracking Overlay for Deterministic Incremental Parsing

14 0.59348232 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

15 0.58170873 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

16 0.57383472 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful

17 0.5699538 267 acl-2011-Reversible Stochastic Attribute-Value Grammars

18 0.56958157 295 acl-2011-Temporal Restricted Boltzmann Machines for Dependency Parsing

19 0.56335896 107 acl-2011-Dynamic Programming Algorithms for Transition-Based Dependency Parsers

20 0.56171304 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.029), (17, 0.07), (26, 0.019), (28, 0.01), (37, 0.176), (39, 0.066), (41, 0.085), (55, 0.011), (59, 0.036), (66, 0.194), (72, 0.017), (91, 0.044), (96, 0.121), (97, 0.049)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82861316 111 acl-2011-Effects of Noun Phrase Bracketing in Dependency Parsing and Machine Translation

Author: Nathan Green

Abstract: Flat noun phrase structure was, up until recently, the standard in annotation for the Penn Treebanks. With the recent addition of internal noun phrase annotation, dependency parsing and applications down the NLP pipeline are likely affected. Some machine translation systems, such as TectoMT, use deep syntax as a language transfer layer. It is proposed that changes to the noun phrase dependency parse will have a cascading effect down the NLP pipeline and, in the end, improve machine translation output, even with a reduction in parser accuracy that the noun phrase structure might cause. This paper examines this noun phrase structure’s effect on dependency parsing, in English, with a maximum spanning tree parser and shows a 2.43% (0.23 Bleu score) improvement for English to Czech machine translation.

2 0.78707558 215 acl-2011-MACAON An NLP Tool Suite for Processing Word Lattices

Author: Alexis Nasr ; Frederic Bechet ; Jean-Francois Rey ; Benoit Favre ; Joseph Le Roux

Abstract: MACAON is a tool suite for standard NLP tasks developed for French. MACAON has been designed to process both human-produced text and highly ambiguous word-lattices produced by NLP tools. MACAON is made of several native modules for common tasks such as tokenization, part-of-speech tagging, or syntactic parsing, all communicating with each other through XML files. In addition, exchange protocols with external tools are easily definable. MACAON is a fast, modular and open tool, distributed under GNU Public License.

3 0.75705379 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

Author: Mark-Jan Nederhof ; Giorgio Satta

Abstract: We present a method for the computation of prefix probabilities for synchronous contextfree grammars. Our framework is fairly general and relies on the combination of a simple, novel grammar transformation and standard techniques to bring grammars into normal forms.

4 0.75485677 309 acl-2011-Transition-based Dependency Parsing with Rich Non-local Features

Author: Yue Zhang ; Joakim Nivre

Abstract: Transition-based dependency parsers generally use heuristic decoding algorithms but can accommodate arbitrarily rich feature representations. In this paper, we show that we can improve the accuracy of such parsers by considering even richer feature sets than those employed in previous systems. In the standard Penn Treebank setup, our novel features improve attachment score from 91.4% to 92.9%, giving the best results so far for transition-based parsing and rivaling the best results overall. For the Chinese Treebank, they give a significant improvement of the state of the art. An open source release of our parser is freely available.

5 0.75110221 122 acl-2011-Event Extraction as Dependency Parsing

Author: David McClosky ; Mihai Surdeanu ; Christopher Manning

Abstract: Nested event structures are a common occurrence in both open domain and domain specific extraction tasks, e.g., a “crime” event can cause a “investigation” event, which can lead to an “arrest” event. However, most current approaches address event extraction with highly local models that extract each event and argument independently. We propose a simple approach for the extraction of such structures by taking the tree of event-argument relations and using it directly as the representation in a reranking dependency parser. This provides a simple framework that captures global properties of both nested and flat event structures. We explore a rich feature space that models both the events to be parsed and context from the original supporting text. Our approach obtains competitive results in the extraction of biomedical events from the BioNLP’09 shared task with a F1 score of 53.5% in development and 48.6% in testing.

6 0.74629021 92 acl-2011-Data point selection for cross-language adaptation of dependency parsers

7 0.74575335 332 acl-2011-Using Multiple Sources to Construct a Sentiment Sensitive Thesaurus for Cross-Domain Sentiment Classification

8 0.74461186 186 acl-2011-Joint Training of Dependency Parsing Filters through Latent Support Vector Machines

9 0.74394244 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

10 0.74289453 334 acl-2011-Which Noun Phrases Denote Which Concepts?

11 0.7417444 103 acl-2011-Domain Adaptation by Constraining Inter-Domain Variability of Latent Feature Representation

12 0.73931855 54 acl-2011-Automatically Extracting Polarity-Bearing Topics for Cross-Domain Sentiment Classification

13 0.73585391 85 acl-2011-Coreference Resolution with World Knowledge

14 0.73468041 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering

15 0.73460442 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

16 0.73299259 204 acl-2011-Learning Word Vectors for Sentiment Analysis

17 0.73222208 127 acl-2011-Exploiting Web-Derived Selectional Preference to Improve Statistical Dependency Parsing

18 0.73222131 256 acl-2011-Query Weighting for Ranking Model Adaptation

19 0.7308346 73 acl-2011-Collective Classification of Congressional Floor-Debate Transcripts

20 0.7306664 230 acl-2011-Neutralizing Linguistically Problematic Annotations in Unsupervised Dependency Parsing Evaluation