acl acl2010 acl2010-143 knowledge-graph by maker-knowledge-mining

143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Source: pdf

Author: Bharat Ram Ambati

Abstract: Statistical systems with high accuracy are very useful in real-world applications. If these systems can capture basic linguistic information, their usefulness improves considerably. This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. We consider a simple linguistic constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. We first describe the importance of this constraint considering Machine Translation systems which use dependency parser output, as an example application. We then show how the current state-of-the-art dependency parsers violate this constraint. We present two new methods to handle this constraint. We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Importance of linguistic constraints in statistical dependency parsing Bharat Ram Ambati Language Technologies Research Centre, IIIT-Hyderabad, Gachibowli, Hyderabad, India 500032. [sent-1, score-0.359]

2 This paper is an attempt at incorporating linguistic constraints in statistical dependency parsing. [sent-7, score-0.359]

3 We consider a simple linguistic constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. [sent-8, score-0.59]

4 We first describe the importance of this constraint considering Machine Translation systems which use dependency parser output, as an example application. [sent-9, score-0.366]

5 We then show how the current state-of-the-art dependency parsers violate this constraint. [sent-10, score-0.389]

6 We evaluate our methods on the state-of-the-art dependency parsers for Hindi and Czech. [sent-12, score-0.357]

7 A similar large-scale annotation effort for Czech, using dependency analysis, has been the Prague Dependency Treebank (Hajicova, 1998). [sent-20, score-0.237]

8 It has been suggested that free-word-order languages can be handled better using the dependency based framework than the constituency based one (Hudson, 1984; Shieber, 1985; Mel'čuk, 1988; Bharati et al. [sent-22, score-0.237]

9 The basic difference between a constituent based representation and a dependency representation is the lack of nonterminal nodes in the latter. [sent-24, score-0.237]

10 It is perhaps due to these reasons that the recent past has seen a surge in the development of dependency based treebanks. [sent-26, score-0.237]

11 Due to the availability of dependency treebanks, there are several recent attempts at building dependency parsers. [sent-27, score-0.501]

12 , 2007a) were held aiming at building state-of-the-art dependency parsers for different languages. [sent-29, score-0.357]

13 Recently, in the NLP Tools Contest in ICON-2009 (Husain, 2009 and references therein), rule-based, constraint based, statistical and hybrid approaches were explored towards building dependency parsers for three Indian languages, namely Telugu, Hindi and Bangla. [sent-30, score-0.482]

14 The major limitation of both these parsers is that they won't take linguistic constraints into account explicitly. [sent-34, score-0.255]

15 If we can make these parsers handle linguistic constraints also, then they become very useful in real-world applications. [sent-36, score-0.255]

16 This paper is an effort towards incorporating linguistic constraints in statistical dependency parser. [sent-37, score-0.359]

17 We consider a simple constraint that a verb should not have multiple subjects/objects as its children. [sent-38, score-0.219]

18 In section 2, we take machine translation using dependency parser as an example and explain the need of this linguistic constraint. [sent-39, score-0.35]

19 We evaluate our approaches on the state-of-the-art dependency parsers for Hindi and Czech and analyze the results in section 4. [sent-41, score-0.396]

20 © 2010 Association for Computational Linguistics. Student Research Workshop, pages 103–108. 2 Motivation In this section we take Machine Translation (MT) systems that use dependency parser output as an example and explain the need of linguistic constraints. [sent-46, score-0.377]

21 We take a simple constraint that a verb should not have multiple subjects/objects as its children in the dependency tree. [sent-47, score-0.516]

22 Indian Language to Indian Language Machine Translation System (see footnote 1) is one such MT system which uses dependency parser output. [sent-48, score-0.308]

23 b) transfer from source dependency tree to target dependency tree, and c) sentence generation from the target dependency tree. [sent-51, score-0.711]

24 In the transfer part several rules are framed based on the source language dependency tree. [sent-52, score-0.237]

25 For instance, in a Telugu to Hindi MT system, the postposition markers that need to be added to the words are decided based on the dependency labels of the Telugu sentence. [sent-53, score-0.317]

26 Consider the following example, (1) Telugu: raamu oka pamdu tinnaadu ('Ramu' 'one' 'fruit' 'ate'); Hindi: raamu ne eka phala khaayaa ('Ramu' 'ERG' 'one' 'fruit' 'ate'); English: "Ramu ate a fruit". [sent-54, score-0.25]

27 In the above Telugu sentence, 'raamu' is the subject of the verb 'tinnaadu'. [sent-55, score-0.236]

28 If the dependency parser marks two subjects, both words will receive the 'ne' marker. [sent-57, score-0.308]

29 The dependency labels help in identifying the position of the word in the target sentence. [sent-62, score-0.341]

30 (2a) raama seba khaatha hai ('Ram' 'apple' 'eats' 'is'): 'Ram eats an apple'. Footnote 1: http://sampark. [sent-64, score-1.004]

31 in/ (2b) seba raama khaatha hai ('apple' 'Ram' 'eats' 'is'): 'Ram eats an apple'. Though the source sentence is different, the target sentence is the same. [sent-67, score-1.091]

32 Even though the source sentences are different, the dependency tree is the same for both sentences. [sent-68, score-0.237]

33 In both cases, 'raama' is the subject and 'seba' is the object of the verb 'khaatha'. [sent-69, score-0.325]

34 If the parser for the source sentence assigns the label 'subject' to both 'raama' and 'seba', the MT system cannot give the correct output. [sent-71, score-0.185]

35 There were some attempts at handling these kinds of linguistic constraints using integer programming approaches (Riedel et al. [sent-72, score-0.173]

36 In these approaches dependency parsing is formulated as solving an integer program as McDonald et al. [sent-75, score-0.344]

37 All the linguistic constraints are encoded as constraints while solving the integer program. [sent-77, score-0.186]

38 The parse that satisfies all the constraints is considered as the dependency tree for the sentence. [sent-79, score-0.289]

39 In the following section, we describe two new approaches to avoid multiple subjects/objects for a verb. [sent-80, score-0.131]

40 3 Approaches In this section, we describe the two different approaches for avoiding the cases of a verb having multiple subjects/objects as its children in the dependency tree. [sent-81, score-0.524]

41 Instead of first best dependency label, we extract the k-best labels for each token in the sentence. [sent-84, score-0.317]

42 For each verb in the sentence, we check if there are multiple children with the dependency label 'subject'. [sent-85, score-0.58]

43 If there are any such cases, we extract the list of all the children with label 'subject'. [sent-86, score-0.208]

44 For the rest of the nodes in this list, we assign the second best label and remove the first best label from their respective k-best list of labels, checking recursively till all such instances are avoided. [sent-89, score-0.296]

45 The main criterion to avoid multiple subjects/objects in this approach is the position of the node in the sentence. [sent-92, score-0.188]

46 3: raama seba khaatha hai ('Ram' 'apple' 'eats' 'is'): 'Ram eats an apple'. Suppose the parser assigns the label 'subject' to both the nouns, 'raama' and 'seba'. [sent-94, score-1.189]

47 Then the naive approach assigns the label 'subject' to 'raama' and the second best label to 'seba', as 'raama' precedes 'seba'. [sent-95, score-0.47]

48 In this manner we can avoid a verb having multiple children with dependency labels subject/object. [sent-96, score-0.577]
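The naive approach (NA) described in the sentences above can be sketched roughly as follows. This is an illustrative sketch only, not the author's implementation: the function name naive_approach, the data layout (token position mapped to a best-first list of labels) and the label strings 'subject' and 'object' are assumptions made for the example.

```python
from typing import Dict, List, Sequence


def naive_approach(
    verb_children: List[int],
    kbest: Dict[int, List[str]],
    constrained: Sequence[str] = ("subject", "object"),
) -> Dict[int, str]:
    """Resolve duplicate subject/object labels among one verb's children.

    verb_children: token positions (sentence order) of the verb's children.
    kbest: position -> labels ranked best-first (hypothetical layout).
    The earliest child keeps a duplicated label; later ones fall back
    to their next-best label.
    """
    # Work on copies so the caller's k-best lists are not modified.
    ranked = {pos: list(kbest[pos]) for pos in verb_children}
    changed = True
    while changed:  # repeat until no duplicate can be resolved any more
        changed = False
        for label in constrained:
            holders = sorted(p for p in verb_children if ranked[p][0] == label)
            for pos in holders[1:]:  # every holder except the first in the sentence
                if len(ranked[pos]) > 1:
                    ranked[pos].pop(0)  # drop first-best, expose second-best
                    changed = True
    return {pos: ranked[pos][0] for pos in verb_children}


# Example (2a): 'raama' at position 0, 'seba' at position 1, both labelled subject.
labels = naive_approach(
    [0, 1],
    {0: ["subject", "object"], 1: ["subject", "object"]},
)
print(labels)
```

Because only position decides which child keeps the label, the same call on the word order of (2b) would keep 'subject' on 'seba', which is the weakness of NA noted in the text.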

49 So, if a verb has multiple subjects, based on position we can say that the node that occurs first will be the subject. [sent-102, score-0.195]

50 In both these examples, 'raama' is the subject of the verb 'khaatha' and 'seba' is the object of the verb 'khaatha'. [sent-105, score-0.401]

51 NA can correctly identify 'raama' as the subject in case of (2a). [sent-110, score-0.16]

52 2 Probabilistic Approach (PA) The probabilistic approach is similar to naive approach except that the main criterion to avoid multiple subjects/objects in this approach is probability of the node having a particular label. [sent-114, score-0.227]

53 Whereas in naive approach, position of the node is the main criterion to avoid multiple subjects/objects. [sent-115, score-0.251]

54 In this approach, for each node in the sentence, we extract the k-best labels along with their probabilities. [sent-116, score-0.122]

55 Similar to NA, we first check for each verb if there are multiple children with the dependency label 'subject'. [sent-117, score-0.58]

56 If there are any such cases, we extract the list of all the children with label 'subject'. [sent-118, score-0.208]

57 For the rest of the nodes in this list we assign the second best label and remove the first best label from their respective k-best list of labels. [sent-121, score-0.232]

58 We check recursively, till all such instances are avoided. [sent-122, score-0.126]

59 The probability of 'raama' being a subject will be higher than that of 'seba' being a subject. [sent-126, score-0.16]

60 So, the probabilistic approach correctly marks 'raama' as subject in both (2a) and (2b). [sent-127, score-0.16]
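The probabilistic approach (PA) differs from NA only in which child keeps a duplicated label. A minimal sketch under stated assumptions follows: the function name probabilistic_approach, the (label, probability) pair layout and the probability values in the example are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def probabilistic_approach(
    verb_children: List[int],
    kbest: Dict[int, List[Tuple[str, float]]],
    constrained: Sequence[str] = ("subject", "object"),
) -> Dict[int, str]:
    """Resolve duplicate subject/object labels by label probability.

    kbest: position -> (label, probability) pairs ranked best-first
    (hypothetical layout). Among children sharing a constrained label,
    the one with the highest probability for that label keeps it.
    """
    ranked = {pos: list(kbest[pos]) for pos in verb_children}
    changed = True
    while changed:  # repeat until no duplicate can be resolved any more
        changed = False
        for label in constrained:
            holders = [p for p in verb_children if ranked[p][0][0] == label]
            if len(holders) < 2:
                continue
            # The most probable holder keeps the label.
            keeper = max(holders, key=lambda p: ranked[p][0][1])
            for pos in holders:
                if pos != keeper and len(ranked[pos]) > 1:
                    ranked[pos].pop(0)  # fall back to the next-best label
                    changed = True
    return {pos: ranked[pos][0][0] for pos in verb_children}


# Example (2b): 'seba' at position 0, 'raama' at position 1 (made-up probabilities).
labels = probabilistic_approach(
    [0, 1],
    {
        0: [("subject", 0.55), ("object", 0.45)],
        1: [("subject", 0.80), ("object", 0.20)],
    },
)
print(labels)
```

With these illustrative scores 'raama' keeps 'subject' even though it comes second in the sentence, which is the behaviour the text attributes to PA on (2b).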

61 But, NA couldn't identify 'raama' as subject in (2b). [sent-128, score-0.16]

62 4 Experiments We evaluate our approaches on the state-of-the-art parsers for two languages, namely Hindi and Czech. [sent-129, score-0.159]

63 First we calculate the instances of multiple subjects/objects in the output of the state-of-the-art parsers for these two languages. [sent-130, score-0.262]
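Counting such instances in parser output could look roughly like the sketch below. The token layout (id, POS tag, head id, label), the tag 'VERB', the label names and the choice to count one instance per (verb, label) pair are all assumptions for illustration; the paper does not specify how an instance is counted.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple

Token = Tuple[int, str, int, str]  # (id, POS tag, head id, dependency label)


def count_violations(
    tokens: Iterable[Token],
    constrained: Sequence[str] = ("subject", "object"),
    verb_tag: str = "VERB",
) -> int:
    """Count (verb, label) pairs where a verb has more than one child
    carrying the same constrained label."""
    tokens = list(tokens)
    verbs = {tid for tid, pos, _, _ in tokens if pos == verb_tag}
    per_verb = Counter(
        (head, label)
        for _, _, head, label in tokens
        if head in verbs and label in constrained
    )
    return sum(1 for n in per_verb.values() if n > 1)


# A three-token parse in which the verb (id 3) has two subject children.
parse = [
    (1, "NOUN", 3, "subject"),
    (2, "NOUN", 3, "subject"),
    (3, "VERB", 0, "root"),
]
print(count_violations(parse))
```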

64 1 Hindi Recently, in the NLP Tools Contest in ICON-2009 (Husain, 2009 and references therein), rule-based, constraint based, statistical and hybrid approaches were explored for parsing Hindi. [sent-133, score-0.153]

65 All these attempts were at finding the inter-chunk dependency relations, given gold-standard POS and chunk tags. [sent-134, score-0.237]

66 For Hindi, dependency annotation is done using the Paninian framework (Begum et al. [sent-142, score-0.285]

67 So, in Hindi, the equivalent labels for subject and object are 'karta (k1)' and 'karma (k2)'. [sent-145, score-0.329]

68 k2 behaves similar to object and patient (Bharati et al. [sent-148, score-0.123]

69 Thus we consider only k1 and k2 labels which are equivalent of subject and direct object. [sent-160, score-0.272]

70 The annotation scheme is such that there would not be multiple subjects/objects for a verb in any case (Bharati et al. [sent-161, score-0.129]

71 For example, even in case of coordination, the coordinating conjunction is the head and the conjuncts are children of the coordinating conjunction. [sent-163, score-0.26]

72 The coordinating conjunction is attached to the verb with the k1/k2 label and the conjuncts get attached to the coordinating conjunction with the dependency label 'ccof'. [sent-164, score-0.661]

73 In the output of Malt, there are 39 instances of multiple subjects/objects. [sent-168, score-0.142]

74 Because of this, the output of MST has a higher number of instances of multiple subjects/objects than Malt. [sent-172, score-0.142]

75 Table 1: Number of instances of multiple subjects or objects in the output of the state-of-the-art parsers for Hindi. Total instances: Malt, 39; MST + MAXENT, 51. Both parsers output the first best label for each node in the sentence. [sent-173, score-0.578]

76 In case of Malt, we modified the implementation to extract all the possible dependency labels with their scores. [sent-174, score-0.317]

77 Though interpreting the scores provided by libsvm as probabilities is not the correct way, that is the only option currently available with Malt. [sent-176, score-0.177]

78 We applied both the naive and probabilistic approaches to avoid multiple subjects/objects. [sent-179, score-0.194]

79 One reason is the larger number of instances of multiple subjects/objects in case of MST+MAXENT. [sent-198, score-0.115]

80 The minor variation of the baseline results from the results of the CoNLL 2007 shared task is due to a different version of the Malt parser being used. [sent-208, score-0.142]

81 In the output of Malt, there are 39 instances of multiple subjects/objects out of 286 sentences in the testing data. [sent-210, score-0.142]

82 In case of Czech, the equivalent labels for subject and object are 'agent' and 'theme'. [sent-211, score-0.329]

83 We explain the reason for this using the following example: consider a verb 'V' that has two children, 'C1' and 'C2', both with the dependency label subject. [sent-216, score-0.527]

84 Assume that the label for 'C1' is subject and the label of 'C2' is object in the gold data. [sent-217, score-0.429]

85 As the parser marked 'C1' with subject, this adds to the accuracy of the parser. [sent-218, score-0.131]

86 While avoiding multiple subjects, if 'C1' is marked as subject, then the accuracy doesn't drop. [sent-219, score-0.14]

87 If 'C2' is marked as object then the accuracy increases. [sent-220, score-0.149]

88 But, if 'C2' is marked as subject and 'C1' is marked as object then the accuracy drops. [sent-221, score-0.34]

89 This could happen if the probability of 'C1' having subject as its label is lower than that of 'C2' having subject as its label. [sent-222, score-0.41]

90 This is because of two reasons, (a) parser itself wrongly predicted the probabilities, and (b) parser predicted correctly, but due to the limitation of libsvm, we couldn't get the scores correctly. [sent-223, score-0.183]

91 We could not achieve any improvement in case of Czech due to the limitation of the libsvm learner used in Malt. [sent-231, score-0.143]

92 Settings of MST parser are available only for CoNLL-X shared task data sets. [sent-233, score-0.11]

93 Malt has the limitation for extracting probabilities due to libsvm learner. [sent-235, score-0.173]

94 Currently, we are handling only two labels, subject and object. [sent-241, score-0.16]

95 Apart from subject and object there can be other labels for which multiple instances for a single verb are not valid. [sent-242, score-0.52]

96 We can extend our approaches to handle such labels also. [sent-243, score-0.16]

97 We tried to incorporate one simple linguistic constraint in the statistical dependency parsers. [sent-244, score-0.337]

98 In this paper, we presented a new method of incorporating linguistic constraints into the statistical dependency parsers. [sent-248, score-0.359]

99 We took a simple constraint that a verb should not have multiple subjects/objects as its children. [sent-249, score-0.187]

100 We evaluated our approaches on state-of-the-art dependency parsers for Hindi and Czech. [sent-251, score-0.396]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('raama', 0.357), ('malt', 0.34), ('seba', 0.31), ('mst', 0.305), ('hindi', 0.275), ('dependency', 0.237), ('bharati', 0.167), ('subject', 0.16), ('eats', 0.143), ('khaatha', 0.143), ('parsers', 0.12), ('husain', 0.119), ('telugu', 0.119), ('indian', 0.109), ('ram', 0.105), ('ambati', 0.102), ('libsvm', 0.102), ('nivre', 0.099), ('children', 0.092), ('label', 0.09), ('object', 0.089), ('apple', 0.087), ('labels', 0.08), ('contest', 0.077), ('verb', 0.076), ('raamu', 0.071), ('ramu', 0.071), ('parser', 0.071), ('las', 0.068), ('mcdonald', 0.067), ('coordinating', 0.065), ('maxent', 0.064), ('naive', 0.063), ('czech', 0.063), ('instances', 0.062), ('constraint', 0.058), ('nilsson', 0.054), ('riedel', 0.054), ('fruit', 0.054), ('multiple', 0.053), ('constraints', 0.052), ('na', 0.051), ('liblinear', 0.051), ('hai', 0.051), ('mt', 0.05), ('begum', 0.048), ('karta', 0.048), ('mel', 0.048), ('paninian', 0.048), ('tinnaadu', 0.048), ('option', 0.045), ('pa', 0.043), ('precedes', 0.043), ('node', 0.042), ('hall', 0.042), ('karma', 0.042), ('sharma', 0.042), ('linguistic', 0.042), ('limitation', 0.041), ('handle', 0.041), ('tenth', 0.041), ('integer', 0.04), ('shared', 0.039), ('avoid', 0.039), ('approaches', 0.039), ('conjuncts', 0.038), ('subjects', 0.037), ('ate', 0.036), ('attachment', 0.034), ('behaves', 0.034), ('eryigit', 0.034), ('conll', 0.033), ('consider', 0.032), ('till', 0.032), ('violate', 0.032), ('check', 0.032), ('version', 0.032), ('india', 0.031), ('marked', 0.031), ('criterion', 0.03), ('latest', 0.03), ('buchholz', 0.03), ('replicated', 0.03), ('hyderabad', 0.03), ('probabilities', 0.03), ('accuracy', 0.029), ('tools', 0.028), ('incorporating', 0.028), ('parsing', 0.028), ('hybrid', 0.028), ('usefulness', 0.028), ('avoiding', 0.027), ('suppose', 0.027), ('output', 0.027), ('availability', 0.027), ('list', 0.026), ('assigns', 0.024), ('ne', 0.024), ('treebank', 0.024), ('position', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999946 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Author: Bharat Ram Ambati

2 0.22208109 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

3 0.11854349 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

Author: Martin Haulrich

Abstract: We show that using confidence-weighted classification in transition-based parsing gives results comparable to using SVMs with faster training and parsing time. We also compare with other online learning algorithms and investigate the effect of pruning features when using confidenceweighted classification.

4 0.11820351 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration

Author: Nadir Durrani ; Hassan Sajjad ; Alexander Fraser ; Helmut Schmid

Abstract: We present a novel approach to integrate transliteration into Hindi-to-Urdu statistical machine translation. We propose two probabilistic models, based on conditional and joint probability formulations, that are novel solutions to the problem. Our models consider both transliteration and translation when translating a particular Hindi word given the context whereas in previous work transliteration is only used for translating OOV (out-of-vocabulary) words. We use transliteration as a tool for disambiguation of Hindi homonyms which can be both translated or transliterated or transliterated differently based on different contexts. We obtain final BLEU scores of 19.35 (conditional probability model) and 19.00 (joint probability model) as compared to 14.30 for a baseline phrase-based system and 16.25 for a system which transliterates OOV words in the baseline system. This indicates that transliteration is useful for more than only translating OOV words for language pairs like Hindi-Urdu.

5 0.11474477 99 acl-2010-Efficient Third-Order Dependency Parsers

Author: Terry Koo ; Michael Collins

Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.

6 0.1138797 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

7 0.11014483 130 acl-2010-Hard Constraints for Grammatical Function Labelling

8 0.1021715 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

9 0.093602106 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

10 0.085386731 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

11 0.084794238 69 acl-2010-Constituency to Dependency Translation with Forests

12 0.083844133 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

13 0.081471168 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

14 0.079862408 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling

15 0.065497436 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

16 0.062832102 214 acl-2010-Sparsity in Dependency Grammar Induction

17 0.058152907 195 acl-2010-Phylogenetic Grammar Induction

18 0.056301143 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

19 0.056293212 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

20 0.055947378 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.157), (1, -0.031), (2, 0.061), (3, 0.026), (4, -0.048), (5, -0.042), (6, 0.073), (7, -0.02), (8, -0.061), (9, 0.195), (10, -0.133), (11, 0.053), (12, -0.057), (13, 0.129), (14, 0.147), (15, -0.047), (16, -0.006), (17, 0.048), (18, 0.033), (19, -0.057), (20, 0.01), (21, 0.02), (22, 0.084), (23, -0.087), (24, 0.077), (25, -0.056), (26, 0.047), (27, 0.013), (28, -0.06), (29, 0.019), (30, -0.07), (31, -0.052), (32, -0.039), (33, -0.05), (34, -0.024), (35, -0.016), (36, -0.118), (37, -0.023), (38, -0.02), (39, -0.172), (40, -0.15), (41, -0.042), (42, -0.02), (43, 0.015), (44, 0.072), (45, -0.109), (46, 0.026), (47, -0.1), (48, -0.026), (49, 0.073)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95525712 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Author: Bharat Ram Ambati

2 0.80137312 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

3 0.66712868 99 acl-2010-Efficient Third-Order Dependency Parsers

Author: Terry Koo ; Michael Collins

4 0.60439807 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations

Author: Markus Dickinson

Abstract: We outline different methods to detect errors in automatically-parsed dependency corpora, by comparing so-called dependency rules to their representation in the training data and flagging anomalous ones. By comparing each new rule to every relevant rule from training, we can identify parts of parse trees which are likely erroneous. Even the relatively simple methods of comparison we propose show promise for speeding up the annotation process. 1 Introduction and Motivation Given the need for high-quality dependency parses in applications such as statistical machine translation (Xu et al., 2009), natural language generation (Wan et al., 2009), and text summarization evaluation (Owczarzak, 2009), there is a corresponding need for high-quality dependency annotation, for the training and evaluation of dependency parsers (Buchholz and Marsi, 2006). Furthermore, parsing accuracy degrades unless sufficient amounts of labeled training data from the same domain are available (e.g., Gildea, 2001; Sekine, 1997), and thus we need larger and more varied annotated treebanks, covering a wide range of domains. However, there is a bottleneck in obtaining annotation, due to the need for manual intervention in annotating a treebank. One approach is to develop automatically-parsed corpora (van Noord and Bouma, 2009), but a natural disadvantage with such data is that it contains parsing errors. Identifying the most problematic parses for human post-processing could combine the benefits of automatic and manual annotation, by allowing a human annotator to efficiently correct automatic errors. We thus set out in this paper to detect errors in automatically-parsed data. If annotated corpora are to grow in scale and retain a high quality, annotation errors which arise from automatic processing must be minimized, as errors have a negative impact on training and evaluation of NLP technology (see discussion and references in Boyd et al., 2008, sec. 1). 
There is work on detecting errors in dependency corpus annotation (Boyd et al., 2008), but this is based on finding inconsistencies in annotation for identical recurring strings. This emphasis on identical strings can result in high precision, but many strings do not recur, negatively impacting the recall of error detection. Furthermore, since the same strings often receive the same automatic parse, the types of inconsistencies detected are likely to have resulted from manual annotation. While we can build from the insight that simple methods can provide reliable annotation checks, we need an approach which relies on more general properties of the dependency structures, in order to develop techniques which work for automatically-parsed corpora. Developing techniques to detect errors in parses in a way which is independent of corpus and parser has fairly broad implications. By using only the information available in a training corpus, the methods we explore are applicable to annotation error detection for either hand-annotated or automatically-parsed corpora and can also provide insights for parse reranking (e.g., Hall and Novák, 2005) or parse revision (Attardi and Ciaramita, 2007). Although we focus only on detecting errors in automatically-parsed data, similar techniques have been applied for hand-annotated data (Dickinson, 2008; Dickinson and Foster, 2009). Our general approach is based on extracting a grammar from an annotated corpus and comparing dependency rules in a new (automatically-annotated) corpus to the grammar. Roughly speaking, if a dependency rule, which represents all the dependents of a head together (see section 3.1), does not fit well with the grammar, it is flagged as potentially erroneous. 
The methods do not have to be retrained for a given parser’s output (e.g., Campbell and Johnson, 2002), but work by comparing any tree to what is in the training grammar (cf. also approaches stacking hand-written rules on top of other parsers (Bick, 2007)). [Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 729–738, Uppsala, Sweden, 11-16 July 2010. © 2010 Association for Computational Linguistics] We propose to flag erroneous parse rules, using information which reflects different grammatical properties: POS lookup, bigram information, and full rule comparisons. We build on a method to detect so-called ad hoc rules, as described in section 2, and then turn to the main approaches in section 3. After a discussion of a simple way to flag POS anomalies in section 4, we evaluate the different methods in section 5, using the outputs from two different parsers. The methodology proposed in this paper is easy to implement and independent of corpus, language, or parser. 2 Approach We take as a starting point two methods for detecting ad hoc rules in constituency annotation (Dickinson, 2008). Ad hoc rules are CFG productions extracted from a treebank which are “used for specific constructions and unlikely to be used again,” indicating annotation errors and rules for ungrammaticalities (see also Dickinson and Foster, 2009). Each method compares a given CFG rule to all the rules in a treebank grammar. Based on the number of similar rules, a score is assigned, and rules with the lowest scores are flagged as potentially ad hoc. This procedure is applicable whether the rules in question are from a new data set (as in this paper, where parses are compared to a training data grammar) or drawn from the treebank grammar itself (i.e., an internal consistency check). The two methods differ in how the comparisons are done. First, the bigram method abstracts a rule to its bigrams. 
Thus, a rule such as NP → JJ NN provides support for NP → DT JJ JJ NN, in that it shares the JJ NN sequence. By contrast, in the other method, which we call the whole rule method (footnote 1), a rule is compared in its totality to the grammar rules, using Levenshtein distance. There is no abstraction, meaning all elements are present; e.g., NP → DT JJ JJ NN is very similar to NP → DT JJ NN because the sequences differ by only one category. While previously used for constituencies, what is at issue is simply the valency of a rule, where by valency we refer to a head and its entire set of arguments and adjuncts (cf. Przepiórkowski, 2006), that is, a head and all its dependents. [Footnote 1: This is referred to as whole daughters in Dickinson (2008), but the meaning of “daughters” is less clear for dependencies.] The methods work because we expect there to be regularities in valency structure in a treebank grammar; non-conformity to such regularities indicates a potential problem. 3 Ad hoc rule detection 3.1 An appropriate representation To capture valency, consider the dependency tree from the Talbanken05 corpus (Nilsson and Hall, 2005) in figure 1, for the Swedish sentence in (1), which has four dependency pairs (footnote 2). (1) Det går bara inte ihop . (gloss: it goes just not together) ‘It just doesn’t add up.’ [Figure 1: Dependency graph example; words Det går bara inte ihop, POS tags PO VV AB AB AB, dependency relations SS, MA, NA, PL] On a par with constituency rules, we define a grammar rule as a dependency relation rewriting as a head with its sequence of POS/dependent pairs (cf. Kuhlmann and Satta, 2009), as in figure 2. This representation supports the detection of idiosyncrasies in valency (footnote 3). 1. TOP → root ROOT:VV 2. ROOT → SS:PO VV MA:AB NA:AB PL:AB 3. SS → PO 4. MA → AB 5. NA → AB 6. PL → AB Figure 2: Rule representation for (1) For example, for the ROOT category, the head is a verb (VV), and it has 4 dependents. 
The extent to which this rule is odd depends upon whether comparable rules (i.e., other ROOT rules or other VV rules; see section 3.2) have a similar set of dependents. While many of the other rules seem rather spare, they provide useful information, showing categories which have no dependents. With a TOP rule, we have a rule for every head, including the virtual root. Thus, we can find anomalous rules such as TOP → root ROOT:AV ROOT:NN, where multiple categories have been parsed as ROOT. [Footnote 2: Category definitions are in appendix A. Footnote 3: Valency is difficult to define for coordination and is specific to an annotation scheme. We leave this for the future.] 3.2 Making appropriate comparisons In comparing rules, we are trying to find evidence that a particular (parsed) rule is valid by examining the evidence from the (training) grammar. Units of comparison To determine similarity, one can compare dependency relations, POS tags, or both. Valency refers to both properties, e.g., verbs which allow verbal (POS) subjects (dependency). Thus, we use the pairs of dependency relations and POS tags as the units of comparison. Flagging individual elements Previous work scored only entire rules, but some dependencies are problematic and others are not. Thus, our methods score individual elements of a rule. Comparable rules We do not want to compare a rule to all grammar rules, only to those which should have the same valents. Comparability could be defined in terms of a rule’s dependency relation (LHS) or in terms of its head. Consider the four different object (OO) rules in (2). These vary a great deal, and much of the variability comes from the fact that they are headed by different POS categories, which tend to have different selectional properties. The head POS thus seems to be predictive of a rule’s valency. (2) a. OO → PO b. OO → DT:EN AT:AJ NN ET:VV c. OO → SS:PO QV VG:VV d. 
OO → DT:PO AT:AJ VN

But we might lose information by ignoring rules with the same left-hand side (LHS). Our approach is thus to take the greater value of scores when comparing to rules either with the same dependency relation or with the same head. A rule has multiple chances to prove its value, and low scores will only be for rules without any type of support.

Taking these points together, for a given rule of interest r, we assign a score (S) to each element e_i in r, where r = e_1 ... e_m, by taking the maximum of scores for rules with the same head (h) or same LHS (lhs), as in (3). For the first element in (2b), for example, S(DT:EN) = max{s(DT:EN, NN), s(DT:EN, OO)}. The question is now how we define s(e_i, c) for the comparable element c.

(3) S(e_i) = max{s(e_i, h), s(e_i, lhs)}

3.3 Whole rule anomalies

3.3.1 Motivation

The whole rule method compares a list of a rule’s dependents to rules in a database, and then flags rule elements without much support. By using all dependents as a basis for comparison, this method detects improper dependencies (e.g., an adverb modifying a noun), dependencies in the wrong overall location of a rule (e.g., an adverb before an object), and rules with unnecessarily long argument structures. For example, in (4), we have an improper relation between skall (‘shall’) and sambeskattas (‘be taxed together’), as in figure 3. It is parsed as an adverb (AA), whereas it should be a verb group (VG). The rule for this part of the tree is +F → ++:++ SV AA:VV, and the AA:VV position will be low-scoring because the ++:++ SV context does not support it.

(4) Makars övriga inkomster är B-inkomster
    spouses’ other incomes are B-incomes
    och skall som tidigare sambeskattas .
    and shall as previously be taxed together .
    ‘The other incomes of spouses are B-incomes and shall, as previously, be taxed together.’

[Figure 3: Wrong label (top=gold, bottom=parsed). Words: och skall som tidigare sambeskattas; POS tags: ++ SV UK AJ VV; gold labels: ++ +F UK KA VG; parsed labels: ++ +F UK SS AA.]

3.3.2 Implementation

The method we use to determine similarity arises from considering what a rule is like without a problematic element. Consider +F → ++:++ SV AA:VV from figure 3, where AA should be a different category (VG). The rule without this error, +F → ++:++ SV, starts several rules in the training data, including some with VG:VV as the next item. The subrule ++:++ SV seems to be reliable, whereas the subrules containing AA:VV (++:++ AA:VV and SV AA:VV) are less reliable. We thus determine reliability by seeing how often each subsequence occurs in the training rule set.

Throughout this paper, we use the term subrule to refer to a rule subsequence which is exactly one element shorter than the rule it is a component of. We examine subrules, counting their frequency as subrules, not as complete rules. For example, TOP rules with more than one dependent are problematic, e.g., TOP → root ROOT:AV ROOT:NN. Correspondingly, there are no rules with three elements containing the subrule root ROOT:AV.

We formalize this by setting the score s(e_i, c) equal to the summation of the frequencies of all comparable subrules containing e_i from the training data, as in (5), where B is the set of subrules of r with length one less.

(5) s(e_i, c) = \sum_{sub \in B : e_i \in sub} C(sub, c)

For example, with c = +F, the frequency of +F → ++:++ SV as a subrule is added to the scores for ++:++ and SV. In this case, +F → ++:++ SV VG:BV, +F → ++:++ SV VG:AV, and +F → ++:++ SV VG:VV all add support for +F → ++:++ SV being a legitimate subrule. Thus, ++:++ and SV are less likely to be the sources of any problems.
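A minimal sketch of the whole rule score in (5), combined with the maximum in (3), might look as follows. Several points are assumptions made for illustration only: the toy training rules and their frequencies are invented, the head is identified as the RHS element without a `DEP:POS` colon, identical elements are distinguished by position, and all function names are hypothetical.

```python
from collections import Counter


def head_of(rhs):
    # Assumption: the head is the one RHS element written without "DEP:POS".
    return next(e for e in rhs if ":" not in e)


def subrules(rhs):
    """Every subsequence exactly one element shorter (one per deleted position)."""
    return [tuple(rhs[:j] + rhs[j + 1:]) for j in range(len(rhs))]


def count_subrules(training_rules):
    """C(sub, c), keyed by ("lhs", LHS, sub) and ("head", head POS, sub)."""
    counts = Counter()
    for lhs, rhs in training_rules:
        for sub in subrules(rhs):
            counts[("lhs", lhs, sub)] += 1
            counts[("head", head_of(rhs), sub)] += 1
    return counts


def whole_rule_scores(lhs, rhs, counts):
    """One score per element: equation (5) for each c, then the max of (3)."""
    head = head_of(rhs)
    scores = []
    for i in range(len(rhs)):
        # Distinct subrules of the rule that still contain element i.
        subs = {tuple(rhs[:j] + rhs[j + 1:]) for j in range(len(rhs)) if j != i}
        s_lhs = sum(counts[("lhs", lhs, sub)] for sub in subs)
        s_head = sum(counts[("head", head, sub)] for sub in subs)
        scores.append(max(s_head, s_lhs))
    return scores


# Toy training data (invented frequencies, for illustration only).
TRAINING = 3 * [("+F", ["++:++", "SV", "VG:VV"])] + [("+F", ["++:++", "SV", "VG:AV"])]
COUNTS = count_subrules(TRAINING)
SCORES = whole_rule_scores("+F", ["++:++", "SV", "AA:VV"], COUNTS)
print(SCORES)  # one score per element of ++:++ SV AA:VV
```

With this toy data the first two elements receive support from the subrule ++:++ SV, while AA:VV receives none, mirroring the example in the text. Counting subrules once per deleted position also gives C(VN ET:PR, SS) = 2 for a single SS → VN ET:PR ET:PR rule.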
Since +F → SV AA:VV and +F → ++:++ AA:VV have very little support in the training data, AA:VV receives a low score.

Note that the subrule count C(sub, c) is different than counting the number of rules containing a subrule, as can be seen with identical elements. For example, for SS → VN ET:PR ET:PR, C(VN ET:PR, SS) = 2, in keeping with the fact that there are 2 pieces of evidence for its legitimacy.

3.4 Bigram anomalies

3.4.1 Motivation

The bigram method examines relationships between adjacent sisters, complementing the whole rule method by focusing on local properties. For (6), for example, we find the gold and parsed trees in figure 4. For the long parsed rule TA → PR HD:ID HD:ID IR:IR AN:RO JR:IR, all elements get low whole rule scores, i.e., are flagged as potentially erroneous. But only the final elements have anomalous bigrams: HD:ID IR:IR, IR:IR AN:RO, and AN:RO JR:IR all never occur.

(6) När det gäller inkomståret 1971 ( taxeringsåret 1972 ) skall barnet ...
    when it concerns the income year 1971 ( assessment year 1972 ) shall the child ...
    ‘Concerning the income year of 1971 (assessment year 1972), the child ...’

3.4.2 Implementation

To obtain a bigram score for an element, we simply add together the bigrams which contain the element in question, as in (7).

(7) s(e_i, c) = C(e_{i-1} e_i, c) + C(e_i e_{i+1}, c)

Consider the rule from figure 4. With c = TA, the bigram HD:ID IR:IR never occurs, so both HD:ID and IR:IR get 0 added to their score. HD:ID HD:ID, however, is a frequent bigram, so it adds weight to HD:ID, i.e., positive evidence comes from the bigram on the left. If we look at IR:IR, on the other hand, IR:IR AN:RO occurs 0 times, and so IR:IR gets a total score of 0.

Both scoring methods treat each element independently. Every single element could be given a low score, even though once one is corrected, another would have a higher score.
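The bigram score in (7) can be sketched as below. This is an illustrative reading, not the paper's implementation: the training rules and counts are invented, only the LHS is used as the comparable category c, no rule-boundary symbols are added (an assumption), and the function names are hypothetical.

```python
from collections import Counter


def count_bigrams(training_rules):
    """C(e_{i-1} e_i, c): adjacent-sister counts, keyed by (c, left, right)."""
    counts = Counter()
    for c, rhs in training_rules:
        for left, right in zip(rhs, rhs[1:]):
            counts[(c, left, right)] += 1
    return counts


def bigram_scores(c, rhs, counts):
    """Equation (7): left-bigram count plus right-bigram count per element."""
    scores = []
    for i, element in enumerate(rhs):
        # Assumption: elements at the rule edges simply have one bigram fewer.
        left = counts[(c, rhs[i - 1], element)] if i > 0 else 0
        right = counts[(c, element, rhs[i + 1])] if i < len(rhs) - 1 else 0
        scores.append(left + right)
    return scores


# Toy training data (invented): HD:ID HD:ID is a frequent bigram under TA.
BIGRAM_TRAINING = 5 * [("TA", ["PR", "HD:ID", "HD:ID"])]
BIGRAM_COUNTS = count_bigrams(BIGRAM_TRAINING)
PARSED_RHS = ["PR", "HD:ID", "HD:ID", "IR:IR", "AN:RO", "JR:IR"]
BIGRAM_SCORES = bigram_scores("TA", PARSED_RHS, BIGRAM_COUNTS)
print(list(zip(PARSED_RHS, BIGRAM_SCORES)))
```

In this toy setting the second HD:ID keeps a positive score from the bigram on its left, while IR:IR, AN:RO, and JR:IR all score 0, as in the discussion of figure 4. To follow (3), the same function could be called with the head POS as c and the larger of the two scores kept.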
Future work can examine factoring in all elements at once.

4 Additional information

The methods presented so far have limited definitions of comparability. As using complementary information has been useful in, e.g., POS error detection (Loftsson, 2009), we explore other simple comparable properties of a dependency grammar. Namely, we include: a) frequency information of an overall dependency rule and b) information on how likely each dependent is to be in a relation with its head, described next.

4.1 Including POS information

Consider PA → SS:NN XX:XX HV OO:VN, as illustrated in figure 5 for the sentence in (8). This rule is entirely correct, yet the XX:XX position has low whole rule and bigram scores.

(8) Uppgift om vilka orter som har utkörning finner Ni också i ...
    information of which neighborhood who has delivery find you also in ...
    ‘You can also find information about which neighborhoods have delivery services in ...’

[Figure 4: A rule with extra dependents (top=gold, bottom=parsed). Words: När det gäller inkomståret 1971 ( taxeringsåret 1972 ) ...; POS tags: PR ID ID NN RO IR NN RO IR.]

[Figure 5: Overflagging (gold=parsed). Words and POS tags: Uppgift/NN om/PR vilka/PO orter/NN som/XX har/HV utkörning/VN.]

One method which does not have this problem of overflagging uses a “lexicon” of POS tag pairs, examining relations between POS, irrespective of position. We extract POS pairs, note their dependency relation, and add a L/R to the label to indicate which is the head (Boyd et al., 2008). Additionally, we note how often two POS categories occur as a non-dependency, using the label NIL, to help determine whether there should be any attachment. We generate NILs by enumerating all POS pairs in a sentence.
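One way to build such a POS-pair lexicon is sketched below. The token format, the function names, and the placeholder head and label for Uppgift (whose head lies outside the fragment shown in figure 5) are assumptions made for illustration; normalizing counts into relative frequencies is shown as one natural way to turn them into scores.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Fragment of sentence (8) as (POS, head, deprel); heads are 1-based.
# Assumption: Uppgift's real head is outside the fragment, so it is given
# head 0 and a placeholder label here.
FRAGMENT_8 = [
    ("NN", 0, "PLACEHOLDER"),  # Uppgift
    ("PR", 1, "ET"),           # om
    ("PO", 4, "DT"),           # vilka
    ("NN", 6, "SS"),           # orter
    ("XX", 6, "XX"),           # som
    ("HV", 2, "PA"),           # har
    ("VN", 6, "OO"),           # utkörning
]


def pos_pair_counts(sentences):
    """Count labels for every ordered POS pair; unattached pairs get NIL."""
    counts = defaultdict(Counter)
    for sent in sentences:
        for (i, left), (j, right) in combinations(enumerate(sent, start=1), 2):
            if right[1] == i:        # head is the left token
                label = f"{right[2]}-L"
            elif left[1] == j:       # head is the right token
                label = f"{left[2]}-R"
            else:                    # no dependency between the two tokens
                label = "NIL"
            counts[(left[0], right[0])][label] += 1
    return counts


def pair_probability(counts, left_pos, right_pos, label):
    """Relative frequency of a label among all occurrences of the POS pair."""
    total = sum(counts[(left_pos, right_pos)].values())
    return counts[(left_pos, right_pos)][label] / total if total else 0.0


PAIR_COUNTS = pos_pair_counts([FRAGMENT_8])
print(dict(PAIR_COUNTS[("NN", "HV")]))
print(pair_probability(PAIR_COUNTS, "NN", "HV", "SS-R"))
```

Because every pair of tokens in a sentence is enumerated, NIL tends to dominate the counts for most POS pairs, which is why even a moderate probability for a real relation is informative.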
For example, from figure 5, the parsed POS pairs include NN PR → ET-L, NN PO → NIL, etc. We convert the frequencies to probabilities. For example, of 4 total occurrences of XX HV in the training data, 2 are XX-R (cf. figure 5). A probability of 0.5 is quite high, given that NILs are often the most frequent label for POS pairs.

5 Evaluation

In evaluating the methods, our main question is: how accurate are the dependencies, in terms of both attachment and labeling? We therefore currently examine the scores for elements functioning as dependents in a rule. In figure 5, for example, for har (‘has’), we look at its score within ET → PR PA:HV and not when it functions as a head, as in PA → SS:NN XX:XX HV OO:VN.

Relatedly, for each method, we are interested in whether elements with scores below a threshold have worse attachment accuracy than scores above, as we predict they do. We can measure this by scoring each testing data position below the threshold as a 1 if it has the correct head and dependency relation and a 0 otherwise. These are simply labeled attachment scores (LAS). Scoring separately for positions above and below a threshold views the task as one of sorting parser output into two bins, those more or less likely to be correctly parsed. For development, we also report unlabeled attachment scores (UAS).

Since the goal is to speed up the post-editing of corpus data by flagging erroneous rules, we also report the precision and recall for error detection. We count either attachment or labeling errors as an error, and precision and recall are measured with respect to how many errors are found below the threshold. For development, we use two F-scores to provide a measure of the settings to examine across language, corpus, and parser conditions: the balanced F1 measure and the F0.5 measure, weighting precision twice as much as recall.
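The evaluation described above can be sketched as follows. The data are invented, the function names are hypothetical, and treating "below the threshold" as strictly less than the threshold is an assumption; the F-measure itself is the standard F-beta formula, with beta = 0.5 giving the precision-weighted variant.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-measure; beta < 1 favors precision, beta = 1 is balanced F1."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def evaluate_threshold(scores, correct, threshold):
    """Split positions by score and report LAS per bin plus error detection P/R.

    scores:  one anomaly score per dependent position
    correct: True if the position has the correct head and label
    """
    # Assumption: "below the threshold" means strictly less than it.
    below = [ok for s, ok in zip(scores, correct) if s < threshold]
    above = [ok for s, ok in zip(scores, correct) if s >= threshold]
    errors_total = sum(1 for ok in correct if not ok)
    errors_found = sum(1 for ok in below if not ok)
    return {
        "LAS_b": sum(below) / len(below) if below else 0.0,
        "LAS_a": sum(above) / len(above) if above else 0.0,
        "P": errors_found / len(below) if below else 0.0,   # equals 1 - LAS_b
        "R": errors_found / errors_total if errors_total else 0.0,
    }


# Toy data (invented): six positions, three of them parser errors.
TOY_SCORES = [0, 1, 5, 50, 80, 100]
TOY_CORRECT = [False, False, True, True, True, False]
RESULT = evaluate_threshold(TOY_SCORES, TOY_CORRECT, threshold=3)
F1 = f_beta(RESULT["P"], RESULT["R"], beta=1.0)
F05 = f_beta(RESULT["P"], RESULT["R"], beta=0.5)
print(RESULT, round(F1, 3), round(F05, 3))
```

In the toy run both flagged positions are real errors, so precision is perfect while recall is two thirds; F0.5 rewards that trade-off more than F1 does, which matches the stated preference for precision in post-editing.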
Precision is likely more important in this context, so as to prevent annotators from sorting through too many false positives. In practice, one way to use these methods is to start with the lowest thresholds and work upwards until there are too many non-errors.

To establish a basis for comparison, we compare method performance to a parser on its own [4]. By examining the parser output without any automatic assistance, how often does a correction need to be made?

5.1 The data

All our data comes from the CoNLL-X Shared Task (Buchholz and Marsi, 2006), specifically the 4 data sets freely available online. We use the Swedish Talbanken data (Nilsson and Hall, 2005) and the transition-based dependency parser MaltParser (Nivre et al., 2007), with the default settings, for developing the method. To test across languages and corpora, we use MaltParser on the other 3 corpora: the Danish DDT (Kromann, 2003), Dutch Alpino (van der Beek et al., 2002), and Portuguese Bosque data (Afonso et al., 2002). Then, we present results using the graph-based parser MSTParser (McDonald and Pereira, 2006), again with default settings, to test the methods across parsers. We use the gold standard POS tags for all experiments.

5.2 Development data

In the first line of table 1, we report the baseline MaltParser accuracies on the Swedish test data, including baseline error detection precision (= 1 − LASb), recall, and (the best) F-scores. In the rest of table 1, we report the best-performing results for each of the methods [5], providing the number of rules below and above a particular threshold, along with corresponding UAS and LAS values. To get the raw number of identified rules, multiply the number of corpus positions below a threshold (b) by the error detection precision (P). For example, the bigram method with a threshold of 39 leads to finding 283 errors (455 × .622).

Dependency elements with frequency below the lowest threshold have lower attachment scores (66.6% vs. 90.
1% LAS), showing that simply using a complete rule helps sort dependencies. However, frequency thresholds have fairly low precision, i.e., 33.4% at their best. The whole rule and bigram methods reveal greater precision in identifying problematic dependencies, isolating elements with lower UAS and LAS scores than with frequency, along with corresponding greater precision and F-scores. The bigram method is more fine-grained, identifying small numbers of rule elements at each threshold, resulting in high error detection precision. With a threshold of 39, for example, we find over a quarter of the parser errors with 62% precision, from this one piece of information. For POS information, we flag 23.6% of the cases with over 60% precision (at 81.6).

Taking all these results together, we can begin to sort more reliable from less reliable dependency tree elements, using very simple information. Additionally, these methods naturally group cases together by linguistic properties (e.g., adverbial-verb dependencies within a particular context), allowing a human to uncover the principle behind parse failure and adjudicate similar cases at the same time (cf. Wallis, 2003).

[4] One may also use parser confidence or parser revision methods as a basis of comparison, but we are aware of no systematic evaluation of these approaches for detecting errors.
[5] Freq=rule frequency, WR=whole rule, Bi=bigram, POS=POS-based (POS scores multiplied by 10,000).

5.3 Discussion

Examining some of the output from the Talbanken test data by hand, we find that a prominent cause of false positives, i.e., correctly-parsed cases with low scores, stems from low-frequency dependency-POS label pairs. If the dependency rarely occurs in the training data with the particular POS, then it receives a low score, regardless of its context.
For example, the parsed rule TA → IG:IG RO has a correct dependency relation (IG) between the POS tags IG and its head RO, yet is assigned a whole rule score of 2 and a bigram score of 20. It turns out that IG:IG only occurs 144 times in the training data, and in 11 of those cases (7.6%) it appears immediately before RO. One might consider normalizing the scores based on overall frequency or adjusting the scores to account for other dependency rules in the sentence: in this case, there may be no better attachment.

Other false positives are correctly-parsed elements that are a part of erroneous rules. For instance, in AA → UK:UK SS:PO TA:AJ AV SP:AJ OA:PR +F:HV +F:HV, the first +F:HV is correct, yet given a low score (0 whole rule, 1 bigram). The following and erroneous +F:HV is similarly given a low score. As above, such cases might be handled by looking for attachments in other rules (cf. Attardi and Ciaramita, 2007), but these cases should be relatively unproblematic for hand-correction, given the neighboring error.

We also examined false negatives, i.e., errors with high scores. There are many examples of PR PA:NN rules, for instance, with the NN improperly attached, but there are also many correct instances of PR PA:NN. To sort out the errors, one needs to look at lexical knowledge and/or other dependencies in the tree. With so little context, frequent rules with only one dependent are not prime candidates for our methods of error detection.

5.4 Other corpora

We now turn to the parsed data from three other corpora. The Alpino and Bosque corpora are approximately the same size as Talbanken, so we use the same thresholds for them. The DDT data is approximately half the size; to adjust, we simply halve the scores. In tables 2, 3, and 4, we present the results, using the best F0.5 and F1 settings from development.
At a glance, we observe that the best method differs for each corpus and depending on an emphasis of precision or recall, with the bigram method generally having high precision.

For Alpino, error detection is better with frequency than, for example, bigram scores. This is likely due to the fact that Alpino has the smallest label set of any of the corpora, with only 24 dependency labels and 12 POS tags (cf. 64 and 41 in Talbanken, respectively). With a smaller label set, there are fewer possible bigrams that could be anomalous, but more reliable statistics about a whole rule. Likewise, with fewer possible POS tag pairs, Alpino has lower precision for the low-threshold POS scores than the other corpora.

For the whole rule scores, the DDT data is worse (compare its 46.1% precision with Bosque’s 45.6%, with vastly different recall values), which could be due to the smaller training data. One might also consider the qualitative differences in the dependency inventory of DDT compared to the others, e.g., appositions, distinctions in names, and more types of modifiers.

5.5 MSTParser

Turning to the results of running the methods on the output of MSTParser, we find similar but slightly worse values for the whole rule and bigram methods, as shown in tables 5-8. What is most striking are the differences in the POS-based method for Bosque and DDT (tables 7 and 8), where a large percentage of the test corpus is underneath the threshold. MSTParser is apparently positing fewer distinct head-dependent pairs, as most of them fall under the given thresholds. With the exception of the POS-based method for DDT (where LASb is actually higher than LASa), the different methods seem to be accurate enough to be used as part of corpus post-editing.

6 Summary and Outlook

We have proposed different methods for flagging the errors in automatically-parsed corpora, by treating the problem as one of looking for anomalous rules with respect to a treebank grammar.
The different methods incorporate differing types and amounts of information, notably comparisons among dependency rules and bigrams within such rules. Using these methods, we demonstrated success in sorting well-formed output from erroneous output across language, corpora, and parsers.

Given that the rule representations and comparison methods use both POS and dependency information, a next step in evaluating and improving the methods is to examine automatically POS-tagged data. Our methods should be able to find POS errors in addition to dependency errors. Furthermore, although we have indicated that differences in accuracy can be linked to differences in the granularity and particular distinctions of the annotation scheme, it is still an open question as to which methods work best for which schemes and for which constructions (e.g., coordination).

Acknowledgments

Thanks to Sandra Kübler and Amber Smith for comments on an earlier draft; Yvonne Samuelsson for help with the Swedish translations; the IU Computational Linguistics discussion group for feedback; and Julia Hockenmaier, Chris Brew, and Rebecca Hwa for discussion on the general topic.

A Some Talbanken05 categories

[Table of dependency categories not preserved in the source.]

References

Afonso, Susana, Eckhard Bick, Renato Haber and Diana Santos (2002). Floresta Sintá(c)tica: a treebank for Portuguese. In Proceedings of LREC 2002. Las Palmas, pp. 1698–1703.

Attardi, Giuseppe and Massimiliano Ciaramita (2007). Tree Revision Learning for Dependency Parsing. In Proceedings of NAACL-HLT-07. Rochester, NY, pp. 388–395.

Bick, Eckhard (2007). Hybrid Ways to Improve Domain Independence in an ML Dependency Parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007. Prague, Czech Republic, pp. 1119–1123.

Boyd, Adriane, Markus Dickinson and Detmar Meurers (2008). On Detecting Errors in Dependency Treebanks. Research on Language and Computation 6(2), 113–137.

Buchholz, Sabine and Erwin Marsi (2006).
CoNLL-X Shared Task on Multilingual Dependency Parsing. In Proceedings of CoNLL-X. New York City, pp. 149–164.

Campbell, David and Stephen Johnson (2002). A transformational-based learner for dependency grammars in discharge summaries. In Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain. Philadelphia, pp. 37–44.

Dickinson, Markus (2008). Ad Hoc Treebank Structures. In Proceedings of ACL-08. Columbus, OH.

Dickinson, Markus and Jennifer Foster (2009). Similarity Rules! Exploring Methods for Ad-Hoc Rule Detection. In Proceedings of TLT-7. Groningen, The Netherlands.

Gildea, Daniel (2001). Corpus Variation and Parser Performance. In Proceedings of EMNLP-01. Pittsburgh, PA.

Hall, Keith and Václav Novák (2005). Corrective Modeling for Non-Projective Dependency Parsing. In Proceedings of IWPT-05. Vancouver, pp. 42–52.

Kromann, Matthias Trautner (2003). The Danish Dependency Treebank and the underlying linguistic theory. In Proceedings of TLT-03.

Kuhlmann, Marco and Giorgio Satta (2009). Treebank Grammar Techniques for Non-Projective Dependency Parsing. In Proceedings of EACL-09. Athens, Greece, pp. 478–486.

Loftsson, Hrafn (2009). Correcting a POS-Tagged Corpus Using Three Complementary Methods. In Proceedings of EACL-09. Athens, Greece, pp. 523–531.

McDonald, Ryan and Fernando Pereira (2006). Online learning of approximate dependency parsing algorithms. In Proceedings of EACL-06. Trento.

Nilsson, Jens and Johan Hall (2005). Reconstruction of the Swedish Treebank Talbanken. MSI report 05067, Växjö University: School of Mathematics and Systems Engineering.

Nivre, Joakim, Johan Hall, Jens Nilsson, Atanas Chanev, Gulsen Eryigit, Sandra Kübler, Svetoslav Marinov and Erwin Marsi (2007). MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13(2), 95–135.

Owczarzak, Karolina (2009). DEPEVAL(summ): Dependency-based Evaluation for Automatic Summaries.
In Proceedings of ACL-AFNLP-09. Suntec, Singapore, pp. 190–198.

Przepiórkowski, Adam (2006). What to acquire from corpora in automatic valence acquisition. In Violetta Koseska-Toszewa and Roman Roszko (eds.), Semantyka a konfrontacja jezykowa, tom 3, Warsaw: Slawistyczny Ośrodek Wydawniczy PAN, pp. 25–41.

Sekine, Satoshi (1997). The Domain Dependence of Parsing. In Proceedings of ANLP-96. Washington, DC.

van der Beek, Leonoor, Gosse Bouma, Robert Malouf and Gertjan van Noord (2002). The Alpino Dependency Treebank. In Proceedings of CLIN 2001. Rodopi.

van Noord, Gertjan and Gosse Bouma (2009). Parsed Corpora for Linguistics. In Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?. Athens, pp. 33–39.

Wallis, Sean (2003). Completing Parsed Corpora. In Anne Abeillé (ed.), Treebanks: Building and using syntactically annotated corpora, Dordrecht: Kluwer Academic Publishers, pp. 61–71.

Wan, Stephen, Mark Dras, Robert Dale and Cécile Paris (2009). Improving Grammaticality in Statistical Sentence Generation: Introducing a Dependency Spanning Tree Algorithm with an Argument Satisfaction Model. In Proceedings of EACL-09. Athens, Greece, pp. 852–860.

Xu, Peng, Jaeho Kang, Michael Ringgaard and Franz Och (2009). Using a Dependency Parser to Improve SMT for Subject-Object-Verb Languages. In Proceedings of NAACL-HLT-09. Boulder, Colorado, pp. 245–253.

5 0.58928347 12 acl-2010-A Probabilistic Generative Model for an Intermediate Constituency-Dependency Representation

Author: Federico Sangati

Abstract: We present a probabilistic model extension to the Tesnière Dependency Structure (TDS) framework formulated in (Sangati and Mazza, 2009). This representation incorporates aspects from both constituency and dependency theory. In addition, it makes use of junction structures to handle coordination constructions. We test our model on parsing the English Penn WSJ treebank using a re-ranking framework. This technique allows us to efficiently test our model without needing a specialized parser, and to use the standard evaluation metric on the original Phrase Structure version of the treebank. We obtain encouraging results: we achieve a small improvement over state-of-the-art results when re-ranking a small number of candidate structures, on all the evaluation metrics except for chunking.

6 0.57922608 130 acl-2010-Hard Constraints for Grammatical Function Labelling

7 0.57448095 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

8 0.55588424 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

9 0.52138919 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

10 0.49014005 214 acl-2010-Sparsity in Dependency Grammar Induction

11 0.48467672 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

12 0.45994309 195 acl-2010-Phylogenetic Grammar Induction

13 0.45186365 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing

14 0.44810739 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

15 0.43683475 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing

16 0.41868591 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

17 0.39040866 69 acl-2010-Constituency to Dependency Translation with Forests

18 0.38850865 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration

19 0.37298852 124 acl-2010-Generating Image Descriptions Using Dependency Relational Patterns

20 0.33641949 39 acl-2010-Automatic Generation of Story Highlights

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.038), (25, 0.049), (39, 0.016), (59, 0.076), (73, 0.036), (76, 0.043), (78, 0.041), (83, 0.059), (84, 0.033), (90, 0.362), (98, 0.143)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.72563976 104 acl-2010-Evaluating Machine Translations Using mNCD

Author: Marcus Dobrinkat ; Tero Tapiovaara ; Jaakko Vayrynen ; Kimmo Kettunen

Abstract: This paper introduces mNCD, a method for automatic evaluation of machine translations. The measure is based on normalized compression distance (NCD), a general information theoretic measure of string similarity, and flexible word matching provided by stemming and synonyms. The mNCD measure outperforms NCD in system-level correlation to human judgments in English.

same-paper 2 0.70951968 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

Author: Bharat Ram Ambati

3 0.5391112 54 acl-2010-Boosting-Based System Combination for Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang

Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrase-based system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems.

4 0.4615587 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar

Author: Mohit Bansal ; Dan Klein

Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and tree-substitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-the-art lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.

5 0.4595229 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

Author: Liang Huang ; Kenji Sagae

Abstract: Incremental parsing techniques such as shift-reduce have gained popularity thanks to their efficiency, but there remains a major problem: the search is greedy and only explores a tiny fraction of the whole space (even with beam search) as opposed to dynamic programming. We show that, surprisingly, dynamic programming is in fact possible for many shift-reduce parsers, by merging “equivalent” stacks based on feature values. Empirically, our algorithm yields up to a five-fold speedup over a state-of-the-art shift-reduce dependency parser with no loss in accuracy. Better search also leads to better learning, and our final parser outperforms all previously reported dependency parsers for English and Chinese, yet is much faster.

6 0.45745313 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

7 0.45382923 133 acl-2010-Hierarchical Search for Word Alignment

8 0.45177436 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

9 0.4506225 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

10 0.44978058 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

11 0.44811881 79 acl-2010-Cross-Lingual Latent Topic Extraction

12 0.44781944 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

13 0.44733402 162 acl-2010-Learning Common Grammar from Multilingual Corpus

14 0.44605619 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

15 0.44536984 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

16 0.44531512 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification

17 0.44483984 202 acl-2010-Reading between the Lines: Learning to Map High-Level Instructions to Commands

18 0.44481426 71 acl-2010-Convolution Kernel over Packed Parse Forest

19 0.44480318 116 acl-2010-Finding Cognate Groups Using Phylogenies

20 0.4446348 99 acl-2010-Efficient Third-Order Dependency Parsers