
52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints


Source: pdf

Author: Wenliang Chen ; Jun'ichi Kazama ; Kentaro Torisawa

Abstract: This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large-scale automatically parsed data on the target side. Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides. Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. Experiments on the translated portion of the Chinese Treebank show that our system outperforms monolingual parsers by 2.93 points for Chinese and 1.64 points for English.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. [sent-2, score-0.29]

2 Then it is verified by checking the subtree list that is collected from large-scale automatically parsed data on the target side. [sent-3, score-0.931]

3 Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides. [sent-4, score-0.568]

4 Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. [sent-5, score-0.319]

5 1 Introduction Parsing bilingual texts (bitexts) is crucial for training machine translation systems that rely on syntactic structures on either the source side or the target side, or both (Ding and Palmer, 2005; Nakazawa et al. [sent-9, score-0.6]

6 Bitexts could provide more information useful for parsing than ordinary monolingual texts; this information can be called “bilingual constraints”, and we expect to obtain more accurate parsing results that can be effectively used in the training of MT systems. [sent-11, score-0.282]

7 This paper proposes a dependency parsing method, which uses the bilingual constraints that we call bilingual subtree constraints and statistics concerning the constraints estimated from large unlabeled monolingual corpora. [sent-16, score-1.732]

8 Basically, a (candidate) dependency subtree in a source-language sentence is mapped to a subtree in the corresponding target-language sentence by using word alignment and mapping rules that are automatically learned. [sent-17, score-1.809]

9 The target subtree is verified by checking the subtree list that is collected from unlabeled sentences in the target language parsed by a usual monolingual parser. [sent-18, score-1.896]

10 The result is used as additional features for the source side dependency parser. [sent-19, score-0.395]

11 In this paper, our task is to improve the source side parser with the help of the translations on the target side. [sent-20, score-0.386]

12 Many researchers have investigated the use of bilingual constraints for parsing (Burkett and Klein, 2008; Zhao et al. [sent-21, score-0.41]

13 Our method only requires dependency annotation on the source side and is much simpler and faster. [sent-26, score-0.33]

14 As in our method, the input of their method is source trees with their translations on the target side, which is much easier to obtain than trees on both sides. [sent-29, score-0.394]

15 [Page footer: © 2010 Association for Computational Linguistics, pages 21–29] the target side that might be useful for ambiguity resolution. [sent-32, score-0.239]

16 Our method achieves much greater improvement because it uses the richer subtree constraints. [sent-33, score-0.691]

17 (2009) and exploits the subtree structure on the target side to provide the bilingual constraints. [sent-35, score-1.176]

18 The subtrees are extracted from large-scale auto-parsed monolingual data on the target side. [sent-36, score-0.473]

19 The main problem to be addressed is mapping words on the source side to the target subtree because there are many-to-many mappings and reordering problems that often occur in translation (Koehn et al. [sent-37, score-1.251]

20 Based on the mapping rules, we design a set of features for parsing models. [sent-40, score-0.338]

21 The basic idea is as follows: if the words form a subtree on one side, their corresponding words on the other side will also probably form a subtree. [sent-41, score-0.824]

22 Section 4 proposes an approach of constructing bilingual subtree constraints. [sent-51, score-0.966]

23 2 Motivation In this section, we use an example to show the idea of using the bilingual subtree constraints to improve parsing performance. [sent-54, score-1.101]

24 Suppose that we have an input sentence pair as shown in Figure 1, where the source sentence is in English, the target is in Chinese, the dashed undirected links are word alignment links, and the directed links between words indicate that they have a (candidate) dependency relation. [sent-55, score-0.491]

25 Therefore, we can use the information on the Chinese side to help disambiguation. [Figure text: “He ate the meat with a fork.”] [sent-58, score-0.456]

26 We verify that the corresponding words form a subtree by looking up a subtree list in Chinese (described in Section 4. [sent-71, score-1.476]

27 Then we verify that the words form a subtree by looking up the subtree list. [sent-76, score-1.439]

28 This time we can find the subtree as shown in Figure 2. [sent-77, score-0.691]

29 [Figure 2: Example for a searched subtree] Finally, the parser may assign “ate” to be the head of “with” based on the verification results. [sent-82, score-0.767]

30 This simple example shows how to use the subtree information on the target side. [sent-83, score-0.834]

31 3 Dependency parsing For dependency parsing, there are two main types of parsing models (Nivre and McDonald, 2008; Nivre and Kubler, 2006): transition-based (Nivre, 2003; Yamada and Matsumoto, 2003) and graph-based (McDonald et al. [sent-84, score-0.365]

32 In this paper, we employ the graph-based MST parsing model proposed by McDonald and Pereira (2006), which is an extension of the projective parsing algorithm of Eisner (1996). [sent-87, score-0.256]

33 1 Parsing with monolingual features Figure 3 shows an example of dependency parsing. [sent-90, score-0.315]

34 [Figure 3: Example of dependency tree] In our systems, the monolingual features include the first- and second-order features presented in (McDonald et al. [sent-94, score-0.406]

35 We call the parser with the monolingual features the monolingual parser. [sent-96, score-0.359]

36 2 Parsing with bilingual features In this paper, we parse source sentences with the help of their translations. [sent-98, score-0.451]

37 A set of bilingual features is designed for the parsing model. [sent-99, score-0.422]

38 1 Bilingual subtree features We design bilingual subtree features, as described in Section 4, based on the constraints between the source subtrees and the target subtrees that are verified by the subtree list on the target side. [sent-102, score-3.275]

39 The source subtrees are from the possible dependency relations. [sent-103, score-0.404]

40 4 Bilingual subtree constraints In this section, we propose an approach that uses the bilingual subtree constraints to help parse source sentences that have translations on the target side. [sent-110, score-2.017]

41 We use large-scale auto-parsed data to obtain subtrees on the target side. [sent-111, score-0.353]

42 Then we generate the mapping rules to map the source subtrees onto the extracted target subtrees. [sent-112, score-0.689]

43 Finally, we design the bilingual subtree features based on the mapping rules for the parsing model. [sent-113, score-1.332]

44 These features encode the constraints between bilingual subtrees, which we call bilingual subtree constraints. [sent-114, score-1.301]

45 (2009) propose a simple method to extract subtrees from large-scale monolingual data and use them as features to improve monolingual parsing. [sent-117, score-0.497]

46 Following their method, we parse large unannotated data with a monolingual parser and obtain a set of subtrees (STt) in the target language. [sent-118, score-0.581]

47 We encode the subtrees into string format that is expressed as st = w:hid(-w:hid)+, where w refers to a word in the subtree and hid refers to the word ID of the word’s head (hid=0 means that this word is the root of a subtree). [sent-119, score-0.238]

48 Here, word ID refers to the ID (starting from 1) of a word in the subtree (words are ordered based on the positions of the original sentence). [sent-120, score-0.742]
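
The string format described above can be illustrated with a short sketch. The helper names `encode_subtree` and `decode_subtree` are hypothetical (they do not come from the paper), the trailing + in the pattern is read here as "one or more repetitions", and words are assumed not to contain "-" themselves.

```python
from typing import List, Tuple

# (word, head ID inside the subtree); head ID 0 marks the subtree root.
Node = Tuple[str, int]


def encode_subtree(nodes: List[Node]) -> str:
    """Join (word, hid) pairs, given in sentence order, as 'w:hid-w:hid-...'."""
    if len(nodes) < 2:
        raise ValueError("a subtree needs at least two nodes")
    if sum(1 for _, hid in nodes if hid == 0) != 1:
        raise ValueError("exactly one node must be the root (hid=0)")
    return "-".join(f"{word}:{hid}" for word, hid in nodes)


def decode_subtree(st: str) -> List[Node]:
    """Inverse of encode_subtree (assumes words contain no '-')."""
    nodes = []
    for item in st.split("-"):
        word, hid = item.rsplit(":", 1)
        nodes.append((word, int(hid)))
    return nodes


print(encode_subtree([("fringes", 0), ("of", 1), ("society", 2)]))
```

Word IDs are local to the subtree, so the same string is produced wherever the three words occur in a sentence, which is what makes a plain set lookup possible later.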

49 If a subtree contains two nodes, we call it a bigram-subtree. [sent-125, score-0.691]

50 If a subtree contains three nodes, we call it a trigram-subtree. [sent-126, score-0.691]
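
As a rough illustration of how a subtree list could be collected, the sketch below extracts only the bigram-subtrees (one head and one dependent) from a single parsed sentence. The function name `bigram_subtrees` and the head assignments in the example are assumptions made for illustration; trigram-subtrees are left out.

```python
from typing import List, Set


def bigram_subtrees(words: List[str], heads: List[int]) -> Set[str]:
    """Return every head-dependent pair encoded as 'w:hid-w:hid'.

    heads[i - 1] is the 1-based position of the head of word i (0 = root).
    Inside each pair the words keep their sentence order, and word IDs
    restart at 1.
    """
    found = set()
    for dep, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the sentence root has no head to pair with
        if dep < head:
            # dependent comes first: it is word 1, its head is word 2
            found.add(f"{words[dep - 1]}:2-{words[head - 1]}:0")
        else:
            # head comes first: it is word 1 and the root of the pair
            found.add(f"{words[head - 1]}:0-{words[dep - 1]}:1")
    return found


# Hypothetical parse of the example sentence (not taken from the paper).
words = ["He", "ate", "the", "meat", "with", "a", "fork"]
heads = [2, 0, 4, 2, 2, 7, 5]
print(sorted(bigram_subtrees(words, heads)))
```

Taking the union of such sets over a large auto-parsed corpus would give a lookup table playing the role of STt.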

51 2 Mapping rules To provide bilingual subtree constraints, we need to find the characteristics of subtree mapping for the two given languages. [sent-139, score-1.818]

52 MtoN (words) mapping means that a source subtree with M words is mapped onto a target subtree with N words. [sent-142, score-1.849]

53 For example, 2to3 means that a source bigram-subtree is mapped onto a target trigram-subtree. [sent-143, score-0.334]

54 [Figure 7: Example for relative clauses preceding the head noun; English side reads “The 3 projects signed today”] 3) Genitive constructions precede the head noun. [sent-174, score-0.412]

55 Since asking linguists to define the mapping rules is very expensive, we propose a simple method to easily obtain the mapping rules. [sent-180, score-0.363]

56 2 Bilingual subtree mapping To solve the mapping problems, we use a bilingual corpus, which includes sentence pairs, to automatically generate the mapping rules. [sent-183, score-1.364]

57 [Figure 8: Example of auto-parsed bilingual sentence pair] From these sentence pairs, we obtain subtree pairs. [sent-197, score-0.977]

58 First, we extract a subtree (sts) from a source sentence. [sent-198, score-0.806]

59 If the corresponding words form a subtree (stt) in the target sentence, sts and stt are a subtree pair. [sent-203, score-1.688]

60 That is, we have a subtree pair: “社会(society):2-边缘(fringe):0” and “fringe(W_2):0-of:1-society(W_1):2”. [sent-206, score-0.691]
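
The check "do the corresponding words form a subtree?" can be sketched as follows. This is a simplified version under stated assumptions: the alignment is one-to-one, the English parse and the alignment in the example are invented for illustration, and the names `forms_subtree` and `project_subtree` are hypothetical.

```python
from typing import Dict, List, Optional, Set


def forms_subtree(positions: Set[int], heads: List[int]) -> bool:
    """True if exactly one position has its head outside the set.

    heads[i - 1] is the 1-based head position of token i (0 = sentence root).
    In a tree this means the positions are connected under a single top node.
    """
    tops = [p for p in positions if heads[p - 1] not in positions]
    return len(positions) >= 2 and len(tops) == 1


def project_subtree(src_positions: Set[int],
                    alignment: Dict[int, int],
                    tgt_words: List[str],
                    tgt_heads: List[int]) -> Optional[str]:
    """Map source positions through the alignment and encode the target subtree.

    Returns None when a source word is unaligned or when the aligned target
    words do not form a subtree by themselves.
    """
    if not all(p in alignment for p in src_positions):
        return None
    tgt = {alignment[p] for p in src_positions}
    if not forms_subtree(tgt, tgt_heads):
        return None
    ordered = sorted(tgt)
    local_id = {pos: i for i, pos in enumerate(ordered, start=1)}
    return "-".join(
        f"{tgt_words[pos - 1]}:{local_id.get(tgt_heads[pos - 1], 0)}"
        for pos in ordered
    )


# Illustrative data: assumed parse and alignment, not taken from the paper.
tgt_words = ["They", "are", "on", "the", "fringes", "of", "society"]
tgt_heads = [2, 0, 2, 5, 3, 5, 6]
# 他们 -> They, 处于 -> are, 社会 -> society, 边缘 -> fringes
alignment = {1: 1, 2: 2, 3: 7, 4: 5}

print(project_subtree({1, 2}, alignment, tgt_words, tgt_heads))
print(project_subtree({3, 4}, alignment, tgt_words, tgt_heads))
```

Under this assumed parse the first call yields a two-word target subtree, while the second returns None because "fringes" and "society" are connected only through the unaligned word "of". Pairs like that need a mapping that can add target words, which this strict sketch does not attempt.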

61 The extracted subtree pairs indicate the translation characteristics between Chinese and English. [sent-207, score-0.691]

62 3 Generalized mapping rules To increase the mapping coverage, we generalize the mapping rules from the extracted subtree pairs by using the following procedure. [sent-211, score-1.204]

63 The rules are divided by “=>” into two parts: source (left) and target (right). [sent-212, score-0.315]

64 The source part is from the source subtree and the target part is from the target subtree. [sent-213, score-1.207]

65 For the target part, we use the word alignment information to represent the target words that have corresponding source words. [sent-215, score-0.486]

66 For example, we have the subtree pair: “社会(society):2-边缘(fringe):0” and “fringes(W_2):0-of:1-society(W_1):2”, where “of” does not have a corresponding word, the POS tag of “社会(society)” is N, and the POS tag of “边缘(fringe)” is N. [sent-216, score-0.728]

67 The source part of the rule becomes “N:2-N:0” and the target part becomes “W_2:0-of:1-W_1:2”. [sent-217, score-0.258]
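
The generalization step in this example can be sketched in a few lines. The function name `generalize_rule` and the tuple layouts are hypothetical, and the placeholder for "target word aligned to source word i" is written as W_i.

```python
from typing import List, Optional, Tuple

SrcNode = Tuple[str, int]                 # (coarse POS tag, head ID)
TgtNode = Tuple[str, int, Optional[int]]  # (word, head ID, aligned source word ID or None)


def generalize_rule(src: List[SrcNode], tgt: List[TgtNode]) -> str:
    """Build 'source => target' from one extracted subtree pair.

    Source words are replaced by their POS tags. Target words that have a
    corresponding source word become W_i placeholders, and unaligned target
    words (such as "of") are kept as they are.
    """
    source_part = "-".join(f"{pos}:{hid}" for pos, hid in src)
    target_items = []
    for word, hid, src_id in tgt:
        label = word if src_id is None else f"W_{src_id}"
        target_items.append(f"{label}:{hid}")
    return f"{source_part} => {'-'.join(target_items)}"


rule = generalize_rule(
    [("N", 2), ("N", 0)],
    [("fringes", 0, 2), ("of", 1, None), ("society", 2, 1)],
)
print(rule)
```

Because the lexical items on the source side are dropped, one rule of this form covers every noun-noun pair with the same head direction, which is where the gain in coverage comes from.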

68 The generalized mapping rules might generate incorrect target subtrees. [sent-222, score-0.361]

69 1, the generated subtrees are verified by looking up list STt before they are used in the parsing models. [sent-225, score-0.349]

70 3 Bilingual subtree features Informally, if the words form a subtree on the source side, then the corresponding words on the target side will also probably form a subtree. [sent-227, score-1.838]

71 For example, in Figure 8, words “他们(they)” and “处于(be on)” form a subtree, which is mapped onto the words “they” and “are” on the target side. [sent-228, score-0.91]

72 We now develop this idea as bilingual subtree features. [sent-230, score-0.937]

73 The conditions of generating bilingual subtree features are that at least two of these source words must have corresponding words on the target side and nouns and verbs must have corresponding words. [sent-232, score-1.43]

74 Then we obtain the corresponding target subtree based on the mapping rules. [sent-234, score-1.044]

75 Finally, we verify that the target subtree is included in STt. [sent-235, score-0.891]
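
The two conditions for generating a bilingual subtree feature can be written as a small predicate. This is a sketch only: the name `can_generate_feature` is hypothetical, and treating any POS tag that starts with "N" or "V" as a noun or verb is an assumption about the tag set.

```python
from typing import List, Tuple

SrcWord = Tuple[str, bool]  # (POS tag, whether the word has an aligned target word)


def can_generate_feature(words: List[SrcWord]) -> bool:
    """Check the two conditions on the source words of a candidate subtree.

    1. At least two source words have corresponding target words.
    2. Every noun and verb has a corresponding target word.
    """
    aligned = sum(1 for _, has_target in words if has_target)
    content_aligned = all(
        has_target for pos, has_target in words if pos.startswith(("N", "V"))
    )
    return aligned >= 2 and content_aligned


# A verb, an unaligned function word, and a noun: both conditions hold.
print(can_generate_feature([("VV", True), ("DEC", False), ("NN", True)]))
```

Only candidates that pass this check would go on to the rule matching and the lookup in STt.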

76 [Figure 9: Example of features for parsing; English side reads “Those are the 3 projects signed today”] We consider four types of features based on 2to2, 3to3, 3to2, and 2to3 mappings. [sent-259, score-0.43]

77 In the 2to2, 3to3, and 3to2 cases, the target subtrees do not add new words. [sent-260, score-0.313]

78 1 Features for 2to2, 3to3, and 3to2 We design the features based on the mapping rules of 2to2, 3to3, and 3to2. [sent-265, score-0.284]

79 The possible relation to be verified forms source subtree “签字(signed)/VV:2-的(NULL)/DEC:3-项目(project)/NN:0” in which “项目(project)” is aligned to “projects” and “签字(signed)” is aligned to “signed” as shown in Figure 9. [sent-267, score-0.984]

80 (2) Obtain target parts based on the matched mapping rules, whose source parts equal “V:2-的/DEC:3-N:0”. [sent-271, score-0.465]

81 (3) Generate possible subtrees by considering the dependency relation indicated in the target parts. [sent-274, score-0.432]

82 We generate a possible subtree “projects:0-signed:1” from the target part “W_3:0-W_1:1”, where “projects” is aligned to “项目(project)(W_3)” and “signed” is aligned to “签字(signed)(W_1)”. [sent-275, score-0.972]

83 We also generate another possible subtree “projects:2-signed:0” from “W_3:2-W_1:0”. [sent-276, score-0.719]

84 (4) Verify that at least one of the generated possible subtrees is a target subtree, which is included in STt. [sent-277, score-0.313]

85 In the figure, “projects:0-signed:1” is a target subtree in STt. [sent-279, score-0.834]
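
Steps (2) to (4) amount to filling the placeholders of each matched target part with the aligned target words and looking the result up in the subtree list. The sketch below assumes placeholders are written W_i, that STt is held as a Python set of encoded strings, and that the helper names `instantiate` and `is_verified` are free to choose; none of this is prescribed by the paper.

```python
import re
from typing import Dict, Iterable, Set


def instantiate(target_part: str, aligned: Dict[int, str]) -> str:
    """Replace each W_i placeholder with the target word aligned to source word i."""
    return re.sub(r"W_(\d+)", lambda m: aligned[int(m.group(1))], target_part)


def is_verified(target_parts: Iterable[str],
                aligned: Dict[int, str],
                stt: Set[str]) -> bool:
    """True if at least one generated subtree is found in the subtree list."""
    return any(instantiate(part, aligned) in stt for part in target_parts)


aligned = {1: "signed", 3: "projects"}   # source word ID -> aligned target word
target_parts = ["W_3:0-W_1:1", "W_3:2-W_1:0"]
stt = {"projects:0-signed:1"}            # toy stand-in for the real list

print([instantiate(part, aligned) for part in target_parts])
print(is_verified(target_parts, aligned, stt))
```

The boolean result is what a bilingual subtree feature for the candidate source relation would be built on.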

86 Then we obtain target parts such as “W_2:0-of/IN:1-W_1:2”, “W_2:0-in/IN:1-W_1:2”, and so on, according to the matched mapping rules. [sent-286, score-0.366]

87 Finally, we verify that the subtree is a target subtree included in STt. [sent-292, score-1.582]

88 (2009) shows that the source subtree features (Fsrc−st) significantly improve performance. [sent-296, score-0.871]

89 The subtrees are obtained from the auto-parsed data on the source side. [sent-297, score-0.285]

90 Then they are used to verify the possible dependency relations among source words. [sent-298, score-0.291]

91 In our approach, we also use the same source subtree features described in Chen et al. [sent-299, score-0.871]

92 So the possible dependency relations are verified by the source and target subtrees. [sent-301, score-0.445]

93 5 Experiments All the bilingual data were taken from the translated portion of the Chinese Treebank (CTB) (Xue et al. [sent-305, score-0.278]

94 , 2006; DeNero and Klein, 2007) trained on a bilingual corpus having approximately 0. [sent-314, score-0.246]

95 Then we added four types of bilingual constraint features one by one to “Baseline2”. [sent-339, score-0.311]

96 36 points for the second-order model by adding all the bilingual subtree features. [sent-345, score-0.982]

97 As in the Chinese experiments, the parsers with bilingual subtree features outperformed the Baselines. [sent-352, score-1.026]

98 6 Conclusion We presented an approach using large automatically parsed monolingual data to provide bilingual subtree constraints to improve bitext parsing. [sent-367, score-1.223]

99 Our approach retains the efficiency of monolingual parsing and exploits the subtree structure on the target side. [sent-368, score-1.076]

100 First, we may attempt to apply the bilingual subtree constraints to transition-based parsing models (Nivre, 2003; Yamada and Matsumoto, 2003). [sent-372, score-1.101]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('subtree', 0.691), ('bilingual', 0.246), ('subtrees', 0.17), ('fork', 0.17), ('signed', 0.152), ('target', 0.143), ('mapping', 0.133), ('monolingual', 0.131), ('dependency', 0.119), ('fringe', 0.116), ('mton', 0.116), ('source', 0.115), ('parsing', 0.111), ('chinese', 0.111), ('meat', 0.102), ('side', 0.096), ('burkett', 0.093), ('ate', 0.088), ('huang', 0.086), ('wheel', 0.083), ('fringes', 0.077), ('bitexts', 0.073), ('reordering', 0.073), ('verified', 0.068), ('stt', 0.068), ('features', 0.065), ('mcdonald', 0.062), ('projects', 0.059), ('car', 0.058), ('ceremony', 0.058), ('hid', 0.058), ('sts', 0.058), ('verify', 0.057), ('rules', 0.057), ('ctb', 0.055), ('aligned', 0.055), ('nivre', 0.054), ('constraints', 0.053), ('refers', 0.051), ('precede', 0.05), ('alignment', 0.048), ('yes', 0.047), ('points', 0.045), ('head', 0.044), ('onto', 0.043), ('uas', 0.041), ('activate', 0.041), ('bies', 0.041), ('chen', 0.04), ('society', 0.04), ('obtain', 0.04), ('unannotated', 0.04), ('null', 0.04), ('kazama', 0.039), ('englishsource', 0.039), ('fbi', 0.039), ('fsrc', 0.039), ('nakazawa', 0.039), ('carreras', 0.038), ('today', 0.037), ('klein', 0.037), ('corresponding', 0.037), ('denero', 0.035), ('prepositional', 0.035), ('projective', 0.034), ('kruengkrai', 0.034), ('links', 0.033), ('mapped', 0.033), ('yamada', 0.032), ('parser', 0.032), ('translated', 0.032), ('sides', 0.031), ('encourage', 0.031), ('xue', 0.03), ('design', 0.029), ('genitive', 0.029), ('parsed', 0.029), ('proposes', 0.029), ('project', 0.029), ('generate', 0.028), ('torisawa', 0.028), ('uchimoto', 0.028), ('mst', 0.028), ('singapore', 0.027), ('span', 0.027), ('aligner', 0.026), ('liang', 0.026), ('pos', 0.026), ('tree', 0.026), ('matched', 0.026), ('constructions', 0.026), ('gigaword', 0.025), ('parse', 0.025), ('parts', 0.024), ('parsers', 0.024), ('graphbased', 0.024), ('mcnemar', 0.024), ('bitext', 0.024), ('said', 0.024), ('root', 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

Author: Wenliang Chen ; Jun'ichi Kazama ; Kentaro Torisawa

Abstract: This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large-scale automatically parsed data on the target side. Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides. Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. Experiments on the translated portion of the Chinese Treebank show that our system outperforms monolingual parsers by 2.93 points for Chinese and 1.64 points for English.

2 0.18018249 71 acl-2010-Convolution Kernel over Packed Parse Forest

Author: Min Zhang ; Hui Zhang ; Haizhou Li

Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and also illustrates an algorithm that computes the proposed kernel quickly. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel.

3 0.15868211 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

4 0.14504194 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs

Author: Jinsong Su ; Yang Liu ; Yajuan Lv ; Haitao Mi ; Qun Liu

Abstract: Lexicalized reordering models play a crucial role in phrase-based translation systems. They are usually learned from the word-aligned bilingual corpus by examining the reordering relations of adjacent phrases. Instead of just checking whether there is one phrase adjacent to a given phrase, we argue that it is important to take the number of adjacent phrases into account for better estimations of reordering models. We propose to use a structure named reordering graph, which represents all phrase segmentations of a sentence pair, to learn lexicalized reordering models efficiently. Experimental results on the NIST Chinese-English test sets show that our approach significantly outperforms the baseline method.

5 0.14447172 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels

Author: Jun Sun ; Min Zhang ; Chew Lim Tan

Abstract: We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features. The experimental results show that our approach achieves a significant improvement on both gold standard tree bank and automatically parsed tree pairs against a heuristic similarity based method. We further apply the sub-tree alignment in machine translation with two methods. It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment.

6 0.14292061 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

7 0.13370076 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation

8 0.12311491 69 acl-2010-Constituency to Dependency Translation with Forests

9 0.1165652 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

10 0.11577257 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

11 0.11538769 99 acl-2010-Efficient Third-Order Dependency Parsers

12 0.1097828 262 acl-2010-Word Alignment with Synonym Regularization

13 0.10370117 169 acl-2010-Learning to Translate with Source and Target Syntax

14 0.1017108 125 acl-2010-Generating Templates of Entity Summaries with an Entity-Aspect Model and Pattern Mining

15 0.09608233 133 acl-2010-Hierarchical Search for Word Alignment

16 0.094449081 195 acl-2010-Phylogenetic Grammar Induction

17 0.093769513 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

18 0.092741564 221 acl-2010-Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

19 0.092362814 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

20 0.088880166 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.228), (1, -0.164), (2, 0.038), (3, 0.052), (4, -0.028), (5, 0.007), (6, 0.028), (7, 0.013), (8, -0.125), (9, 0.126), (10, -0.071), (11, -0.053), (12, -0.024), (13, 0.089), (14, 0.102), (15, -0.032), (16, 0.037), (17, -0.036), (18, 0.105), (19, -0.155), (20, 0.006), (21, -0.112), (22, -0.013), (23, 0.013), (24, -0.053), (25, 0.072), (26, -0.115), (27, -0.038), (28, -0.056), (29, -0.087), (30, -0.09), (31, -0.05), (32, 0.106), (33, -0.047), (34, 0.074), (35, 0.021), (36, -0.047), (37, -0.032), (38, 0.025), (39, -0.019), (40, 0.093), (41, -0.079), (42, 0.021), (43, 0.053), (44, -0.009), (45, 0.025), (46, 0.13), (47, 0.066), (48, 0.029), (49, 0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95135218 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

Author: Wenliang Chen ; Jun'ichi Kazama ; Kentaro Torisawa

Abstract: This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large-scale automatically parsed data on the target side. Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides. Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. Experiments on the translated portion of the Chinese Treebank show that our system outperforms monolingual parsers by 2.93 points for Chinese and 1.64 points for English.

2 0.63595438 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels

Author: Jun Sun ; Min Zhang ; Chew Lim Tan

Abstract: We propose Bilingual Tree Kernels (BTKs) to capture the structural similarities across a pair of syntactic translational equivalences and apply BTKs to sub-tree alignment along with some plain features. Our study reveals that the structural features embedded in a bilingual parse tree pair are very effective for sub-tree alignment and the bilingual tree kernels can well capture such features. The experimental results show that our approach achieves a significant improvement on both gold standard tree bank and automatically parsed tree pairs against a heuristic similarity based method. We further apply the sub-tree alignment in machine translation with two methods. It is suggested that the subtree alignment benefits both phrase and syntax based systems by relaxing the constraint of the word alignment.

3 0.62103307 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. And we also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

4 0.55835992 69 acl-2010-Constituency to Dependency Translation with Forests

Author: Haitao Mi ; Qun Liu

Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.

5 0.54879344 99 acl-2010-Efficient Third-Order Dependency Parsers

Author: Terry Koo ; Michael Collins

Abstract: We present algorithms for higher-order dependency parsing that are “third-order” in the sense that they can evaluate substructures containing three dependencies, and “efficient” in the sense that they require only O(n^4) time. Importantly, our new parsers can utilize both sibling-style and grandchild-style interactions. We evaluate our parsers on the Penn Treebank and Prague Dependency Treebank, achieving unlabeled attachment scores of 93.04% and 87.38%, respectively.

6 0.54629344 71 acl-2010-Convolution Kernel over Packed Parse Forest

7 0.54158342 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation

8 0.53791088 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

9 0.53256172 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

10 0.52953249 195 acl-2010-Phylogenetic Grammar Induction

11 0.50412875 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

12 0.49499163 262 acl-2010-Word Alignment with Synonym Regularization

13 0.49094501 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs

14 0.48707968 221 acl-2010-Syntax-to-Morphology Mapping in Factored Phrase-Based Statistical Machine Translation from English to Turkish

15 0.47965437 180 acl-2010-On Jointly Recognizing and Aligning Bilingual Named Entities

16 0.46885675 143 acl-2010-Importance of Linguistic Constraints in Statistical Dependency Parsing

17 0.46052235 79 acl-2010-Cross-Lingual Latent Topic Extraction

18 0.45871609 242 acl-2010-Tree-Based Deterministic Dependency Parsing - An Application to Nivre's Method -

19 0.42810854 169 acl-2010-Learning to Translate with Source and Target Syntax

20 0.42785805 241 acl-2010-Transition-Based Parsing with Confidence-Weighted Classification


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(7, 0.02), (14, 0.031), (18, 0.024), (25, 0.061), (26, 0.029), (42, 0.013), (43, 0.188), (44, 0.012), (59, 0.118), (73, 0.04), (76, 0.013), (78, 0.031), (83, 0.07), (84, 0.028), (98, 0.231)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.93501723 67 acl-2010-Computing Weakest Readings

Author: Alexander Koller ; Stefan Thater

Abstract: We present an efficient algorithm for computing the weakest readings of semantically ambiguous sentences. A corpus-based evaluation with a large-scale grammar shows that our algorithm reduces over 80% of sentences to one or two readings, in negligible runtime, and thus makes it possible to work with semantic representations derived by deep large-scale grammars.

same-paper 2 0.87707365 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

Author: Wenliang Chen ; Jun'ichi Kazama ; Kentaro Torisawa

Abstract: This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large-scale automatically parsed data on the target side. Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on both sides. Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. Experiments on the translated portion of the Chinese Treebank show that our system outperforms monolingual parsers by 2.93 points for Chinese and 1.64 points for English.

3 0.85632646 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system.

4 0.8115803 79 acl-2010-Cross-Lingual Latent Topic Extraction

Author: Duo Zhang ; Qiaozhu Mei ; ChengXiang Zhai

Abstract: Probabilistic latent topic models have recently enjoyed much success in extracting and analyzing latent topics in text in an unsupervised way. One common deficiency of existing topic models, though, is that they would not work well for extracting cross-lingual latent topics simply because words in different languages generally do not co-occur with each other. In this paper, we propose a way to incorporate a bilingual dictionary into a probabilistic topic model so that we can apply topic models to extract shared latent topics in text data of different languages. Specifically, we propose a new topic model called Probabilistic Cross-Lingual Latent Semantic Analysis (PCLSA) which extends the Probabilistic Latent Semantic Analysis (PLSA) model by regularizing its likelihood function with soft constraints defined based on a bilingual dictionary. Both qualitative and quantitative experimental results show that the PCLSA model can effectively extract cross-lingual latent topics from multilingual text data.

5 0.81117105 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification

Author: Wenbin Jiang ; Qun Liu

Abstract: In this paper we describe an intuitionistic method for dependency parsing, where a classifier is used to determine whether a pair of words forms a dependency edge. We also propose an effective strategy for dependency projection, where the dependency relationships of the word pairs in the source language are projected to the word pairs of the target language, leading to a set of classification instances rather than a complete tree. Experiments show that the classifier trained on the projected classification instances significantly outperforms previous projected dependency parsers. More importantly, when this classifier is integrated into a maximum spanning tree (MST) dependency parser, obvious improvement is obtained over the MST baseline.

6 0.80888259 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

7 0.80713105 133 acl-2010-Hierarchical Search for Word Alignment

8 0.80455661 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features

9 0.8008489 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

10 0.79902214 20 acl-2010-A Transition-Based Parser for 2-Planar Dependency Structures

11 0.79728276 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages

12 0.79668021 170 acl-2010-Letter-Phoneme Alignment: An Exploration

13 0.79527587 262 acl-2010-Word Alignment with Synonym Regularization

14 0.79521263 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing

15 0.79371011 253 acl-2010-Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing

16 0.79318011 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

17 0.79250205 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

18 0.79180628 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking

19 0.79152262 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs

20 0.79146743 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities