acl acl2010 acl2010-163 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jinsong Su ; Yang Liu ; Yajuan Lv ; Haitao Mi ; Qun Liu
Abstract: Lexicalized reordering models play a crucial role in phrase-based translation systems. They are usually learned from the word-aligned bilingual corpus by examining the reordering relations of adjacent phrases. Instead of just checking whether there is one phrase adjacent to a given phrase, we argue that it is important to take the number of adjacent phrases into account for better estimations of reordering models. We propose to use a structure named reordering graph, which represents all phrase segmentations of a sentence pair, to learn lexicalized reordering models efficiently. Experimental results on the NIST Chinese-English test sets show that our approach significantly outperforms the baseline method. 1
Reference: text
sentIndex sentText sentNum sentScore
1 P.O. Box 2704, Beijing 100190, China {sujinsong, yliu, lvyajuan, htmi, liuqun}@ict.ac.cn [sent-3, score-0.085]
2 Abstract: Lexicalized reordering models play a crucial role in phrase-based translation systems. [sent-5, score-0.728]
3 They are usually learned from the word-aligned bilingual corpus by examining the reordering relations of adjacent phrases. [sent-6, score-1.109]
4 Instead of just checking whether there is one phrase adjacent to a given phrase, we argue that it is important to take the number of adjacent phrases into account for better estimations of reordering models. [sent-7, score-1.142]
5 We propose to use a structure named reordering graph, which represents all phrase segmentations of a sentence pair, to learn lexicalized reordering models efficiently. [sent-8, score-1.634]
6 1 Introduction Phrase-based translation systems (Koehn et al. [sent-10, score-0.034]
7 , 2003; Och and Ney, 2004) prove to be the state-of-the-art, as they have delivered strong translation performance in recent machine translation evaluations. [sent-11, score-0.068]
8 While excelling at memorizing local translation and reordering, phrase-based systems have difficulties in modeling permutations among phrases. [sent-12, score-0.054]
9 As a result, it is important to develop effective reordering models to capture such non-local reordering. [sent-13, score-0.694]
10 (Koehn et al., 2003) applies a simple distance-based distortion penalty to model the phrase movements. [sent-15, score-0.125]
11 More recently, many researchers have presented lexicalized reordering models that take advantage of lexical information to predict reordering (Tillmann, 2004; Xiong et al. [sent-16, score-1.505]
12 , 2006; Zens and Ney, 2006; Koehn et al., 2007). Figure 1: Occurrence of a swap with different numbers of adjacent bilingual phrases: only one phrase in (a) and three phrases in (b). [sent-17, score-0.756]
13 Black squares denote word alignments and gray rectangles denote bilingual phrases. [sent-18, score-0.471]
14 [s,t] indicates the target-side span of bilingual phrase bp and [u,v] represents the source-side span of bilingual phrase bp. [sent-19, score-1.348]
15 These models are learned from a word-aligned corpus to predict three orientations of a phrase pair with respect to the previous bilingual phrase: monotone (M), swap (S), and discontinuous (D). [sent-22, score-0.783]
16 Take the bilingual phrase bp in Figure 1(a) for example. [sent-23, score-0.859]
17 , 2007) analyzes the word alignments at positions (s − 1, u − 1) and (s − 1, v + 1). [sent-25, score-0.059]
18 Then the orientation of bp is set to D because the position (s − 1, v + 1) contains no word alignment. [sent-26, score-0.478]
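The word-based check just described can be sketched as follows; the function name and the input layout are our assumptions, and this is a simplification limited to the two positions discussed above, not the full model of Koehn et al. (2007).

```python
def word_based_orientation(alignment, s, u, v):
    """Orient a bilingual phrase with target span [s, t] and source span
    [u, v] relative to the previous target word, using only the two
    alignment positions discussed above (a simplified sketch; the
    function name and input layout are ours, not from the paper).

    `alignment` is a set of (target_pos, source_pos) pairs."""
    if (s - 1, u - 1) in alignment:
        return "M"  # monotone: previous target word aligns just left of [u, v]
    if (s - 1, v + 1) in alignment:
        return "S"  # swap: previous target word aligns just right of [u, v]
    return "D"      # discontinuous: neither position is aligned
```

With no alignment point at exactly (s − 1, v + 1), the function falls through to D, matching the Figure 1(a) discussion.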
19 The phrase-based reordering model (Tillmann, 2004) determines the presence of the adjacent bilingual phrase located in position (s − 1, v + 1) and then treats the orientation of bp as S. [sent-27, score-1.784]
20 Given no constraint on maximum phrase length, the hierarchical phrase reordering model (Galley and Manning, 2008) also analyzes the adjacent bilingual phrases for bp and identifies its orientation as S. [sent-28, score-2.017]
21 However, given a bilingual phrase, the abovementioned models just consider the presence of an adjacent bilingual phrase rather than the number of adjacent bilingual phrases. [sent-29, score-1.255]
22 © 2010 Association for Computational Linguistics (ACL 2010 Conference Short Papers, pages 12–16). Figure 2: (a) A parallel Chinese-English sentence pair and (b) its corresponding reordering graph. [sent-32, score-0.756]
23 In (b), we denote each bilingual phrase with a rectangle, where the upper and bottom numbers in the brackets represent the source and target spans of this bilingual phrase respectively. [sent-33, score-0.849]
24 M = monotone (solid lines), S = swap (dotted line), and D = discontinuous (segmented lines). [sent-34, score-0.251]
25 The bilingual phrases marked in the gray constitute a reordering example. [sent-35, score-1.135]
26 In Figure 1(a), bp is in a swap order with only one bilingual phrase. [sent-37, score-0.876]
27 In Figure 1(b), bp swaps with three bilingual phrases. [sent-38, score-0.778]
28 Lexicalized reordering models do not distinguish different numbers of adjacent phrase pairs, and just give bp the same count in the swap orientation. [sent-39, score-1.579]
29 In this paper, we propose a novel method to better estimate the reordering probabilities with the consideration of varying numbers of adjacent bilingual phrases. [sent-40, score-1.223]
30 Our method uses reordering graphs to represent all phrase segmentations of parallel sentence pairs, and then gets the fractional counts of bilingual phrases for orientations from reordering graphs in an inside-outside fashion. [sent-41, score-2.218]
31 Experimental results indicate that our method achieves significant improvements over the traditional lexicalized reordering model (Koehn et al. [sent-42, score-0.878]
32 This paper is organized as follows: in Section 2, we first give a brief introduction to the traditional lexicalized reordering model. [sent-44, score-0.838]
33 Then we introduce our method to estimate the reordering probabilities from reordering graphs. [sent-45, score-1.48]
34 2 Estimation of Reordering Probabilities Based on Reordering Graph In this section, we first describe the traditional lexicalized reordering model, and then illustrate how to construct reordering graphs to estimate the reordering probabilities. [sent-48, score-1.592]
35 2.2 Reordering Graph For a parallel sentence pair, its reordering graph indicates all possible translation derivations consisting of the extracted bilingual phrases. [sent-52, score-1.162]
36 To construct a reordering graph, we first extract bilingual phrases using the method of Och (2003). [sent-53, score-1.09]
37 Then, the adjacent bilingual phrases are linked according to the target-side order. [sent-54, score-0.548]
38 Some bilingual phrases, which have no adjacent bilingual phrases because of the maximum-length limitation, are linked to the nearest bilingual phrases in the target-side order. [sent-55, score-1.225]
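A minimal sketch of the target-side linking step is given below. The representation of a phrase as a pair of inclusive spans is our assumption, and the sketch omits the nearest-phrase fallback for phrases left unlinked by the maximum-length limit.

```python
def link_phrases(phrases):
    """Build the edges of a reordering graph by linking bilingual phrases
    that are adjacent in target-side order. Each phrase is represented as
    ((src_lo, src_hi), (trg_lo, trg_hi)) with inclusive spans. Sketch
    only: no maximum-length fallback."""
    edges = {p: [] for p in phrases}
    for p in phrases:
        for q in phrases:
            # q is a target-side successor of p if it starts right after p ends
            if q[1][0] == p[1][1] + 1:
                edges[p].append(q)
    return edges
```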
39 With the reordering graph, we can obtain all reordering examples containing the given bilingual phrase. [sent-57, score-1.688]
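Reading a reordering example off an edge of the graph amounts to comparing the source-side spans of two target-adjacent phrases. The helper below is our own sketch of the standard M/S/D reading, not code from the paper.

```python
def edge_orientation(prev_src, cur_src):
    """Orientation of a reordering-graph edge, given the inclusive
    source-side spans of two target-adjacent phrases (a standard M/S/D
    reading, sketched by us)."""
    if cur_src[0] == prev_src[1] + 1:
        return "M"  # monotone: source order matches target order
    if cur_src[1] == prev_src[0] - 1:
        return "S"  # swap: source order is reversed
    return "D"      # discontinuous: the spans are not adjacent
```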
40 2.3 Estimation of Reordering Probabilities We estimate the reordering probabilities from reordering graphs. [sent-60, score-1.462]
41 Given a parallel sentence pair, there are many translation derivations corresponding to different paths in its reordering graph. [sent-61, score-0.841]
42 Assuming all derivations have a uniform probability, the fractional counts of bilingual phrases for orientations can be calculated by utilizing an algorithm in the inside-outside fashion. [sent-62, score-0.63]
43 Given a phrase pair bp in the reordering graph, we denote the number of paths from bs to bp with α(bp). [sent-63, score-1.866]
44 It can be computed in an iterative way: α(bp) = Σ_{bp′} α(bp′), where bp′ is one of the previous bilingual phrases of bp, and α(bs) = 1. [sent-64, score-0.552]
45 In a similar way, the number of paths from bp to be, denoted β(bp), is simply β(bp) = Σ_{bp″} β(bp″), where bp″ is one of the subsequent bilingual phrases of bp, and β(be) = 1. [sent-65, score-0.612]
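The two recurrences above can be computed with one topological sweep in each direction over the acyclic reordering graph. The node names and the dictionary-of-successors encoding below are our assumptions.

```python
def path_counts(edges, bs, be):
    """Compute alpha (number of paths from bs) and beta (number of paths
    to be) for every node of an acyclic reordering graph, following the
    recurrences above. `edges` maps each node to its successor list."""
    order, seen = [], set()

    def visit(n):  # post-order DFS to obtain a topological order
        if n in seen:
            return
        seen.add(n)
        for m in edges.get(n, []):
            visit(m)
        order.append(n)

    visit(bs)
    order.reverse()  # predecessors now precede successors

    alpha = {n: 0 for n in order}
    alpha[bs] = 1
    for n in order:                # push alpha forward along edges
        for m in edges.get(n, []):
            alpha[m] += alpha[n]

    beta = {n: 0 for n in order}
    beta[be] = 1
    for n in reversed(order):      # pull beta backward from successors
        for m in edges.get(n, []):
            beta[n] += beta[m]
    return alpha, beta
```

On a diamond-shaped graph with two middle nodes, alpha(be) = beta(bs) = 2, the total number of derivations.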
46 Here, we show the α and β values of all bilingual phrases of Figure 2 in Table 1. [sent-66, score-0.396]
47 Inspired by the parsing literature on pruning … (Table 1: the α and β values, with columns src span, trg span, α, β, of the bilingual phrases shown in Figure 2.) [sent-68, score-0.019]
48 Continuing with the reordering example described above, we obtain its fractional count using formula (3): Count(M, bp1, bp2) = (1 × 2)/9 = 2/9. [sent-70, score-0.869]
49 The fractional count of bp in the orientation o is calculated as described below: Count(o, bp) = Σ_{bp′} Count(o, bp′, bp) (4). For example, we compute the fractional count of bp2 in the monotone orientation by formula (4): Count(M, bp2) = 2/9. [sent-72, score-0.981]
50 As described in the lexicalized reordering model (Section 2.1), we apply formula (2) to calculate the final reordering probabilities. [sent-73, score-0.833] [sent-74, score-0.722]
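Putting the pieces together, one plausible reading of formulas (3), (4), and (2) is: weight each reordering example by α(bp′) · β(bp) / N, sum the weights per orientation, and normalize per phrase. This is our reconstruction, consistent with the worked value (1 × 2)/9 = 2/9 above, not code from the paper.

```python
from collections import defaultdict

def reordering_probabilities(examples, alpha, beta, total_paths):
    """Estimate p(o | bp) from reordering examples (o, bp_prev, bp).
    Each example receives the fractional count
    alpha(bp_prev) * beta(bp) / N   (our reading of formula (3)),
    counts are summed per (o, bp) as in formula (4), and finally
    normalized over orientations per bp as in formula (2)."""
    count = defaultdict(float)
    for o, bp_prev, bp in examples:
        count[(o, bp)] += alpha[bp_prev] * beta[bp] / total_paths
    prob = {}
    for (o, bp), c in count.items():
        # normalizer: total fractional count of bp over all orientations
        z = sum(v for (o2, b2), v in count.items() if b2 == bp)
        prob[(o, bp)] = c / z
    return prob
```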
52 3 Experiments We conduct experiments to investigate the effectiveness of our method on the msd-fe reordering model and the msd-bidirectional-fe reordering model. [sent-75, score-1.447]
53 The msd-fe reordering model has three features, which represent the probabilities of bilingual phrases in three orientations: monotone, swap, or discontinuous. [sent-78, score-1.164]
54 If a msd-bidirectional-fe model is used, then the number of features doubles: one for each direction. [sent-79, score-0.022]
55 1 Experiment Setup Two different sizes of training corpora are used in our experiments: one is a small-scale corpus that comes from FBIS corpus consisting of 239K bilingual sentence pairs, the other is a large-scale corpus that includes 1. [sent-81, score-0.319]
56 GIZA++ (Och and Ney, 2003) and the heuristics “grow-diag-final-and” are used to generate a word-aligned corpus, where we extract bilingual phrases with maximum length 7. [sent-86, score-0.396]
57 We use SRILM Toolkits (Stolcke, 2002) to train a 4-gram language model on the Xinhua portion of Gigaword corpus. [sent-87, score-0.022]
58 Except for the reordering probabilities, we use the same features in the comparative experiments. [sent-88, score-0.712]
59 The translation quality is evaluated by case-insensitive BLEU-4 metric (Papineni et al. [sent-90, score-0.034]
60 Finally, we conduct paired bootstrap sampling (Koehn, 2004) to test the significance of BLEU score differences. [sent-92, score-0.038]
61 For the msd-fe model, the BLEU scores by our method are 30. [sent-95, score-0.037]
62 For the msd-bidirectional-fe model, our method obtains BLEU scores of 30. [sent-102, score-0.037]
63 ¹The phrase-based lexical reordering model (Tillmann, 2004) is also closely related to our model. [sent-109, score-0.716]
64 However, due to the limits of time and space, we only use the Moses-style reordering model (Koehn et al. [sent-110, score-0.022]
65 In the experiments with the msd-fe model, except on the MT-05 test set, our method is superior to the baseline method. [sent-145, score-0.036]
66 For the msdbidirectional-fe model, the BLEU scores produced by our approach are 33. [sent-153, score-0.019]
67 4 Conclusion and Future Work In this paper, we propose a method to improve the reordering model by considering the effect of the number of adjacent bilingual phrases on the estimation of reordering probabilities. [sent-160, score-1.991]
68 Our method also generalizes to other lexicalized reordering models. [sent-162, score-0.829]
69 We plan to apply our method to the complex lexicalized reordering models, for example, the hierarchical reordering model (Galley and Manning, 2008) and the MEBTG reordering model (Xiong et al. [sent-163, score-2.261]
70 In addition, how to further improve the reordering model by distinguishing derivations with different probabilities will be another focus of future research. [sent-165, score-0.806]
71 Bleu: a method for automatic evaluation of machine translation. [sent-215, score-0.018]
same-paper 1 0.99999964 163 acl-2010-Learning Lexicalized Reordering Models from Reordering Graphs