acl acl2010 acl2010-133 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
Reference: text
sentIndex sentText sentNum sentScore
1 Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. [sent-2, score-0.5]
2 We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. [sent-3, score-0.572]
3 We report results on Arabic-English word alignment and translation tasks. [sent-4, score-0.425]
4 1 Introduction Automatic word alignment is generally accepted as a first step in training any statistical machine translation system. [sent-8, score-0.425]
5 Generative alignment models like IBM Model-4 (Brown et al. [sent-10, score-0.352]
6 , 1993) have been in wide use for over 15 years, and while not perfect (see Figure 1), they are completely unsupervised, requiring no annotated training data to learn alignments that have powered many current state-of-the-art translation systems. [sent-11, score-0.503]
7 Today, there exist human-annotated alignments and an abundance of other information for many language pairs potentially useful for inducing accurate alignments. [sent-12, score-0.43]
8 Circles represent links in a human-annotated alignment, and black boxes represent links in the Model-4 alignment. [sent-32, score-0.286]
9 has motivated much recent work in discriminative modeling for word alignment (Moore, 2005; Ittycheriah and Roukos, 2005; Liu et al. [sent-34, score-0.449]
10 We present in this paper a discriminative alignment model trained on relatively little data, with a simple, yet powerful hierarchical search procedure. [sent-39, score-0.56]
11 Using a foreign string and an English parse tree as input, we formulate a bottom-up search on the parse tree, with the structure of the tree as a backbone for building a hypergraph of possible alignments. [sent-41, score-0.656]
12 Figure 2: Example of approximate search through a hypergraph with beam size = 5. [sent-44, score-0.325]
13 Each partial alignment at each node is ranked according to its model score. [sent-46, score-0.637]
14 In this figure, we see that the partial alignment implied by the 1-best hypothesis at the leftmost NP node is constructed by composing the best hypothesis at the terminal node labeled “the” and the 2nd-best hypothesis at the terminal node labeled “man”. [sent-47, score-0.951]
15 Hypotheses at the root node imply full alignment structures. [sent-49, score-0.472]
16 We handle an arbitrary number of features, compute them efficiently, and score alignments using a linear model. [sent-51, score-0.539]
17 Our model can generate arbitrary alignments and learn from arbitrary gold alignments. [sent-54, score-0.603]
18 2 Word Alignment as a Hypergraph Algorithm input The input to our alignment algorithm is a sentence-pair (en1, f1m) and a parse tree over one of the input sentences. [sent-55, score-0.498]
19 Word alignments are built bottom-up on the parse tree. [sent-61, score-0.494]
20 Each node v in the tree holds partial alignments sorted by score. [sent-62, score-0.797]
21 Figure 3: Cube pruning with alignment hypotheses to select the top-k alignments at node v with children ⟨u1, u2⟩. [sent-88, score-0.958]
22 Each box represents the combination of two partial alignments to create a larger one. [sent-90, score-0.595]
23 The score of each box is the sum of the scores of the child alignments plus a combination cost. [sent-92, score-0.43]
24 Each partial alignment comprises the columns of the alignment matrix for the e-words spanned by v, and each is scored by a linear combination of feature functions. [sent-93, score-1.011]
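The following is a minimal sketch of such a linear feature-function score (Python; the feature names, context argument, and weight dictionary are illustrative assumptions, not the paper's actual implementation):

    def score_partial_alignment(links, context, weights, feature_funcs):
        # Linear model: sum of feature values times their learned weights.
        # feature_funcs maps a feature name to a function of (links, context).
        total = 0.0
        for name, func in feature_funcs.items():
            total += weights.get(name, 0.0) * func(links, context)
        return total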
25 Initial partial alignments are enumerated and scored at preterminal nodes, each spanning a single column of the word alignment matrix. [sent-95, score-1.11]
26 From here, we traverse the tree nodes bottom-up, combining partial alignments from child nodes until we have constructed a single full alignment at the root node of the tree. [sent-98, score-1.266]
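As a rough sketch, this bottom-up construction can be viewed as a recursion over the parse tree (Python; the (score, links) hypothesis layout and the two helper callbacks are hypothetical placeholders for the paper's scoring and cube-pruning steps, the latter sketched further below):

    def align_node(node, k, enumerate_initial, combine_children):
        # Preterminal: enumerate and score single-column partial alignments.
        if not node.children:
            hyps = enumerate_initial(node)          # list of (score, links) pairs
            return sorted(hyps, key=lambda h: h[0], reverse=True)[:k]
        # Internal node: build each child's k-best list, then combine them.
        child_beams = [align_node(c, k, enumerate_initial, combine_children)
                       for c in node.children]
        return combine_children(child_beams, k)     # e.g. via cube pruning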
27 We use one set of feature functions for preterminal nodes, and another set for nonterminal nodes. [sent-100, score-0.246]
28 This is analogous to local and nonlocal feature functions for parse-reranking used by Huang (2008). [sent-101, score-0.292]
29 Using nonlocal features at a nonterminal node emits a combination cost for composing a set of child partial alignments. [sent-102, score-0.589]
30 Because combination costs come into play, we use cube pruning (Chiang, 2007) to approximate the k-best combinations at some nonterminal node v. [sent-103, score-0.361]
31 We find the oracle for a given (T,e,f) triple by proceeding through our search algorithm, forcing ourselves to always select correct links with respect to the gold alignment when possible, breaking ties arbitrarily. [sent-109, score-0.667]
32 1 Hierarchical search Initial alignments We can construct a word alignment hierarchically, bottom-up, by making use of the structure inherent in syntactic parse trees. [sent-113, score-0.905]
33 We can think of building a word alignment as filling in an M×N matrix (Figure 1), and we begin by visiting each preterminal node in the tree. [sent-114, score-0.464]
34 At this level of the tree the span size is 1, and the partial alignment we have made spans a single column of the matrix. [sent-118, score-0.493]
35 We can make many such partial alignments depending on the links selected. [sent-119, score-0.71]
36 Each partial alignment is scored and stored in a sorted heap (Lines 9 and 13). [sent-121, score-0.652]
37 We limit the number of total partial alignments αv kept at each node to k. [sent-123, score-0.715]
38 If we wish to push a new partial alignment onto a full heap and its score is better than the current worst, we pop the current worst off the heap and replace it with the new partial alignment. [sent-124, score-1.385]
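A small sketch of this bounded-heap bookkeeping (Python; keeping the k hypotheses in a min-heap keyed on score, so the current worst sits at the top, is an assumed data structure, not a detail taken from the paper):

    import heapq, itertools

    _tiebreak = itertools.count()   # avoids comparing alignments on score ties

    def push_bounded(heap, score, partial_alignment, k):
        entry = (score, next(_tiebreak), partial_alignment)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif score > heap[0][0]:
            # Heap is full and the new hypothesis beats the current worst:
            # pop the worst and push the new one in a single operation.
            heapq.heapreplace(heap, entry)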
39 Building the hypergraph We now visit internal nodes (Line 16) in the tree in bottom-up order. [sent-125, score-0.319]
40 At each nonterminal node v we wish to combine the partial alignments of its children u1, . [sent-126, score-0.82]
41 We use cube pruning (Chiang, 2007; Huang and Chiang, 2007) to select the k-best combinations of the partial alignments of u1,. [sent-130, score-0.768]
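A simplified two-child version of that cube-pruning step might look as follows (Python; real cube pruning generalizes to d children and interleaves feature computation, and the combination-cost callback here is only a stand-in for the nonlocal features):

    import heapq

    def cube_prune(left, right, k, combo_cost):
        # left/right: child hypotheses sorted best-first as (score, links) pairs.
        def entry(i, j):
            s = left[i][0] + right[j][0] + combo_cost(left[i][1], right[j][1])
            return (-s, i, j)                        # negate: heapq is a min-heap
        results, seen = [], {(0, 0)}
        frontier = [entry(0, 0)]
        while frontier and len(results) < k:
            neg_s, i, j = heapq.heappop(frontier)
            results.append((-neg_s, left[i][1] + right[j][1]))
            for ni, nj in ((i + 1, j), (i, j + 1)):  # expand grid neighbors
                if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                    seen.add((ni, nj))
                    heapq.heappush(frontier, entry(ni, nj))
        return results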
42 Figure 4: Correct version of Figure 1 after hypergraph alignment. [sent-146, score-0.199]
43 In the general case, cube pruning will operate on a d-dimensional hypercube, where d is the branching factor of node v. [sent-149, score-0.293]
44 We cannot enumerate and score every possibility; without the cube pruning approximation, we will have k^c possible combinations at each node, exploding the search space exponentially. [sent-150, score-0.321]
45 Figure 3 depicts how we select the top-k alignments at a node v from its children ⟨u1, u2⟩. [sent-151, score-0.55]
46 3 Discriminative training We incorporate all our new features into a linear model and learn weights for each using the online averaged perceptron algorithm (Collins, 2002) with a few modifications for structured outputs inspired by Chiang et al. [sent-152, score-0.209]
47 We define: We find empirically that using binarized trees reduces search errors in cube pruning. [sent-154, score-0.176]
48 Figure 5: A common problem with GIZA++ Model 4 alignments is a weak distortion model. [sent-165, score-0.482]
49 We select the oracle alignment according to: y+ = argmin_{y ∈ (x)} γ(y) (2) where (x) is a set of hypothesis alignments generated from input x. [sent-170, score-0.862]
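A bare-bones sketch of one structured-perceptron step against such an oracle (Python; the feature dictionaries and learning rate are placeholders, and the real training additionally uses averaging and the modifications from Chiang et al. mentioned above):

    def perceptron_update(weights, oracle_feats, predicted_feats, lr=1.0):
        # Move weights toward the oracle alignment's features and away from
        # the model's current 1-best alignment's features.
        for name, value in oracle_feats.items():
            weights[name] = weights.get(name, 0.0) + lr * value
        for name, value in predicted_feats.items():
            weights[name] = weights.get(name, 0.0) - lr * value
        return weights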
50 Our hierarchical search framework allows us to compute these features when needed, and affords us extra useful syntactic information. [sent-178, score-0.234]
51 Huang (2008) defines a feature h to be local if and only if it can be factored among the local productions in a tree, and non-local otherwise. [sent-180, score-0.192]
52 Analogously for alignments, our class of local features are those that can be factored among the local partial alignments competing to comprise a larger span of the matrix, and non-local otherwise. [sent-181, score-0.86]
53 These features score a set of links and the words connected by them. [sent-182, score-0.24]
54 Feature development Our features are inspired by analysis of patterns contained among our gold alignment data and automatically generated parse trees. [sent-183, score-0.549]
55 We use both local lexical and nonlocal structural features as described below. [sent-184, score-0.262]
56 Negative weights essentially penalize alignments with links never seen before in the Model 4 alignment, and positive weights encourage such links. [sent-191, score-0.545]
57 Critically, this feature tells us how much to trust alignments involving nouns, verbs, adjectives, function words, punctuation, etc. [sent-194, score-0.488]
58 from the Model 4 alignments from which our p(e | f) and p(f | e) tables are built. [sent-195, score-0.43]
59 Intuitively, alignments involving English parts-of-speech more likely to be content words (e. [sent-197, score-0.43]
60 Figure 6: Features PP-NP-head, NP-DT-head, and VP-VP-head fire on these tree-alignment patterns. [sent-208, score-0.239]
61 For example, PP-NP-head fires exactly when the head of the PP is aligned to exactly the same f words as the head of its sister NP. [sent-209, score-0.38]
62 Negative weights penalize links never seen before in a baseline alignment used to initialize lexical p(e | f) and p(f | e) tables. [sent-239, score-0.467]
63 This feature returns the distance to the diagonal of the matrix for any link in a partial alignment. [sent-245, score-0.252]
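A minimal sketch of such a distance-to-the-diagonal feature (Python; normalizing each position by its sentence length is a guess at how the deviation is measured, not a detail stated here):

    def distance_to_diagonal(i, j, e_len, f_len):
        # Link between English word i and foreign word j; 0.0 means the link
        # sits exactly on the diagonal of the alignment matrix.
        return abs(i / float(e_len) - j / float(f_len))

    def diagonal_feature(links, e_len, f_len):
        # Sum the deviation over all links in a partial alignment.
        return sum(distance_to_diagonal(i, j, e_len, f_len) for i, j in links)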
64 Although local features do not know the partial alignments at other spans, they do have access to the entire English sentence at every step because our input is constant. [sent-250, score-0.734]
65 If some e exists more than once in e1n we fire this feature on all links containing word e, returning again the distance to the diagonal for that link. [sent-251, score-0.369]
66 Punctuation-mismatch fires on any link that causes nonpunctuation to be aligned to punctuation. [sent-255, score-0.227]
67 These fire for each link (e, f) and part-of-speech tag. [sent-257, score-0.203]
68 Given the tag of e, this affords the model the ability to pay more or less attention to the features described above depending on the tag given to e. [sent-259, score-0.207]
69 Any aligned words in the span of the sister NP are aligned to words following Æ? [sent-273, score-0.335]
70 2 Nonlocal features These features comprise the combination cost component of a partial alignment score and may fire when concatenating two partial alignments to create a larger span. [sent-285, score-1.461]
71 Because these features can look into any two arbitrary subtrees, they are considered nonlocal features as defined by Huang (2008). [sent-286, score-0.323]
72 Likewise, we observe the head of a VP to align to the head of an immediate sister VP. [sent-289, score-0.298]
73 In Figure 4, when the search arrives at the left-most NPB node, the NP-DT-head feature will fire given this structure and links over the span [the . [sent-309, score-0.428]
74 When the search arrives at the second NPB node, it will fire given the structure and links over the span [the . [sent-313, score-0.428]
75 However, we also introduce nonlocal lexicalized features for the most common types of English and foreign prepositions to compete with these general headword features. [sent-318, score-0.381]
76 PP features PP-of-prep, PP-from-prep, PP-to-prep, PP-on-prep, and PP-in-prep fire at any PP whose left child is a preposition and right child is an NP. [sent-319, score-0.347]
77 The head of the PP is one of the enumerated English prepositions and is aligned to any of the three most common foreign words to which it has also been observed aligned in the gold alignments. [sent-320, score-0.476]
78 The last constraint on this pattern is that all words under the span of the sister NP, if aligned, must align to words following the foreign preposition. [sent-321, score-0.357]
79 For any pair of links (ei, f) and (ej, f) in which the e words differ but the f word is the same token in each, return the tree height of the first common ancestor of ei and ej. [sent-325, score-0.237]
80 This feature captures the intuition that it is much worse to align two English words at different ends of the tree to the same foreign word, than it is to align two English words under the same NP to the same foreign word. [sent-326, score-0.54]
81 To see why a string distance feature that counts only the flat horizontal distance from ei to ej is not the best strategy, consider the following. [sent-327, score-0.182]
82 We wish to align a determiner to the same f word as its sister head noun under the same NP. [sent-328, score-0.329]
83 A string distance metric, with no knowledge of the relationship between determiner and noun, will levy a much heavier penalty than its tree distance analog. [sent-330, score-0.217]
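One way to compute the tree-height quantity defined above, the height of the first common ancestor of ei and ej, is sketched below (Python; the node interface with .parent and .children pointers is an assumed representation of the English parse tree):

    def lowest_common_ancestor(leaf_a, leaf_b):
        # Collect leaf_a's ancestors, then walk up from leaf_b until we hit one.
        ancestors = set()
        n = leaf_a
        while n is not None:
            ancestors.add(id(n))
            n = n.parent
        n = leaf_b
        while id(n) not in ancestors:
            n = n.parent
        return n

    def height(node):
        # Height of a node: longest downward path from it to a leaf.
        if not node.children:
            return 0
        return 1 + max(height(c) for c in node.children)

    def ancestor_height_feature(leaf_ei, leaf_ej):
        return height(lowest_common_ancestor(leaf_ei, leaf_ej))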
84 Very recent work in word alignment has also started to report downstream effects on BLEU score. [sent-332, score-0.352]
85 DeNero and Klein (2007) refine the distortion model of an HMM aligner to reflect tree distance instead of string distance. [sent-338, score-0.217]
86 (2008) start with the output from GIZA++ Model-4 union, and focus on increasing precision by deleting links based on a linear discriminative model exposed to syntactic and lexical information. [sent-340, score-0.212]
87 6 Experiments We evaluate our model and resulting alignments on Arabic-English data against those induced by IBM Model-4 using GIZA++ (Och and Ney, 2003) with both the union and grow-diag-final heuristics. [sent-343, score-0.485]
88 We use 1,000 sentence pairs and gold alignments from LDC2006E86 to train model parameters: 800 sentences for training, 100 for testing, and 100 as a second held-out development set to decide when to stop perceptron training. [sent-344, score-0.586]
89 Figure 8: Learning curves for 10 random restarts over time for parallel averaged perceptron training. [sent-347, score-0.273]
90 Figure 9: Model robustness to the initial alignments from which the p(e | f) and p(f | e) features are derived. [sent-353, score-0.932]
91 The first three columns of Table 2 show the balanced F-measure, Precision, and Recall of our alignments versus the two GIZA++ Model-4 baselines. [sent-359, score-0.43]
92 Table 2: F-measure, Precision, Recall, # Unknown, and BLEU for M4 (union), M4 (grow-diag-final), and Hypergraph alignment on Arabic/English. [sent-363, score-0.352]
93 Figure 8 shows the stability of the search procedure over ten random restarts of parallel averaged perceptron training with 40 CPUs. [sent-381, score-0.281]
94 Figure 9 shows the robustness of the model to initial alignments used to derive lexical features p(e | f) and p(f | e). [sent-383, score-0.502]
95 In addition to IBM Model 4, we experiment with alignments from IBM Model 1 and the HMM model. [sent-384, score-0.43]
96 In each case, we significantly outperform the baseline GIZA++ Model 4 alignments on a heldout test set. [sent-385, score-0.473]
97 We align the same core subset with our trained hypergraph alignment model, and extract a second set of translation rules. [sent-389, score-0.718]
98 4 BLEU increase over a system trained with alignments from Model-4 union. [sent-404, score-0.43]
99 7 Conclusion We have opened up the word alignment task to advances in hypergraph algorithms currently used in parsing and machine translation decoding. [sent-405, score-0.624]
100 We treat word alignment as a parsing problem, and by taking advantage of English syntax and the hypergraph structure of our search algorithm, we report significant increases in both F-measure and BLEU score over standard baselines in use by most state-of-the-art MT systems today. [sent-406, score-0.663]
wordName wordTfidf (topN-words)
[('alignments', 0.43), ('alignment', 0.352), ('hypergraph', 0.199), ('partial', 0.165), ('fire', 0.152), ('nonlocal', 0.123), ('node', 0.12), ('giza', 0.118), ('cube', 0.117), ('links', 0.115), ('foreign', 0.106), ('bleu', 0.102), ('sister', 0.098), ('discriminative', 0.097), ('perceptron', 0.095), ('align', 0.094), ('chiang', 0.093), ('aligned', 0.089), ('huang', 0.089), ('heap', 0.087), ('ehead', 0.087), ('fires', 0.087), ('tree', 0.082), ('oracle', 0.08), ('npb', 0.076), ('preterminal', 0.076), ('itg', 0.075), ('arabic', 0.074), ('translation', 0.073), ('features', 0.072), ('forest', 0.07), ('ay', 0.069), ('nonterminal', 0.068), ('beam', 0.067), ('local', 0.067), ('parse', 0.064), ('fertility', 0.062), ('gold', 0.061), ('np', 0.06), ('span', 0.059), ('search', 0.059), ('sydney', 0.059), ('feature', 0.058), ('riesa', 0.058), ('boxes', 0.056), ('pruning', 0.056), ('arbitrary', 0.056), ('vancouver', 0.056), ('union', 0.055), ('galley', 0.054), ('taskar', 0.054), ('score', 0.053), ('pp', 0.053), ('head', 0.053), ('denero', 0.052), ('distortion', 0.052), ('hierarchical', 0.052), ('moore', 0.051), ('link', 0.051), ('randomized', 0.051), ('affords', 0.051), ('epoch', 0.051), ('collins', 0.049), ('ibm', 0.048), ('scored', 0.048), ('klein', 0.048), ('determiner', 0.047), ('restarts', 0.046), ('marcu', 0.044), ('distance', 0.044), ('english', 0.044), ('functions', 0.044), ('circles', 0.043), ('arrives', 0.043), ('heldout', 0.043), ('fossum', 0.043), ('mt', 0.043), ('tag', 0.042), ('averaged', 0.042), ('headword', 0.041), ('ittycheriah', 0.041), ('preposition', 0.041), ('child', 0.041), ('return', 0.04), ('prepositions', 0.039), ('talbot', 0.039), ('aligner', 0.039), ('enumerated', 0.039), ('nns', 0.039), ('parallel', 0.039), ('iterations', 0.038), ('nodes', 0.038), ('wish', 0.037), ('terminal', 0.037), ('ej', 0.036), ('enumerate', 0.036), ('isi', 0.036), ('meeting', 0.036), ('matrix', 0.036), ('annual', 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
2 0.40577587 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
Author: Bing Xiang ; Yonggang Deng ; Bowen Zhou
Abstract: We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better transla- tion performance.
3 0.38016853 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set ofphrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principle advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
4 0.36561239 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
Author: Vamshi Ambati ; Stephan Vogel ; Jaime Carbonell
Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks like uncertainty-, margin- and query-by-committee sampling we propose multiple query strategies for the alignment link selection task. Our experiments show that by active selection of uncertain and informative links, we reduce the overall manual effort involved in elicitation of alignment link data for training a semisupervised word aligner.
5 0.32423636 170 acl-2010-Letter-Phoneme Alignment: An Exploration
Author: Sittichai Jiampojamarn ; Grzegorz Kondrak
Abstract: Letter-phoneme alignment is usually generated by a straightforward application of the EM algorithm. We explore several alternative alignment methods that employ phonetics, integer programming, and sets of constraints, and propose a novel approach of refining the EM alignment by aggregation of best alignments. We perform both intrinsic and extrinsic evaluation of the assortment of methods. We show that our proposed EM-Aggregation algorithm leads to the improvement of the state of the art in letter-to-phoneme conversion on several different data sets.
6 0.27734733 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
7 0.25565025 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
8 0.24875855 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
9 0.20440255 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
10 0.20328681 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
11 0.19635646 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
12 0.19334736 169 acl-2010-Learning to Translate with Source and Target Syntax
13 0.1891803 69 acl-2010-Constituency to Dependency Translation with Forests
14 0.17310935 262 acl-2010-Word Alignment with Synonym Regularization
15 0.15696515 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
16 0.15100689 54 acl-2010-Boosting-Based System Combination for Machine Translation
17 0.1369759 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
18 0.13426889 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
19 0.13389474 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
20 0.11943664 94 acl-2010-Edit Tree Distance Alignments for Semantic Role Labelling
topicId topicWeight
[(0, -0.366), (1, -0.43), (2, -0.012), (3, -0.005), (4, 0.037), (5, 0.106), (6, -0.151), (7, 0.122), (8, 0.103), (9, -0.128), (10, -0.127), (11, -0.161), (12, -0.161), (13, 0.058), (14, -0.075), (15, 0.012), (16, 0.042), (17, 0.006), (18, 0.004), (19, -0.026), (20, 0.052), (21, 0.074), (22, 0.035), (23, -0.026), (24, -0.007), (25, 0.02), (26, 0.001), (27, 0.021), (28, 0.03), (29, 0.045), (30, -0.016), (31, -0.052), (32, -0.005), (33, 0.062), (34, -0.042), (35, -0.067), (36, -0.056), (37, 0.01), (38, -0.007), (39, 0.003), (40, -0.057), (41, 0.049), (42, -0.018), (43, 0.013), (44, 0.006), (45, 0.002), (46, -0.013), (47, -0.002), (48, 0.029), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.97064424 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
2 0.91702807 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
Author: John DeNero ; Dan Klein
Abstract: We present a discriminative model that directly predicts which set ofphrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested collections of all the overlapping phrase pairs consistent with an underlying word alignment. Extraction set models provide two principle advantages over word-factored alignment models. First, we can incorporate features on phrase pairs, in addition to word links. Second, we can optimize for an extraction-based loss function that relates directly to the end task of generating translations. Our model gives improvements in alignment quality relative to state-of-the-art unsupervised and supervised baselines, as well as providing up to a 1.4 improvement in BLEU score in Chinese-to-English translation experiments.
3 0.90122378 90 acl-2010-Diversify and Combine: Improving Word Alignment for Machine Translation on Low-Resource Languages
Author: Bing Xiang ; Yonggang Deng ; Bowen Zhou
Abstract: We present a novel method to improve word alignment quality and eventually the translation performance by producing and combining complementary word alignments for low-resource languages. Instead of focusing on the improvement of a single set of word alignments, we generate multiple sets of diversified alignments based on different motivations, such as linguistic knowledge, morphology and heuristics. We demonstrate this approach on an English-to-Pashto translation task by combining the alignments obtained from syntactic reordering, stemming, and partial words. The combined alignment outperforms the baseline alignment, with significantly higher F-scores and better transla- tion performance.
4 0.89380544 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment
Author: Shujie Liu ; Chi-Ho Li ; Ming Zhou
Abstract: While Inversion Transduction Grammar (ITG) has regained more and more attention in recent years, it still suffers from the major obstacle of speed. We propose a discriminative ITG pruning framework using Minimum Error Rate Training and various features from previous work on ITG alignment. Experiment results show that it is superior to all existing heuristics in ITG pruning. On top of the pruning framework, we also propose a discriminative ITG alignment model using hierarchical phrase pairs, which improves both F-score and Bleu score over the baseline alignment system of GIZA++. 1
5 0.88013548 170 acl-2010-Letter-Phoneme Alignment: An Exploration
Author: Sittichai Jiampojamarn ; Grzegorz Kondrak
Abstract: Letter-phoneme alignment is usually generated by a straightforward application of the EM algorithm. We explore several alternative alignment methods that employ phonetics, integer programming, and sets of constraints, and propose a novel approach of refining the EM alignment by aggregation of best alignments. We perform both intrinsic and extrinsic evaluation of the assortment of methods. We show that our proposed EM-Aggregation algorithm leads to the improvement of the state of the art in letter-to-phoneme conversion on several different data sets.
6 0.86515468 24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment
7 0.69749546 262 acl-2010-Word Alignment with Synonym Regularization
8 0.69737297 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
9 0.64445955 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation
10 0.64132464 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
11 0.59659529 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
12 0.5937106 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
13 0.52166754 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
14 0.47676373 169 acl-2010-Learning to Translate with Source and Target Syntax
15 0.46314347 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
16 0.44962507 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
17 0.44769073 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
18 0.44735181 180 acl-2010-On Jointly Recognizing and Aligning Bilingual Named Entities
19 0.44481146 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration
20 0.42895526 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
topicId topicWeight
[(3, 0.134), (14, 0.034), (18, 0.012), (25, 0.089), (33, 0.018), (39, 0.015), (42, 0.017), (44, 0.011), (59, 0.141), (73, 0.051), (76, 0.025), (78, 0.038), (83, 0.104), (84, 0.024), (98, 0.219)]
simIndex simValue paperId paperTitle
1 0.9651981 264 acl-2010-Wrapping up a Summary: From Representation to Generation
Author: Josef Steinberger ; Marco Turchi ; Mijail Kabadjov ; Ralf Steinberger ; Nello Cristianini
Abstract: The main focus of this work is to investigate robust ways for generating summaries from summary representations without recurring to simple sentence extraction and aiming at more human-like summaries. This is motivated by empirical evidence from TAC 2009 data showing that human summaries contain on average more and shorter sentences than the system summaries. We report encouraging preliminary results comparable to those attained by participating systems at TAC 2009.
same-paper 2 0.9337101 133 acl-2010-Hierarchical Search for Word Alignment
Author: Jason Riesa ; Daniel Marcu
Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
3 0.9020828 81 acl-2010-Decision Detection Using Hierarchical Graphical Models
Author: Trung H. Bui ; Stanley Peters
Abstract: We investigate hierarchical graphical models (HGMs) for automatically detecting decisions in multi-party discussions. Several types of dialogue act (DA) are distinguished on the basis of their roles in formulating decisions. HGMs enable us to model dependencies between observed features of discussions, decision DAs, and subdialogues that result in a decision. For the task of detecting decision regions, an HGM classifier was found to outperform non-hierarchical graphical models and support vector machines, raising the F1-score to 0.80 from 0.55.
4 0.89774823 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
Author: Xianpei Han ; Jun Zhao
Abstract: Name ambiguity problem has raised urgent demands for efficient, high-quality named entity disambiguation methods. In recent years, the increasing availability of large-scale, rich semantic knowledge sources (such as Wikipedia and WordNet) creates new opportunities to enhance the named entity disambiguation by developing algorithms which can exploit these knowledge sources at best. The problem is that these knowledge sources are heterogeneous and most of the semantic knowledge within them is embedded in complex structures, such as graphs and networks. This paper proposes a knowledge-based method, called Structural Semantic Relatedness (SSR), which can enhance the named entity disambiguation by capturing and leveraging the structural semantic knowledge in multiple knowledge sources. Empirical results show that, in comparison with the classical BOW based methods and social network based methods, our method can significantly improve the disambiguation performance by respectively 8.7% and 14.7%. 1
5 0.89522612 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
6 0.89298856 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
7 0.89056444 169 acl-2010-Learning to Translate with Source and Target Syntax
8 0.89050484 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
10 0.88651133 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
11 0.88651031 79 acl-2010-Cross-Lingual Latent Topic Extraction
12 0.88631499 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
13 0.88468313 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment
14 0.88461792 83 acl-2010-Dependency Parsing and Projection Based on Word-Pair Classification
15 0.88386858 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints
16 0.88353968 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans
17 0.88342559 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery
18 0.88340735 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts
19 0.88331729 146 acl-2010-Improving Chinese Semantic Role Labeling with Rich Syntactic Features
20 0.88242483 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation