acl acl2011 acl2011-141 knowledge-graph by maker-knowledge-mining

141 acl-2011-Gappy Phrasal Alignment By Agreement


Source: pdf

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. [sent-8, score-0.338]

2 In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. [sent-9, score-0.085]

3 Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. [sent-10, score-0.426]

4-5 Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. [sent-11, score-0.348] [sent-12, score-0.558]

6 The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime. [sent-13, score-0.338]

7 1 Introduction Word alignment is an important part of statistical machine translation (MT) pipelines. [sent-14, score-0.338]

8 Phrase tables containing pairs of source and target language phrases are extracted from word alignments, forming the core of phrase-based statistical machine translation systems (Koehn et al., 2003). [sent-15, score-0.138]

9-10 Most syntactic machine translation systems extract synchronous context-free grammars (SCFGs) from aligned syntactic fragments (Galley et al., 2006), which in turn are derived from bilingual word alignments and syntactic parses. (∗Author was a summer intern at Microsoft Research during this project.) [sent-17, score-0.122] [sent-19, score-0.193]

11 [Figure 1: French-English pair with complex word alignment. French: ne voudrais pas voyager par chemin de fer; English: would not like traveling by railroad.] [sent-20, score-0.565]

12 As seen in the French-English example in Figure 1, many sentence pairs are naturally aligned with multi-word units in both languages (chemin de fer; would … like). [sent-24, score-0.038]

13-14 Much work has addressed this problem: generative models for direct phrasal alignment (Marcu and Wong, 2002), heuristic word-alignment combinations (Koehn et al., 2003; Och and Ney, 2003), and models with pseudoword collocations (Lambert and Banchs, 2006; Ma et al.). [sent-27, score-0.6] [sent-28, score-0.037]

15 We present a new phrasal alignment model based on the hidden Markov framework (Vogel et al., 1996). [sent-32, score-0.681]

16 Our approach is semi-Markov: each state can generate multiple observations, representing word-to-phrase alignments. [sent-34, score-0.178]

17 We also augment the state space to include contiguous sequences. [sent-35, score-0.313]

18 We apply alignment by agreement (Liang et al., 2006) to this space, and find that agreement discourages EM from overfitting. [sent-38, score-0.197]

19 Finally, we make the alignment space more symmetric by including gappy (or non-contiguous) phrases. [sent-39, score-0.889]

20 This allows agreement to reinforce non-contiguous alignments. [sent-40, score-0.156]

21 Figure 2: The model of E given F can represent the phrasal alignment {e1, e2} ∼ {f1}. [sent-43, score-0.596]

22 However, the model of F given E cannot: the probability mass is distributed between {e1} ∼ {f1} and {e2} ∼ {f1}. [sent-44, score-0.092]

23-24 Agreement of the forward and backward HMM alignments tends to place less mass on phrasal links and greater mass on word-to-word links. [sent-45, score-0.197] [sent-46, score-0.379]

25 Pruning the set of allowed phrases preserves the time complexity of the word-to-word HMM alignment model. [sent-49, score-0.38]

26 An early approach by Deng and Byrne (2005) changed the parameterization of the traditional word-based HMM model, modeling subsequent words from the same state using a bigram model. [sent-52, score-0.237]

27 However, this model changes only the parameterization and not the set of possible alignments. [sent-53, score-0.096]

28 … the models of DeNero et al. (2006), which allow phrase-to-phrase alignments between the source and target domain. [sent-55, score-0.203]

29 As DeNero warns, though, an unconstrained model may overfit using unusual segmentations. [sent-56, score-0.037]

30 Interestingly, the phrase-based hidden semi-Markov model of Andrés-Ferrer and Juan (2009) does not seem to encounter these problems. [sent-57, score-0.122]

31-32 We suspect two main causes: first, the model interpolates with Model 1 (Brown et al., 1994), which may help prevent overfitting, and second, the model is monotonic, which screens out many possible alignments. [sent-58, score-0.037] [sent-59, score-0.151]

33 The second major inspiration is alignment by agreement by Liang et al. (2006). [sent-61, score-0.446]

34 Here, soft intersection between the forward (F→E) and backward (E→F) alignments during parameter estimation produces better word-to-word correspondences. [sent-63, score-0.203]
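
To make the soft intersection concrete, here is a minimal sketch (not from the paper) of the agreement step: the two directional models' link posteriors, hypothetical 2×2 matrices here, are multiplied elementwise, so a link keeps substantial mass only when both directions consider it likely.

    import numpy as np

    def agree(post_f2e, post_e2f):
        # post_f2e[i, j]: P(e_i ~ f_j) under the F-given-E model
        # post_e2f[i, j]: P(e_i ~ f_j) under the E-given-F model
        # Elementwise product: the soft intersection of Liang et al. (2006).
        return post_f2e * post_e2f

    f2e = np.array([[0.9, 0.1],
                    [0.6, 0.4]])
    e2f = np.array([[0.8, 0.2],
                    [0.1, 0.9]])
    print(agree(f2e, e2f))  # a one-sided link (0.6 * 0.1) keeps little mass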

35 This unsupervised approach produced alignments with incredibly low error rates on French-English, though only moderate gains in end-to-end machine translation results. [sent-64, score-0.245]

36 Likely this is because the symmetric portion of the HMM space contains only single-word-to-single-word links. [sent-65, score-0.227]

37 As shown in Figure 2, in order to retain the phrasal link {f1} ∼ {e1, e2} after agreement, we need the reverse phrasal link {e1, e2} ∼ {f1} in the backward direction. [sent-66, score-0.675]

38 However, this is not possible in a word-based HMM where each observation must be generated by a single state. [sent-67, score-0.161]

39 Agreement tends to encourage 1-to-1 alignments with very high precision but lower recall. [sent-68, score-0.156]

40 As each word alignment acts as a constraint on phrase extraction, the phrase-pairs obtained from those alignments have high recall and low precision. [sent-69, score-0.446]

41 2 Gappy Phrasal Alignment Our goal is to unify phrasal alignment and alignment by agreement. [sent-70, score-0.89]

42 We use a phrasal hidden semi-Markov alignment model, but without the monotonicity requirement of Andrés-Ferrer and Juan (2009). [sent-71, score-0.708]

43 Since phrases may be used in both the state and observation space of both sentences, agreement during EM training no longer penalizes phrasal links such as those in Figure 2. [sent-72, score-0.926]

44 Moreover, the benefits of agreement are preserved: meaningful phrasal links that are likely in both directions of alignment will be reinforced, while phrasal links likely in only one direction will be discouraged. [sent-73, score-1.183]

45 This avoids segmentation problems encountered by DeNero et al. [sent-74, score-0.046]

46 Even a semi-Markov model with phrases can represent the alignment between English not and French ne … pas. [sent-77, score-0.537]

47 To make the model more symmetric, we extend the state space to include gappy phrases as well. [sent-79, score-0.762]

48 The set of alignments in each model becomes symmetric, though the two directions model gappy phrases differently. [sent-80, score-0.771]

49 … pas: when predicting French given English, the alignment corresponds to generating multiple distinct observations from the same state. (¹ We only allow a single gap with one word on each end.) [sent-82, score-0.147]

50 This is sufficient for the vast majority of the gapped phenomena that we have seen in our training data. [sent-83, score-0.081]

51 [Figure 3: Example English-given-French and French-given-English alignments of the same sentence pair using the Hidden Semi-Markov Model (HSMM) for gapped-phrase-to-phrase alignment.] [sent-84, score-0.449]

52 It allows state-side phrases (denoted by vertical blocks), observation-side phrases (denoted by horizontal blocks), and state-side gaps (denoted by discontinuous blocks in the same column connected by a hollow vertical “bridge”). [sent-85, score-0.809]

53 Note both directions can capture the desired alignment for this sentence pair. [sent-86, score-0.334]

54 … in the other direction, the word not is generated by a single gappy phrase ne … pas. [sent-87, score-0.562]

55 Computing posteriors for agreement is somewhat complicated, so we resort to an approximation described later. [sent-89, score-0.156]

56 Exact inference retains a low-order polynomial runtime; we use pruning to increase speed. [sent-90, score-0.034]

57 2.1 Hidden Markov Alignment Models. Our model can be seen as an extension of the standard word-based Hidden Markov Model (HMM) used in alignment (Vogel et al., 1996). [sent-92, score-0.327]

58 This generative model has the form p(O|S) = Σ_A p(A, O|S), where S = (s1, …, sI). [sent-95, score-0.078]

59 … and A = (a1, …, aJ) is the alignment between the two sequences. [sent-106, score-0.29]

60 Since some words are systematically inserted during translation, the target (state) word sequence is augmented with a special NULL word. [sent-107, score-0.034]

61 To retain the position of the last aligned word, the state space contains I copies of the NULL word, one for each position (Och and Ney, 2003). [sent-108, score-0.446]

62 The alignment uses positive positions for words and negative positions for NULL states, so aj ∈ {1, …, I} ∪ {−1, …, −I}. [sent-109, score-0.463]

63 First, the length of the observation sequence is selected based on pl(J|I). [sent-115, score-0.262]

64-65 Then, for each observation position, the state is selected based on the prior state: a null state with probability p0, or a non-null state at position aj with probability (1 − p0) · pj(aj | aj−1), where pj is a jump distribution.² [sent-116, score-0.126] [sent-117, score-1.139]
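
As a concrete illustration of this generative story, here is a minimal sketch (an assumption-laden reading, not the authors' code) that scores one fixed alignment; jump and emit are hypothetical stand-ins for the learned jump distribution pj and the translation table pt:

    import math

    def hmm_align_logprob(obs, states, align, p0, jump, emit):
        # obs:    observation words o_1..o_J
        # states: state (target) words s_1..s_I
        # align:  a_1..a_J; positive = state position, negative = NULL copy
        logp = 0.0
        prev = 1  # assumed starting position
        for o, a in zip(obs, align):
            if a < 0:
                # NULL state, chosen with probability p0; the NULL copy at
                # position -a retains the position of the last aligned word
                logp += math.log(p0) + math.log(emit(o, None))
                prev = -a
            else:
                logp += math.log(1 - p0) + math.log(jump(a - prev))
                logp += math.log(emit(o, states[a - 1]))
                prev = a
        return logp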

66-67 2.2 Gappy Semi-Markov Models. The HMM alignment model identifies a word-to-word correspondence between the observation words and the state words. (² Note that jump distances beyond −10 or 10 share a single parameter to prevent sparsity.) [sent-120, score-0.621] [sent-121, score-0.213]

68 First, we allow contiguous phrases on the observation side, which makes the model semi-Markov: at each time stamp, the model may emit more than one observation word. [sent-123, score-0.548]

69 Next, we also allow contiguous and gappy phrases on the state side, leading to an alignment model that can retain phrasal links after agreement (see Section 4). [sent-124, score-1.706]

70 Since a single state may generate multiple observation words, we add a new variable K representing the number of states. [sent-126, score-0.374]

71 The alignment variable is augmented to allow contiguous and non-contiguous ranges of words. [sent-128, score-0.491]

72 We allow only a single gap, but of unlimited length. [sent-129, score-0.082]

73 The null state is still present, and is again represented by negative numbers. [sent-130, score-0.292]

74 … A = (a1, …, aK), with each ak ∈ A(I), where A(I) = {(i1, i2, g) | 0 < i1 ≤ i2 ≤ I, g ∈ {GAP, CONTIG}} ∪ {(−i, −i, CONTIG) | 0 < i ≤ I}. We add one more random variable to capture the total number of observations generated by each state. [sent-134, score-0.142]
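
The augmented space is straightforward to enumerate. The sketch below builds A(I); max_phrase_len is a hypothetical cap standing in for the pruning of allowed phrases mentioned earlier, and gappy entries require non-adjacent endpoints because only a single gap with one word on each end is allowed:

    GAP, CONTIG = "GAP", "CONTIG"

    def state_space(I, max_phrase_len=3):
        # contiguous single words and phrases: (i1, i2, CONTIG), i1 <= i2
        A = [(i1, i2, CONTIG)
             for i1 in range(1, I + 1)
             for i2 in range(i1, min(i1 + max_phrase_len - 1, I) + 1)]
        # gappy phrases: one word at i1, a gap, then one word at i2
        A += [(i1, i2, GAP)
              for i1 in range(1, I + 1)
              for i2 in range(i1 + 2, I + 1)]
        # positioned NULL states, encoded with negative positions
        A += [(-i, -i, CONTIG) for i in range(1, I + 1)]
        return A

    print(state_space(3))  # tiny example with I = 3 state-side words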

75 … L ∈ {(l0, …, lK) | 0 = l0 < · · · < lK = J}. The generative model takes the following form: p(A, L, O|S) = pl(J|I) · pf(K|J) · ∏_{k=1}^{K} pj(ak | ak−1) · pt(o_{l_{k−1}+1}, …, o_{l_k} | S[ak]). First, the length of the observation sequence (J) is selected, based on the number of words in the state-side sentence (I). [sent-138, score-0.204]
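
A minimal sketch of scoring one segmentation under this factorization; pl(J|I) is dropped since it is uniform, and pj and pt are hypothetical stand-ins for the learned transition and emission distributions:

    import math

    def phrase(S, a):
        # S[a]: the state-side words selected by alignment entry a
        i1, i2, g = a
        if i1 < 0:
            return None                     # NULL state
        if g == "GAP":
            return (S[i1 - 1], S[i2 - 1])   # one word on each end of the gap
        return tuple(S[i1 - 1:i2])          # contiguous span

    def gappy_hsmm_logprob(O, A, L, S, eta, pj, pt):
        # A: states a_1..a_K; L: boundaries 0 = l_0 < ... < l_K = J
        J, K = len(O), len(A)
        logp = (J - K) * math.log(eta)      # pf(K|J), up to its normalizer
        a_prev = None
        for k in range(K):
            span = tuple(O[L[k]:L[k + 1]])  # o_{l_{k-1}+1} .. o_{l_k}
            logp += math.log(pj(A[k], a_prev))
            logp += math.log(pt(span, phrase(S, A[k])))
            a_prev = A[k]
        return logp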

76 Since it does not affect the alignment, pl is modeled as a uniform distribution. [sent-139, score-0.136]

77 Next, we pick the total number of states to use (K), which must be less than the number of observations (J). [sent-140, score-0.212]

78 Short state sequences receive an exponential penalty: pf(K|J) ∝ η^(J−K) if 0 < K ≤ J, and 0 otherwise. [sent-141, score-0.178]

79 A harsh penalty (a small positive value of η) may prevent the systematic overuse of phrases.³ [sent-142, score-0.18]

80 (³ We found that this penalty was crucial to prevent overfitting in independent training.) [sent-143, score-0.209]
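
A toy illustration (with assumed values η = 0.1 and J = 4) of how this penalty distributes mass over K: segmentations that cover the observations with fewer, larger phrasal units are sharply down-weighted relative to pure word-to-word alignments.

    # pf(K|J) is proportional to eta**(J - K); eta and J are toy values
    eta, J = 0.1, 4
    weights = {K: eta ** (J - K) for K in range(1, J + 1)}
    Z = sum(weights.values())
    print({K: round(w / Z, 4) for K, w in weights.items()})
    # K = 4 (all word-to-word) gets ~90% of the mass; K = 1 almost none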

81 We retain the first-order Markov assumption: the selection of each state is conditioned only on the prior state. [sent-146, score-0.268]

82 The transition distribution is identical to the word-based HMM for single-word states. [sent-147, score-0.035]

83 For phrasal and gappy states, we jump into the first word of that state, and out of the last word of that state, and then pay a cost according to how many words are covered within that state. [sent-148, score-0.765]

84 If a = (i1, i2, g), then the beginning word of a is F(a) = i1, the ending word is L(a) = i2, and the length N(a) is 2 for gapped states, 0 for null states, and last(a) − first(a) + 1 for all other states. [sent-149, score-0.195]
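
A small sketch of these accessors and the resulting transition score; per_word_lp is a hypothetical stand-in for the per-word coverage cost the text describes:

    def first(a):                  # F(a), the beginning word of a
        return abs(a[0])

    def last(a):                   # L(a), the ending word of a
        return abs(a[1])

    def length(a):                 # N(a)
        i1, i2, g = a
        if i1 < 0:
            return 0               # null states
        if g == "GAP":
            return 2               # gapped states: one word on each end
        return i2 - i1 + 1         # all other states: span length

    def transition_logprob(a, a_prev, jump_lp, per_word_lp):
        # jump out of the last word of a_prev into the first word of a,
        # then pay a cost for each word covered within a
        return jump_lp(first(a) - last(a_prev)) + length(a) * per_word_lp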


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gappy', 0.407), ('alignment', 0.29), ('phrasal', 0.269), ('state', 0.178), ('aj', 0.173), ('alignments', 0.156), ('agreement', 0.156), ('oj', 0.141), ('pj', 0.14), ('semimarkov', 0.139), ('pl', 0.136), ('lk', 0.128), ('observation', 0.126), ('hmm', 0.125), ('ne', 0.12), ('pas', 0.115), ('null', 0.114), ('ak', 0.109), ('observations', 0.107), ('symmetric', 0.107), ('contig', 0.092), ('saj', 0.092), ('taes', 0.092), ('fer', 0.092), ('phrases', 0.09), ('retain', 0.09), ('jump', 0.089), ('contiguous', 0.085), ('pt', 0.085), ('hidden', 0.085), ('chemin', 0.081), ('railroad', 0.081), ('gapped', 0.081), ('french', 0.08), ('prevent', 0.079), ('pf', 0.074), ('overfitting', 0.07), ('denero', 0.067), ('gap', 0.065), ('states', 0.065), ('monotonicity', 0.064), ('blocks', 0.063), ('discontinuous', 0.062), ('penalty', 0.06), ('parameterization', 0.059), ('juan', 0.058), ('links', 0.057), ('markov', 0.057), ('yj', 0.056), ('mass', 0.055), ('vertical', 0.054), ('space', 0.05), ('andr', 0.048), ('translation', 0.048), ('allow', 0.047), ('backward', 0.047), ('encountered', 0.046), ('side', 0.046), ('position', 0.045), ('directions', 0.044), ('denoted', 0.043), ('vogel', 0.043), ('direction', 0.041), ('generative', 0.041), ('warns', 0.041), ('overuse', 0.041), ('discourages', 0.041), ('unify', 0.041), ('aeir', 0.041), ('eda', 0.041), ('incredibly', 0.041), ('othbsee', 0.041), ('parsimonious', 0.041), ('stamp', 0.041), ('torifb', 0.041), ('voudrais', 0.041), ('pick', 0.04), ('aligned', 0.038), ('banchs', 0.037), ('kh', 0.037), ('sof', 0.037), ('ear', 0.037), ('ofi', 0.037), ('intern', 0.037), ('lae', 0.037), ('pseudoword', 0.037), ('model', 0.037), ('microsoft', 0.037), ('synchronous', 0.036), ('single', 0.035), ('lambert', 0.035), ('cluding', 0.035), ('yk', 0.035), ('traveling', 0.035), ('reinforced', 0.035), ('deng', 0.035), ('screens', 0.035), ('variable', 0.035), ('augmented', 0.034), ('pruning', 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

2 0.24185802 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

3 0.23648979 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

4 0.20683251 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

Author: Coskun Mermer ; Murat Saraclar

Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.

5 0.20575489 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

6 0.16524047 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

7 0.16488506 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

8 0.14946838 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

9 0.14093204 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

10 0.12626989 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

11 0.11408769 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

12 0.10366201 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

13 0.097998105 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

14 0.091146447 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

15 0.087294914 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

16 0.083813131 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

17 0.082285687 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

18 0.0821364 284 acl-2011-Simple Unsupervised Grammar Induction from Raw Text with Cascaded Finite State Models

19 0.081423327 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

20 0.079809412 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.205), (1, -0.143), (2, 0.09), (3, 0.112), (4, 0.042), (5, 0.019), (6, 0.037), (7, 0.038), (8, -0.058), (9, 0.088), (10, 0.154), (11, 0.181), (12, 0.002), (13, 0.133), (14, -0.133), (15, 0.049), (16, 0.09), (17, -0.014), (18, -0.143), (19, 0.026), (20, -0.025), (21, 0.003), (22, -0.109), (23, -0.011), (24, 0.006), (25, 0.058), (26, 0.005), (27, 0.067), (28, -0.086), (29, -0.073), (30, 0.058), (31, -0.007), (32, 0.004), (33, -0.051), (34, 0.043), (35, -0.022), (36, -0.007), (37, 0.02), (38, 0.015), (39, 0.053), (40, -0.032), (41, 0.005), (42, 0.062), (43, 0.026), (44, -0.073), (45, -0.066), (46, -0.036), (47, -0.014), (48, -0.004), (49, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97666931 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

2 0.87631094 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

3 0.82401901 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur

Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage of weighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well-known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.

4 0.81763357 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.

5 0.79139966 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

6 0.78778738 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

7 0.77357531 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

8 0.75325912 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

9 0.70600641 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

10 0.69506913 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

11 0.69483078 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

12 0.65778226 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

13 0.64081204 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

14 0.51508361 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

15 0.48172823 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

16 0.47472268 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

17 0.45395976 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

18 0.45324704 342 acl-2011-full-for-print

19 0.44509465 321 acl-2011-Unsupervised Discovery of Rhyme Schemes

20 0.43296787 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.029), (17, 0.118), (26, 0.023), (37, 0.084), (39, 0.039), (41, 0.06), (51, 0.198), (55, 0.047), (59, 0.039), (72, 0.08), (88, 0.012), (91, 0.028), (96, 0.165)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84868461 276 acl-2011-Semi-Supervised SimHash for Efficient Document Similarity Search

Author: Qixia Jiang ; Maosong Sun

Abstract: Searching documents that are similar to a query document is an important component in modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised hashing methods cannot incorporate prior knowledge for better hashing. Although some supervised hashing methods can derive effective hash functions from prior knowledge, they are either computationally expensive or poorly discriminative. This paper proposes a novel (semi-)supervised hashing method named Semi-Supervised SimHash (S3H) for high-dimensional data similarity search. The basic idea of S3H is to learn the optimal feature weights from prior knowledge to relocate the data such that similar data have similar hash codes. We evaluate our method with several state-of-the-art methods on two large datasets. All the results show that our method gets the best performance.

same-paper 2 0.8434335 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

3 0.83028495 55 acl-2011-Automatically Predicting Peer-Review Helpfulness

Author: Wenting Xiong ; Diane Litman

Abstract: Identifying peer-review helpfulness is an important task for improving the quality of feedback that students receive from their peers. As a first step towards enhancing existing peerreview systems with new functionality based on helpfulness detection, we examine whether standard product review analysis techniques also apply to our new context of peer reviews. In addition, we investigate the utility of incorporating additional specialized features tailored to peer review. Our preliminary results show that the structural features, review unigrams and meta-data combined are useful in modeling the helpfulness of both peer reviews and product reviews, while peer-review specific auxiliary features can further improve helpfulness prediction.

4 0.8058399 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

Author: Sameer Singh ; Amarnag Subramanya ; Fernando Pereira ; Andrew McCallum

Abstract: Cross-document coreference, the task of grouping all the mentions of each entity in a document collection, arises in information extraction and automated knowledge base construction. For large collections, it is clearly impractical to consider all possible groupings of mentions into distinct entities. To solve the problem we propose two ideas: (a) a distributed inference technique that uses parallelism to enable large scale processing, and (b) a hierarchical model of coreference that represents uncertainty over multiple granularities of entities to facilitate more effective approximate inference. To evaluate these ideas, we constructed a labeled corpus of 1.5 million disambiguated mentions in Web pages by selecting link anchors referring to Wikipedia entities. We show that the combination of the hierarchical model with distributed inference quickly obtains high accuracy (with error reduction of 38%) on this large dataset, demonstrating the scalability of our approach.

5 0.76712 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

Author: Alla Rozovskaya ; Dan Roth

Abstract: We consider the problem of correcting errors made by English as a Second Language (ESL) writers and address two issues that are essential to making progress in ESL error correction: algorithm selection and model adaptation to the first language of the ESL learner. A variety of learning algorithms have been applied to correct ESL mistakes, but often comparisons were made between incomparable data sets. We conduct an extensive, fair comparison of four popular learning methods for the task, reversing conclusions from earlier evaluations. Our results hold for different training sets, genres, and feature sets. A second key issue in ESL error correction is the adaptation of a model to the first language of the writer. Errors made by non-native speakers exhibit certain regularities and, as we show, models perform much better when they use knowledge about error patterns of the non-native writers. We propose a novel way to adapt a learned algorithm to the first language of the writer that is both cheaper to implement and performs better than other adaptation methods.

6 0.7458657 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

7 0.74192142 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

8 0.73937631 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar

9 0.73625743 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

10 0.73609132 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

11 0.73449242 30 acl-2011-Adjoining Tree-to-String Translation

12 0.73209834 88 acl-2011-Creating a manually error-tagged and shallow-parsed learner corpus

13 0.73106927 252 acl-2011-Prototyping virtual instructors from human-human corpora

14 0.73094171 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

15 0.73059583 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

16 0.7301029 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

17 0.72890067 311 acl-2011-Translationese and Its Dialects

18 0.72835088 38 acl-2011-An Empirical Investigation of Discounting in Cross-Domain Language Models

19 0.72819877 175 acl-2011-Integrating history-length interpolation and classes in language modeling

20 0.72774422 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers