acl acl2011 acl2011-221 knowledge-graph by maker-knowledge-mining

221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition


Source: pdf

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. [sent-2, score-0.323]

2 A similar model generating e from f will make different alignment predictions. [sent-3, score-0.34]

3 Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. [sent-4, score-0.661]

4 This paper presents a graphical model that embeds two directional aligners into a single model. [sent-5, score-0.748]

5 Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. [sent-6, score-0.881]

6 Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. [sent-7, score-0.892]

7 The standard approach to word alignment employs directional Markov models that align the words of a sentence f to those of its translation e, such as IBM Model 4 (Brown et al. [sent-10, score-0.905]

8 , 1993) or the HMM-based alignment model (Vogel et al. [sent-11, score-0.34]

9 Machine translation systems typically combine the predictions of two directional models, one which aligns f to e and the other e to f (Och et al. [sent-13, score-0.634]

10 Combination can reduce errors and relax the one-to-many structural restriction of directional models. [sent-15, score-0.539]

11 Common combination methods include the union or intersection of directional alignments. [sent-16, score-0.664]

12 Inference in a probabilistic model resolves the conflicting predictions of two directional models, while taking into account each model’s uncertainty over its output. [sent-20, score-0.636]

13 This result is achieved by embedding two directional HMM-based alignment models into a larger bidirectional graphical model. [sent-21, score-1.173]

14 The full model structure and potentials allow the two embedded directional models to disagree to some extent, but reward agreement. [sent-22, score-0.731]

15 Moreover, the bidirectional model enforces a one-to-one phrase alignment structure, similar to the output of phrase alignment models (Marcu and Wong, 2002; DeNero et al. [sent-23, score-1.045]

16 However, we can employ dual decomposition as an approximate inference technique (Rush et al. [sent-28, score-0.524]

17 In this approach, we iteratively apply the same efficient sequence algorithms for the underlying directional models, and thereby optimize a dual bound on the model objective. [sent-30, score-0.835]

18 Our model-based approach to aligner combination yields improvements in alignment quality and phrase extraction quality in Chinese-English experiments, relative to typical heuristic combination methods applied to the predictions of independent directional models. [sent-33, score-1.061]

19 2 Model Definition: Our bidirectional model G = (V, D) is a globally normalized, undirected graphical model of the word alignment for a fixed sentence pair (e, f). [sent-36, score-0.679]

20 Each vertex in the vertex set V corresponds to a model variable Vi, and each undirected edge in the edge set D corresponds to a pair of variables (Vi, Vj). [sent-37, score-0.404]

21 P(v) ∝ ∏_{Vi ∈ V} ωi(vi) · ∏_{(Vi,Vj) ∈ D} μij(vi, vj). Our model contains two directional hidden Markov alignment models, which we review in Section 2. [sent-41, score-0.853]
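
As a concrete reading of this factorization, here is a minimal Python sketch that scores one full assignment; the callable potential tables (vertex_potentials, edge_potentials) and the log-space evaluation are illustrative assumptions, not the paper's implementation:

```python
import math

def log_score(assignment, vertex_potentials, edge_potentials, edges):
    """Unnormalized log-score of a full assignment v under
    P(v) ∝ ∏_i ω_i(v_i) · ∏_{(i,j)∈D} μ_ij(v_i, v_j).
    assignment: {var: value}; vertex_potentials[i] and
    edge_potentials[(i, j)] are hypothetical callables that
    return positive potential values."""
    score = sum(math.log(vertex_potentials[i](v))
                for i, v in assignment.items())
    score += sum(math.log(edge_potentials[(i, j)](assignment[i], assignment[j]))
                 for (i, j) in edges)
    return score
```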

22 2.1 HMM-Based Alignment Model: This section describes the classic hidden Markov model (HMM) based alignment model (Vogel et al. [sent-45, score-0.392]

23 P(f|e) is defined in terms of a latent alignment vector a, where aj = i indicates that word position i of e aligns to word position j of f. [sent-49, score-0.708]

24 The highest probability word alignment vector under the model for a given sentence pair (e, f) can be computed exactly using the standard Viterbi algorithm for HMMs in O(|e|² · |f|) time. [sent-57, score-0.366]

25 An alignment vector a can be converted trivially into a set of word alignment links A: Aa = {(i, j) : aj = i, i ≠ 0}. [sent-59, score-0.717]
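
Sentences 24–25 together describe Viterbi decoding of a directional HMM followed by conversion of the alignment vector into a link set. A rough sketch under simplifying assumptions (dense log-probability tables log_D and log_M, a uniform initial distribution, no special null-jump structure), not the paper's implementation:

```python
import numpy as np

def viterbi_align(f_len, log_D, log_M):
    """Highest-probability alignment vector a for a directional HMM,
    in O(|e|^2 · |f|) time. States are e positions 0..|e| (0 = null).
    log_D[p, q]: transition log-prob from state p to q (assumed dense);
    log_M[i, j]: emission log-prob of f_j given e_i."""
    n_states = log_M.shape[0]
    best = log_M[:, 0].copy()            # uniform initial distribution assumed
    back = np.zeros((f_len, n_states), dtype=int)
    for j in range(1, f_len):
        cand = best[:, None] + log_D     # cand[p, q]: come from p, move to q
        back[j] = cand.argmax(axis=0)
        best = cand.max(axis=0) + log_M[:, j]
    a = [int(best.argmax())]
    for j in range(f_len - 1, 0, -1):    # backtrace
        a.append(int(back[j, a[-1]]))
    a.reverse()
    return a

def links(a):
    """A_a = {(i, j) : a_j = i, i != 0} — drop null-aligned positions."""
    return {(i, j) for j, i in enumerate(a) if i != 0}
```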

26 We have defined a directional model that generates f from e. [sent-61, score-0.565]

27 P(e, b|f) = ∏_{i=1}^{|e|} Df→e(bi | bi−1) · Mf→e(ei | f_bi). The vector b can be interpreted as a set of alignment links that is one-to-many: each value i appears at most once in the set. [sent-66, score-0.342]

28 2.2 A Bidirectional Alignment Model: We can combine two HMM-based directional alignment models by embedding them in a larger model. (Footnote: In experiments, we set po = 10^-6.) [sent-69, score-0.969]

29 that includes all of the random variables of two directional models, along with additional structure that promotes agreement and resolves discrepancies. [sent-85, score-0.577]

30 The original directional models include observed word sequences e and f, along with the two latent alignment vectors a and b defined in Section 2. [sent-86, score-0.838]

31 ωj(a)(i) = Me→f(fj|ei) and ωi(b)(j) = Mf→e(ei|fj). The edge potentials between a and b encode the transition model in Equation 1. [sent-89, score-0.315]

32 This matrix encodes the alignment links proposed by the bidirectional model: Ac = {(i, j) : cij = 1} . [sent-91, score-0.723]

33 Each model node for an element cij ∈ {0, 1} is connected to aj and bi via coherence edges. [sent-92, score-0.701]

34 Instead, they are fixed functions that promote consistency between the integer-valued directional alignment vectors a and b and the boolean-valued matrix c. [sent-97, score-0.343]

35 Consider the assignment aj = i, where i = 0 indicates that word fj is null-aligned, and i ≥ 1 indicates that fj aligns to ei. [sent-98, score-0.597]

36 The coherence potential ensures the following relationship between the variable assignment aj = i and the variables ci′j, for any i′ ∈ [1, |e|]. [sent-99, score-0.521]

37 Collectively, the list of cases above enforces an intuitive correspondence: an alignment aj = i ensures that cij must be 1, adjacent neighbors may be 1 but incur a cost, and all other elements are 0. [sent-104, score-0.789]
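
The case analysis in sentences 35–37 translates directly into a log-potential function. A sketch covering only the cases stated here; the treatment of the null case aj = 0 and the exact cost parameterization are assumptions:

```python
NEG_INF = float("-inf")

def coherence_log_potential(a_j, i_prime, c_val, alpha):
    """Log coherence potential between assignment a_j and c_{i'j}.
    a_j == i'       : c_{i'j} must be 1
    |a_j - i'| == 1 : c_{i'j} may be 1, at a cost alpha
    otherwise       : c_{i'j} must be 0
    (Treating a_j = 0, the null alignment, as forcing c = 0 is an
    assumption about a case the extracted text does not spell out.)"""
    if a_j == i_prime:
        return 0.0 if c_val == 1 else NEG_INF
    if a_j != 0 and abs(a_j - i_prime) == 1:
        return -alpha if c_val == 1 else 0.0
    return 0.0 if c_val == 0 else NEG_INF
```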

38 These edge potential functions take an integer value i for some variable aj and a binary value k for some ci′j. [sent-106, score-0.438]

39 In this way, we relax the one-to-many constraints of the directional models. [sent-111, score-0.539]

40 However, all of the information about how words align is expressed by the vertex and edge potentials on a and b. [sent-112, score-0.337]

41 The coherence edges and the link matrix c only serve to resolve conflicts between the directional models and communicate information between them. [sent-113, score-0.639]

42 For any assignment to (a, b, c) with non-zero probability, c must encode a one-to-one phrase alignment with a maximum phrase length of 3. [sent-117, score-0.443]
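
One plausible way to check this constraint on a candidate link matrix c is to read "one-to-one phrase alignment" as disjoint solid rectangular blocks of links with bounded side length; the checker below illustrates the constraint and is not the paper's own verifier:

```python
def is_phrase_alignment(c, max_len=3):
    """Check that a 0/1 link matrix c encodes a one-to-one phrase
    alignment with phrases of length at most max_len: links must form
    disjoint solid rectangular blocks, at most max_len wide and tall,
    with no row or column shared between blocks."""
    n_rows = len(c)
    n_cols = len(c[0]) if n_rows else 0
    col_sets = [frozenset(j for j in range(n_cols) if c[i][j])
                for i in range(n_rows)]
    for s in col_sets:
        if not s:
            continue
        if max(s) - min(s) + 1 != len(s) or len(s) > max_len:
            return False      # f-side phrase not contiguous or too long
        block_rows = [r for r in range(n_rows) if col_sets[r] & s]
        if any(col_sets[r] != s for r in block_rows):
            return False      # a column participates in two blocks
        if (max(block_rows) - min(block_rows) + 1 != len(block_rows)
                or len(block_rows) > max_len):
            return False      # e-side phrase not contiguous or too long
    return True
```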

43 For every pair of indices (i, j) and (i′, j′), the following cycle exists in the graph: cij → bi → cij′ → aj′ → ci′j′ → bi′ → ci′j → aj → cij. Additional cycles also exist in the graph through the edges between aj−1 and aj and between bi−1 and bi. [sent-122, score-1.086]

44 The general phrase alignment problem under an arbitrary model is known to be NP-hard (DeNero and Klein, 2008). [sent-123, score-0.393]

45 The dual decomposition inference approach allows us to exploit this sub-graph structure (Rush et al. [sent-130, score-0.524]

46 The technique of dual decomposition has recently been shown to yield state-of-the-art performance in dependency parsing (Koo et al. [sent-133, score-0.426]

47 3.2 Dual Problem Formulation: To describe a dual decomposition inference procedure for our model, we first restate the inference problem under our graphical model in terms of the two overlapping subgraphs that admit tractable inference. [sent-136, score-0.827]

48 In this case, the dual problem decomposes into two terms that are each local to an acyclic subgraph. [sent-144, score-0.325]

49 As in previous work, we solve for the dual variable u by repeatedly performing inference in the two decoupled maximization problems. [sent-147, score-0.405]

50 In fact, we can make a stronger claim: we can reuse the Viterbi inference algorithm for linear chain graphical models that applies to the embedded directional HMM models. [sent-152, score-0.783]

51 We can add these terms to the vertex potentials of this linear chain model, because the optimal [Figure 3: The tree-structured subgraph Ga can be mapped to an equivalent chain-structured model by optimizing over ci′j for aj = i.] [sent-156, score-0.59]

52 choice of each cij can be determined from aj and the model parameters. [sent-157, score-0.553]

53 If aj = i, then cij = 1 according to our edge potential defined in Equation 2. [sent-158, score-0.658]

54 Hence, setting aj = i requires the inclusion of the corresponding vertex potential ωj(a)(i), as well as u(i, j). [sent-159, score-0.52]

55 For i′ ≠ i, either ci′j = 0, which contributes nothing to Equation 5, or ci′j = 1, which contributes u(i′, j) − α, according to our edge potential between aj and ci′j. [sent-160, score-0.532]

56 Defining this potential allows us to collapse the source-side sub-graph inference problem defined by Equation 5 into a simple linear chain model that only includes potential functions M′j and … [sent-162, score-0.306]

57 3.4 Dual Decomposition Algorithm: Now that we have the means to efficiently evaluate Equation 4 for fixed u, we can define the full dual decomposition algorithm for our model, which searches for a u that optimizes Equation 4. [sent-171, score-0.517]

58 The full dual decomposition optimization procedure appears in Algorithm 1. [sent-174, score-0.458]
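
Sentences 49 and 57–58 describe a standard subgradient loop over the dual variables u. The skeleton below is an illustrative reconstruction, not the paper's Algorithm 1 verbatim; the step-size schedule, the dict-of-links representation, and the subproblem solvers solve_a/solve_b (Viterbi-style inference with +u and −u folded into the potentials) are all assumptions:

```python
from collections import defaultdict

def dual_decomposition(solve_a, solve_b, n_iters=250,
                       step=lambda t: 1.0 / (t + 1)):
    """solve_a(u) / solve_b(u): exact inference in each acyclic
    subgraph; each returns the implied link matrix as a dict
    {(i, j): 0 or 1}. Returns (c_a, c_b, converged)."""
    u = defaultdict(float)             # one dual variable u(i, j) per link
    c_a, c_b = {}, {}
    for t in range(n_iters):
        c_a = solve_a(u)               # best (a, c) subgraph assignment
        c_b = solve_b(u)               # best (b, c) subgraph assignment
        if c_a == c_b:                 # agreement certifies an exact solution
            return c_a, c_b, True
        for k in set(c_a) | set(c_b):  # subgradient step toward agreement
            u[k] -= step(t) * (c_a.get(k, 0) - c_b.get(k, 0))
    return c_a, c_b, False             # early stopping: no certificate;
                                       # combine c(a) and c(b) downstream
```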

59 3.5 Convergence and Early Stopping: Our dual decomposition algorithm provides an inference method that is exact upon convergence. [sent-179, score-0.55]

60 Therefore, our approach does not require any additional communication overhead relative to the independent directional models in a distributed aligner implementation. [sent-190, score-0.644]

61 4 Related Work: Alignment combination normally involves selecting some A from the output of two directional models. [sent-195, score-0.557]

62 Common approaches include forming the union or intersection of the directional sets. [sent-196, score-0.556]

63 , 2003), produce alignment link sets that include all of A∩ and some subset of A∪ based on the relationship of multiple links (Och et al. [sent-199, score-0.342]

64 In addition, supervised word alignment models often use the output of directional unsupervised aligners as features or pruning signals. [sent-201, score-0.971]

65 In the case that a supervised model is restricted to proposing alignment links that appear in the output of a directional aligner, these models can be interpreted as a combination technique (Deng and Zhou, 2009). [sent-202, score-1.017]

66 Such a model-based approach differs from ours in that it requires a supervised dataset and treats the directional aligners’ output as fixed. [sent-203, score-0.542]

67 This approach to jointly learning two directional alignment models yields state-of-the-art unsupervised performance. [sent-206, score-0.838]

68 In fact, we employ agreement-based training to estimate the parameters of the directional aligners in our experiments. [sent-208, score-0.617]

69 A parallel idea that closely relates to our bidirectional model is posterior regularization, which has also been applied to the word alignment problem (Gra ¸ca et al. [sent-209, score-0.6]

70 This approach also yields state-of-the-art unsupervised alignment performance on some datasets, along with improvements in end-to-end translation quality (Ganchev et al. [sent-212, score-0.323]

71 More importantly, we have changed the output space of the model to be a one-to-one phrase alignment via the coherence edge potential functions. [sent-216, score-0.614]

72 Another similar line of work applies belief propagation to factor graphs that enforce a one-to-one word alignment (Cromières and Kurohashi, 2009). [sent-217, score-0.403]

73 Although differing in both model and inference, our work and theirs both find improvements from defining graphical models for alignment that do not admit exact polynomial-time inference algorithms. [sent-220, score-0.596]

74 Table 1: The bidirectional model’s dual decomposition algorithm substantially increases the overlap between the predictions of the directional models, measured by the number of links in their intersection. [sent-223, score-1.29]

75 In this way, we can show that the bidirectional model improves alignment quality and enables the extraction of more correct phrase pairs. [sent-225, score-0.623]

76 We trained the model on a portion of FBIS data that has been used previously for alignment model evaluation (Ayan and Dorr, 2006; Haghighi et al. [sent-228, score-0.392]

77 We trained the parameters of the directional models using the agreement training variant of the expectation maximization algorithm (Liang et al. [sent-232, score-0.613]

78 Agreement-trained IBM Model 1 was used to initialize the parameters of the HMM-based alignment models (Brown et al. [sent-234, score-0.325]

79 Both IBM Model 1 and the HMM alignment models were trained for 5 iterations on a 6. [sent-236, score-0.357]

80 5.2 Convergence Analysis: With n = 250 maximum iterations, our dual decomposition inference algorithm only converges 6.2% of the time, perhaps largely due to the fact that the two directional models have different one-to-many structural constraints. [sent-241, score-0.55] [sent-242, score-0.55]

82 Table 2: Alignment error rate results for the bidirectional model versus the baseline directional models. [sent-246, score-0.795]

83 We can measure the agreement between models as the fraction of alignment links in the union A∪ that also appear in the intersection A∩ of the two directional models. [sent-254, score-0.999]

84 Table 1 shows a 47% relative increase in the fraction of links that both models agree on by running dual decomposition (bidirectional), relative to independent directional inference (baseline). [sent-255, score-1.182]
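
The agreement statistic used in these two sentences is easy to compute once each model's links are stored as a set of (i, j) pairs; a small helper:

```python
def agreement_fraction(links_a, links_b):
    """Fraction of links in the union A∪ that also appear in the
    intersection A∩ of the two directional models' predictions."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 1.0
```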

85 5.3 Alignment Error Evaluation: To evaluate alignment error of the baseline directional aligners, we must apply a combination procedure such as union or intersection to Aa and Ab. [sent-258, score-0.952]

86 Likewise, in order to evaluate alignment error for our combined model in cases where the inference algorithm does not converge, we must apply combination to c(a) and c(b). [sent-259, score-0.49]

87 First, we measure alignment error rate (AER), which compares the proposed alignment set A to the sure set S and possible set P in the alignment annotation, where S ⊆ P. [sent-262, score-0.576]
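
Sentence 87 refers to the standard AER definition (Och and Ney, 2003); transcribed directly, with link sets as sets of (i, j) pairs:

```python
def alignment_error_rate(A, S, P):
    """AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|),
    with sure links S ⊆ possible links P."""
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))
```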

88 The bidirectional model improves both precision and recall relative to all heuristic combination techniques, including grow-diag-final (Koehn et al. [sent-264, score-0.381]

89 Extraction-based evaluations of alignment better coincide with the role of word aligners in machine translation systems (Ayan and Dorr, 2006). [sent-268, score-0.427]

90 Finally, we evaluated our bidirectional model in a large-scale end-to-end phrase-based machine translation system from Chinese to English, based on the alignment template approach (Och and Ney, 2004). [sent-278, score-0.605]

91 The translation model weights were tuned for both the baseline and bidirectional alignments using lattice-based minimum error rate training (Kumar et al. [sent-279, score-0.448]

92 82% after training IBM Model 1 for 3 iterations and training the HMM-based alignment model for 3 iterations. [sent-287, score-0.372]

93 As our model only provides small improvements in alignment precision and recall for the union combiner, the magnitude of the BLEU improvement is not surprising. [sent-289, score-0.404]

94 6 Conclusion: We have presented a graphical model that combines two classical HMM-based alignment models. [sent-290, score-0.419]

95 Our bidirectional model, which requires no additional learning and no supervised data, can be applied using dual decomposition with only a constant factor additional computation relative to independent directional inference. [sent-291, score-1.225]

96 The resulting predictions improve the precision and recall of both alignment links and extracted phrase pairs in Chinese-English experiments. [sent-292, score-0.436]

97 Because our technique is defined declaratively in terms of a graphical model, it can be extended in a straightforward manner, for instance with additional potentials on c or improvements to the component directional models. [sent-294, score-0.721]

98 An alignment algorithm using belief propagation and a structure-based distortion model. [sent-325, score-0.418]

99 Using word-dependent transition models in HMM-based word alignment for statistical machine translation. [sent-361, score-0.365]

100 On dual decomposition and linear programming relaxations for natural language processing. [sent-403, score-0.426]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('directional', 0.513), ('aj', 0.375), ('alignment', 0.288), ('dual', 0.27), ('bidirectional', 0.23), ('decomposition', 0.156), ('alignments', 0.131), ('potentials', 0.129), ('cij', 0.126), ('aligners', 0.104), ('equation', 0.102), ('inference', 0.098), ('denero', 0.095), ('edge', 0.094), ('bi', 0.084), ('vertex', 0.082), ('graphical', 0.079), ('aa', 0.076), ('ci', 0.074), ('aligner', 0.067), ('coherence', 0.064), ('union', 0.064), ('fj', 0.064), ('potential', 0.063), ('ga', 0.063), ('ayan', 0.059), ('itg', 0.059), ('jb', 0.058), ('subgraph', 0.056), ('links', 0.054), ('po', 0.053), ('vi', 0.053), ('phrase', 0.053), ('model', 0.052), ('viterbi', 0.05), ('assignment', 0.049), ('aer', 0.048), ('rush', 0.048), ('aligns', 0.045), ('enforces', 0.044), ('combination', 0.044), ('intersection', 0.043), ('ja', 0.042), ('admit', 0.042), ('predictions', 0.041), ('gra', 0.04), ('transition', 0.04), ('bmiadoserlincteoaliugcnr', 0.039), ('brunning', 0.039), ('cromi', 0.039), ('eres', 0.039), ('sition', 0.039), ('belief', 0.038), ('propagation', 0.038), ('maximization', 0.037), ('models', 0.037), ('yj', 0.035), ('optimizes', 0.035), ('translation', 0.035), ('variables', 0.034), ('ganchev', 0.034), ('stopping', 0.034), ('shindo', 0.034), ('converge', 0.033), ('ibm', 0.033), ('haghighi', 0.033), ('convergence', 0.033), ('och', 0.033), ('optimization', 0.032), ('iterations', 0.032), ('align', 0.032), ('subgraphs', 0.032), ('posterior', 0.03), ('fixed', 0.03), ('chain', 0.03), ('resolves', 0.03), ('certificate', 0.03), ('decomposes', 0.03), ('deng', 0.03), ('burkett', 0.03), ('supervised', 0.029), ('markov', 0.029), ('heuristic', 0.028), ('distortion', 0.028), ('fbis', 0.028), ('joao', 0.028), ('relative', 0.027), ('vogel', 0.027), ('xb', 0.027), ('likewise', 0.026), ('embedding', 0.026), ('relax', 0.026), ('naseem', 0.026), ('optimality', 0.026), ('algorithm', 0.026), ('multinomial', 0.026), ('josef', 0.025), ('matrix', 0.025), ('mf', 0.025), ('acyclic', 0.025)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

2 0.24185802 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

3 0.24108081 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state of the art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

4 0.23724531 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

Author: Coskun Mermer ; Murat Saraclar

Abstract: In this work, we compare the translation performance of word alignments obtained via Bayesian inference to those obtained via expectation-maximization (EM). We propose a Gibbs sampler for fully Bayesian inference in IBM Model 1, integrating over all possible parameter values in finding the alignment distribution. We show that Bayesian inference outperforms EM in all of the tested language pairs, domains and data set sizes, by up to 2.99 BLEU points. We also show that the proposed method effectively addresses the well-known rare word problem in EM-estimated models; and at the same time induces a much smaller dictionary of bilingual word-pairs.

5 0.21785416 106 acl-2011-Dual Decomposition for Natural Language Processing

Author: Alexander M. Rush and Michael Collins

Abstract: unknown-abstract

6 0.17414029 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

7 0.15909626 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

8 0.14601485 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

9 0.13362224 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

10 0.13055243 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

11 0.12866496 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

12 0.12586944 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing

13 0.11627126 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

14 0.1149434 267 acl-2011-Reversible Stochastic Attribute-Value Grammars

15 0.11147678 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

16 0.10698476 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

17 0.097744524 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

18 0.093851149 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

19 0.089879833 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

20 0.089629151 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.214), (1, -0.132), (2, 0.055), (3, 0.062), (4, 0.051), (5, 0.005), (6, 0.018), (7, 0.094), (8, -0.064), (9, 0.11), (10, 0.202), (11, 0.21), (12, 0.042), (13, 0.158), (14, -0.223), (15, 0.093), (16, 0.104), (17, -0.021), (18, -0.146), (19, 0.001), (20, 0.002), (21, -0.026), (22, -0.039), (23, 0.057), (24, -0.027), (25, 0.025), (26, -0.033), (27, 0.046), (28, -0.049), (29, -0.049), (30, -0.027), (31, 0.054), (32, -0.075), (33, 0.06), (34, -0.021), (35, 0.02), (36, 0.035), (37, 0.016), (38, -0.064), (39, -0.004), (40, -0.01), (41, -0.002), (42, 0.023), (43, 0.008), (44, 0.024), (45, -0.084), (46, 0.011), (47, 0.031), (48, 0.085), (49, 0.059)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95695049 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

2 0.83436179 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

3 0.82612723 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

Author: Kapil Thadani ; Kathleen McKeown

Abstract: The task of aligning corresponding phrases across two related sentences is an important component of approaches for natural language problems such as textual inference, paraphrase detection and text-to-text generation. In this work, we examine a state-of-the-art structured prediction model for the alignment task which uses a phrase-based representation and is forced to decode alignments using an approximate search approach. We propose instead a straightforward exact decoding technique based on integer linear programming that yields order-of-magnitude improvements in decoding speed. This ILP-based decoding strategy permits us to consider syntacticallyinformed constraints on alignments which significantly increase the precision of the model.

4 0.7705757 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it effects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

5 0.75633055 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith

Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.

6 0.73598564 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

7 0.70339686 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

8 0.68279159 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

9 0.66981232 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

10 0.66108364 106 acl-2011-Dual Decomposition for Natural Language Processing

11 0.6569429 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

12 0.65093935 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

13 0.61712021 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

14 0.54096764 342 acl-2011-full-for-print

15 0.5265848 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation

16 0.52473921 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

17 0.48058996 205 acl-2011-Learning to Grade Short Answer Questions using Semantic Similarity Measures and Dependency Graph Alignments

18 0.47901541 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

19 0.43649179 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

20 0.42506951 139 acl-2011-From Bilingual Dictionaries to Interlingual Document Representations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.019), (17, 0.065), (26, 0.017), (37, 0.113), (39, 0.038), (41, 0.072), (53, 0.028), (55, 0.035), (59, 0.034), (64, 0.048), (72, 0.047), (90, 0.16), (91, 0.039), (96, 0.173), (97, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.90942955 212 acl-2011-Local Histograms of Character N-grams for Authorship Attribution

Author: Hugo Jair Escalante ; Thamar Solorio ; Manuel Montes-y-Gomez

Abstract: This paper proposes the use of local histograms (LH) over character n-grams for authorship attribution (AA). LHs are enriched histogram representations that preserve sequential information in documents; they have been successfully used for text categorization and document visualization using word histograms. In this work we explore the suitability of LHs over n-grams at the character-level for AA. We show that LHs are particularly helpful for AA, because they provide useful information for uncovering, to some extent, the writing style of authors. We report experimental results in AA data sets that confirm that LHs over character n-grams are more helpful for AA than the usual global histograms, yielding results far superior to state of the art approaches. We found that LHs are even more advantageous in challenging conditions, such as having imbalanced and small training sets. Our results motivate further research on the use of LHs for modeling the writing style of authors for related tasks, such as authorship verification and plagiarism detection.

2 0.90506035 222 acl-2011-Model-Portability Experiments for Textual Temporal Analysis

Author: Oleksandr Kolomiyets ; Steven Bethard ; Marie-Francine Moens

Abstract: We explore a semi-supervised approach for improving the portability of time expression recognition to non-newswire domains: we generate additional training examples by substituting temporal expression words with potential synonyms. We explore using synonyms both from WordNet and from the Latent Words Language Model (LWLM), which predicts synonyms in context using an unsupervised approach. We evaluate a state-of-the-art time expression recognition system trained both with and without the additional training examples using data from TempEval 2010, Reuters and Wikipedia. We find that the LWLM provides substantial improvements on the Reuters corpus, and smaller improvements on the Wikipedia corpus. We find that WordNet alone never improves performance, though intersecting the examples from the LWLM and WordNet provides more stable results for Wikipedia. 1

same-paper 3 0.88684571 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

4 0.87989867 64 acl-2011-C-Feel-It: A Sentiment Analyzer for Micro-blogs

Author: Aditya Joshi ; Balamurali AR ; Pushpak Bhattacharyya ; Rajat Mohanty

Abstract: Social networking and micro-blogging sites are stores of opinion-bearing content created by human users. We describe C-Feel-It, a system which can tap opinion content in posts (called tweets) from the micro-blogging website, Twitter. This web-based system categorizes tweets pertaining to a search string as positive, negative or objective and gives an aggregate sentiment score that represents a sentiment snapshot for a search string. We present a qualitative evaluation of this system based on a human-annotated tweet corpus.

5 0.86238801 258 acl-2011-Ranking Class Labels Using Query Sessions

Author: Marius Pasca

Abstract: The role of search queries, as available within query sessions or in isolation from one another, in examined in the context of ranking the class labels (e.g., brazilian cities, business centers, hilly sites) extracted from Web documents for various instances (e.g., rio de janeiro). The co-occurrence of a class label and an instance, in the same query or within the same query session, is used to reinforce the estimated relevance of the class label for the instance. Experiments over evaluation sets of instances associated with Web search queries illustrate the higher quality of the query-based, re-ranked class labels, relative to ranking baselines using documentbased counts.

6 0.81892288 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

7 0.80825055 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

8 0.8077296 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

9 0.80683237 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

10 0.80574393 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

11 0.80522263 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

12 0.80361825 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

13 0.80354518 311 acl-2011-Translationese and Its Dialects

14 0.80278206 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

15 0.80258989 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

16 0.80248833 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

17 0.80061692 28 acl-2011-A Statistical Tree Annotator and Its Applications

18 0.80017793 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

19 0.80017471 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering

20 0.80014461 44 acl-2011-An exponential translation model for target language morphology