acl acl2011 acl2011-93 knowledge-graph by maker-knowledge-mining

93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment


Source: pdf

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Word alignment has an exponentially large search space, which often makes exact inference infeasible. [sent-4, score-0.235]

2 Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. [sent-5, score-0.376]

3 However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. [sent-6, score-0.625]

4 In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. [sent-7, score-0.668]

5 We also propose a variant of the grammar which eliminates those ambiguities. [sent-8, score-0.122]

6 Our grammar shows advantages over previous grammars in both synthetic and real-world experiments. [sent-9, score-0.231]

7 1 Introduction In statistical machine translation, word alignment attempts to find word correspondences in parallel sentence pairs. [sent-10, score-0.184]

8 The search space of word alignment will grow exponentially with the length of source and target sentences, which makes the inference for complex models infeasible (Brown et al. [sent-11, score-0.21]

9 Recently, inversion transduction grammars (Wu, 1997), namely ITG, have been used to constrain the search space for word alignment (Zhang and Gildea, 2005; Cherry and Lin, 2007; Haghighi et al. [sent-13, score-0.426]

10 ITG is a family of grammars in which the right-hand side of each rule is either two nonterminals or a terminal sequence. [sent-16, score-0.215]

11 The most general case of the ITG family is the bracketing transduction grammar (BTG). [sent-17, score-0.282]

12 [AA] denotes a monotone concatenation and ⟨AA⟩ denotes an inverted concatenation. [sent-25, score-0.281]
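
To make the two operators concrete, here is a minimal Python sketch, under our own naming (not the paper's): an alignment over a span is a set of (source, target) index links, and the two rules shift and merge the links of adjacent spans.

```python
# Minimal sketch (our naming, not the paper's) of BTG's two
# concatenation operators. An alignment over a span is a set of
# (source_pos, target_pos) links; spans are given by their lengths.

def monotone_concat(a1, a2, src_len1, tgt_len1):
    """[AA]: the second span follows the first on BOTH sides."""
    return a1 | {(s + src_len1, t + tgt_len1) for (s, t) in a2}

def inverted_concat(a1, a2, src_len1, tgt_len2):
    """<AA>: source order is kept, target order is swapped."""
    return ({(s, t + tgt_len2) for (s, t) in a1}
            | {(s + src_len1, t) for (s, t) in a2})

# Two single-link spans e0/f0 and e1/f1:
print(monotone_concat({(0, 0)}, {(0, 0)}, 1, 1))  # {(0, 0), (1, 1)}
print(inverted_concat({(0, 0)}, {(0, 0)}, 1, 1))  # {(0, 1), (1, 0)}
```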

13 Synchronous parsing of ITG may generate a large number of different derivations for the same underlying word alignment. [sent-27, score-0.256]

14 This is often referred to as the spurious ambiguity problem. [sent-28, score-0.514]

15 Calculating and saving those derivations will slow down the parsing speed significantly. [sent-29, score-0.278]

16 Furthermore, spurious derivations may fill up the n-best list and supersede potentially good results, making it harder to find the best alignment. [sent-30, score-0.522]
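
A common engineering workaround, separate from the grammar-based fix this paper proposes, is to collapse the n-best derivation list by underlying alignment before using it; a sketch under our own naming:

```python
import heapq

def nbest_alignments(scored_derivations, n):
    """Collapse an n-best DERIVATION list into an n-best ALIGNMENT
    list: keep only the best-scored derivation per distinct
    alignment, so spurious duplicates cannot crowd out good results.
    `scored_derivations` iterates over (score, links) pairs, where
    links is a frozenset of (src, tgt) index pairs."""
    best = {}
    for score, links in scored_derivations:
        if links not in best or score > best[links]:
            best[links] = score
    return heapq.nlargest(n, best.items(), key=lambda kv: kv[1])
```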

17 Moreover, over-counting those spurious derivations will also affect the likelihood estimation. [sent-31, score-0.522]

18 In order to reduce spurious derivations, Wu (1997), Haghighi et al. [sent-32, score-0.336]

19 These grammars have different behaviors in parsing efficiency and accuracy, but so far no detailed comparison between them has been done. [sent-35, score-0.16]

20 In this paper, we formally analyze alignments under ITG constraints and the different causes of spurious ambiguity for those alignments. [sent-36, score-0.645]

21 We do an empirical study of the influence of spurious ambiguity on parsing and discriminative learning by comparing different grammars in both synthetic and real-data experiments. [sent-37, score-0.771]

22 A new variant of the grammar is proposed, which efficiently removes all spurious ambiguities. [sent-39, score-0.481]

23 Our grammar shows advantages over previous ones in both experiments. [sent-40, score-0.095]

24 2 ITG Alignment Family By lexical rules like A → e/f, each ITG derivation actually represents a unique alignment between the two sequences. [sent-44, score-0.377]

25 Thus the family of ITG derivations represents a family of word alignments. [sent-45, score-0.342]

26 The ITG alignment family is the set of word alignments that each have at least one BTG derivation. [sent-47, score-0.36]

27 The ITG alignment family is only a subset of all word alignments because there are cases, known as inside-outside alignments (Wu, 1997), that cannot be represented by any ITG derivation. [sent-48, score-0.458]

28 On the other hand, an ITG alignment may have multiple derivations. [sent-49, score-0.184]

29 For a given grammar G, spurious ambiguity in word alignment is the case where two or more derivations d1, d2, … [sent-51, score-0.336]

30 dk of G have the same underlying word alignment A. [sent-54, score-0.184]

31 A grammar G is non-spurious if for any given word alignment there exists at most one derivation under G. [sent-55, score-0.197]
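
The definition suggests a direct, if brute-force, diagnostic: enumerate all derivations, group them by underlying alignment, and inspect the group sizes. In the sketch below, `parse_all_derivations` and `derivation_to_alignment` are hypothetical stand-ins for a synchronous parser and its derivation-to-link readout:

```python
from collections import defaultdict

def spurious_derivation_counts(sentence_pair, parse_all_derivations,
                               derivation_to_alignment):
    """Group all derivations by their underlying alignment.
    The grammar is non-spurious on this input iff every group has
    size 1. Both callables are hypothetical stand-ins."""
    groups = defaultdict(list)
    for d in parse_all_derivations(sentence_pair):
        # A frozenset of (src, tgt) links identifies the alignment.
        groups[frozenset(derivation_to_alignment(d))].append(d)
    return {alignment: len(ds) for alignment, ds in groups.items()}
```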

32 In any given derivation, an ITG rule applies by either generating a bilingual word pair (lexical rules) or splitting the current alignment into two parts, which will recursively generate two sub-derivations (transition rules). [sent-56, score-0.255]

33 Applying a monotone (or inverted) concatenation transition rule forms a monotone t-split (or inverted t-split) of the original alignment (Figure 2). [sent-58, score-0.661]

34 2.1 Branching Ambiguity As shown in Figure 2, left-branching and right-branching will produce different derivations under BTG. [Figure 3: the LG grammar, with rules A → [AB] | [BB] | [CB] | [AC] | [BC] | [CC]; B → ⟨AA⟩ | ⟨BA⟩ | ⟨CA⟩ | ⟨AC⟩ | ⟨BC⟩ | ⟨CC⟩; C → e/f | …] [sent-60, score-0.209]

35 Branching ambiguity was identified and solved in Wu (1997), using the grammar in Figure 3, denoted as LG. [sent-64, score-0.273]

36 LG uses two separate non-terminals for monotone and inverted concatenation, respectively. [sent-65, score-0.219]

37 It only allows left branching of such non-terminals, by excluding rules like A → [BA]. [sent-66, score-0.125]

38 For each ITG alignment A, in which all the words are aligned, LG will produce a unique derivation. [sent-68, score-0.244]

39 Induction hypothesis: the theorem holds for any A with length less than n. [sent-71, score-0.078]

40 For A of length n, let s be the rightmost t-split, which splits A into S1 and S2. [sent-72, score-0.024]

41 Assume that there exists another t-split s′, splitting A into S11 and (S12S2). [sent-74, score-0.049]

42 Because A is fixed and fully aligned, it is easy to see that if s is a monotone t-split, s′ could only be monotone, and S12 and S2 in the right sub-derivation of t-split s′ could only be combined by monotone concatenation as well. [sent-75, score-0.378]

43 So s′ will have a right branching of monotone concatenation, which contradicts the definition of LG, because right branching of monotone concatenations is prohibited. [sent-76, score-0.594]

44 A similar contradiction occurs if s is an inverted t-split. [sent-77, score-0.095]

45 By the induction hypothesis, S1 and S2 have unique derivations, because their lengths are less than n. [sent-81, score-0.06]
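
Restated compactly in LaTeX (our wording, reconstructed from the proof sketch above):

```latex
% Requires \usepackage{amsthm}; our restatement of this section's argument.
\newtheorem{theorem}{Theorem}
\begin{theorem}
For every fully aligned ITG alignment $A$, the grammar LG admits exactly
one derivation.
\end{theorem}
\begin{proof}[Sketch]
By induction on the length $n$ of $A$. For $n=1$ a single lexical rule
applies. For $n>1$, let $s$ be the rightmost t-split of $A$, with parts
$S_1$ and $S_2$. Any other t-split $s'$, splitting $A$ into $S_{11}$ and
$(S_{12}S_2)$, must have the same concatenation type as $s$ (since $A$ is
fixed and fully aligned) and therefore forces a right branching of that
type, which LG prohibits. Hence $s$ is the unique top-level split, and by
the induction hypothesis $S_1$ and $S_2$ have unique derivations, so the
derivation of $A$ is unique.
\end{proof}
```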

46 For any given sentence pair (e, f) and its alignment A, let (e′, f′) be the sentence pair with all null-aligned words removed from (e, f). [sent-85, score-0.184]

47 The alignment skeleton AS is the alignment between (e′, f′) that preserves all links in A. [sent-86, score-0.458]

48 From Theorem 1 we know that every ITG alignment has a unique LG derivation for its alignment skeleton (Figure 4(c)). [sent-87, score-0.065]
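
Computing an alignment skeleton is a simple re-indexing of the links after unaligned words are dropped; a self-contained sketch with our own naming:

```python
def alignment_skeleton(links):
    """Re-index an alignment over the sentence pair with all
    null-aligned words removed (our sketch of the definition above).
    `links` is a set of (src, tgt) word-position pairs."""
    src_map = {old: new
               for new, old in enumerate(sorted({s for s, _ in links}))}
    tgt_map = {old: new
               for new, old in enumerate(sorted({t for _, t in links}))}
    return {(src_map[s], tgt_map[t]) for s, t in links}

# e has 4 words, f has 3; e1 and f0 are null-aligned:
print(alignment_skeleton({(0, 1), (2, 2), (3, 2)}))
# -> {(0, 0), (1, 1), (2, 1)}
```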

49 Figure 4: Null-word attachment for the same alignment. [sent-91, score-0.046]

50 ((a) and (b) are spurious derivations under LG caused by null-aligned word attachment. [sent-92, score-0.522]

51 The dotted lines omit some unary rules for simplicity. [sent-94, score-0.031]

52 Figure 5: A Left-heavy Grammar with Fixed Null-word attachment (LGFN). [sent-98, score-0.072]

53 Some words have no explicit correspondence in the other language and tend to stay unaligned. [sent-99, score-0.058]

54 These null-aligned words, also called singletons, should be attached to some other nodes in the derivation. [sent-100, score-0.045]

55 Different derivations will be produced if those null-aligned words are attached by different rules, or to different nodes. [sent-101, score-0.231]

56 3 LGFN Grammar We propose here a new variant of ITG, denoted as LGFN (Figure 5). [sent-108, score-0.027]

57 Our grammar uses transition rules similar to LG's and efficiently constrains the attachment of null-aligned words. [sent-109, score-0.222]

58 We will empirically compare those different grammars in the next section. [sent-110, score-0.09]

59 LGFN has a unique mapping from the derivation of any given ITG alignment A to the derivation of its alignment skeleton AS. [sent-112, score-0.697]

60 …, C_{t_k}, together with the aligned word-pair C′ that directly follows, to the node C exactly in the way of Equation 1. [sent-119, score-0.066]

61 The mapping exists when every null-aligned sequence has an aligned word-pair after it. [sent-133, score-0.09]

62 Note that our grammar attaches null-aligned words in a right-branching manner, which means it builds the span only when there is an aligned word-pair. [sent-135, score-0.161]

63 After initialization, any newly-built span will contain at least one aligned word-pair. [sent-136, score-0.066]
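
To illustrate the fixed attachment scheme on one side only (our simplification, not the paper's exact rules), the sketch below groups each maximal run of null-aligned source words with the aligned word that directly follows it, mirroring the right-branching initialization just described:

```python
def attach_nulls_to_following_word(src_len, aligned_positions):
    """Group each maximal run of null-aligned source words with the
    aligned word that directly follows it (our one-sided sketch of
    the fixed, right-branching attachment). Trailing nulls with no
    following aligned word are returned separately, since the
    mapping requires a following aligned word-pair to exist."""
    aligned = set(aligned_positions)
    spans, run_start = [], 0
    for i in range(src_len):
        if i in aligned:
            spans.append((run_start, i))  # null run plus aligned word
            run_start = i + 1
    trailing = (run_start, src_len - 1) if run_start < src_len else None
    return spans, trailing

# Words 0, 2, 3 are null-aligned; words 1 and 4 are aligned:
print(attach_nulls_to_following_word(5, {1, 4}))
# -> ([(0, 1), (2, 4)], None)
```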

64 LGFN has a unique derivation for each ITG alignment, i.e., it is non-spurious. [sent-141, score-0.162]

65 4.1 Synthetic Experiments We automatically generated 1000 fully aligned ITG alignments of length 20 by generating random permutations first and checking ITG constraints using a linear-time algorithm (Zhang et al. [sent-146, score-0.164]

66 Sparser alignments were generated by random removal of alignment links according to a given null-aligned word ratio. [sent-148, score-0.307]
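
A sketch of this generation procedure: rejection-sample permutations and keep those that pass a stack-based contiguity test, which runs in linear time. The test below is a standard reduction for ITG-reorderable (binarizable) permutations; we do not claim it is the exact algorithm of Zhang et al. cited above.

```python
import random

def is_itg_permutation(perm):
    """Stack-based contiguity test: reading target positions left to
    right, repeatedly merge the top two stack intervals whenever
    their union is contiguous. The permutation is ITG-reorderable
    iff everything reduces to a single interval. Linear time."""
    stack = []  # intervals (lo, hi) of target positions
    for p in perm:
        stack.append((p, p))
        while len(stack) >= 2:
            lo1, hi1 = stack[-2]
            lo2, hi2 = stack[-1]
            if hi1 + 1 == lo2 or hi2 + 1 == lo1:  # contiguous union
                stack.pop(); stack.pop()
                stack.append((min(lo1, lo2), max(hi1, hi2)))
            else:
                break
    return len(stack) == 1

def random_itg_alignment(n, seed=0):
    """Rejection-sample a length-n permutation until it passes."""
    rng = random.Random(seed)
    while True:
        perm = list(range(n))
        rng.shuffle(perm)
        if is_itg_permutation(perm):
            return perm

print(random_itg_alignment(20))  # a fully aligned ITG permutation
```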

67 Four grammars were used to parse these alignments, namely LG (Wu, 1997), HaG (Haghighi et al. [sent-149, score-0.09]

68 Table 1 shows the average number of derivations per alignment generated under LG and HaG. [sent-153, score-0.37]

69 The number of derivations produced by LG increased dramatically because LG has no restrictions on null-aligned word attachment. [sent-154, score-0.259]

70 HaG also produced a large number of spurious derivations as the number of null-aligned words increased. [sent-155, score-0.522]

71 Both LiuG and LGFN produced a unique derivation for each alignment, as expected. [sent-156, score-0.162]

72 Table 1: Average #derivations per alignment for LG and HaG vs. the percentage of null-aligned words.

73 Figure 6: Total parsing time (in seconds) vs. the percentage of null-aligned words, for LG, HaG, LiuG, and LGFN.

74 To generate the 10-best alignments for sentence pairs that have 10% of words unaligned, the top 109 HaG derivations would have to be generated, while the top 10 LiuG or LGFN derivations are already enough. [sent-170, score-0.47]

75 Figure 6 shows the total parsing time using each grammar. [sent-171, score-0.07]

76 LG and HaG showed better performance when most of the words were aligned, because their grammars are simpler and less constrained. [sent-172, score-0.156]

77 However, when the number of null-aligned words increased, the parsing times for LG and HaG became much longer because of the computation of the large number of spurious derivations. [sent-173, score-0.406]

78 The parsing times of LGFN and LiuG also increased slowly, but parsing with LGFN consistently took less time than with LiuG. [sent-175, score-0.14]

79 It should be noted that the above results came from parsing according to a given alignment. [sent-176, score-0.07]

80 When searching without knowing the correct alignment, it is possible for every word to stay unaligned, which makes spurious ambiguity a much more serious issue. [sent-177, score-0.572]

81 4.2 Discriminative Learning Experiments To further study how spurious ambiguity affects discriminative learning, we implemented a framework following Haghighi et al. [sent-179, score-0.565]

82 probabilities (collected from FBIS data), relative distances, matchings of high-frequency words, matchings of POS tags, etc. [sent-186, score-0.082]

83 For each sentence pair (e, f), we optimized with alignment results generated from the n-best parsing results. [sent-189, score-0.287]

84 We ran MIRA training for 20 iterations and evaluated the alignments of the best-scored derivations on the test set using the averaged weights. [sent-191, score-0.306]
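
For reference, a minimal single-example MIRA step in its 1-best form (our simplification of the n-best training described above; `feats_gold` and `feats_pred` are assumed NumPy feature vectors for the gold and best predicted alignments, and `loss` is the prediction's alignment error):

```python
import numpy as np

def mira_update(w, feats_gold, feats_pred, loss, C=0.01):
    """One clipped MIRA step: move w just enough that the gold
    alignment outscores the prediction by a margin equal to the
    prediction's alignment loss, with the step size capped at C."""
    delta = feats_gold - feats_pred   # feature difference
    violation = loss - w.dot(delta)   # margin shortfall
    if violation > 0:
        norm_sq = float(delta.dot(delta))
        if norm_sq > 0:
            w = w + min(C, violation / norm_sq) * delta
    return w

w = np.zeros(3)
w = mira_update(w, np.array([1.0, 0.0, 1.0]),
                np.array([0.0, 1.0, 0.0]), loss=0.5)
```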

85 We used the manually aligned Chinese-English corpus from the NIST MT02 evaluation. [sent-192, score-0.066]

86 3% of words stay null-aligned in each sentence, but if restricted to sure links, the average ratio increases to 22.

87 Training with HaG obtained results similar to LGFN trained with only a 1-best list, which demonstrates that spurious ambiguity strongly affected the n-best list here, resulting in less accurate training. [sent-198, score-0.547]

88 Actually, the 20-best parsing using HaG only generated 4. [sent-199, score-0.07]

89 We also trained a similar discriminative model but extended the lexical rule of LGFN to accept at most 3 consecutive words. [sent-204, score-0.074]

90 The model was used to align FBIS data for machine translation experiments. [sent-205, score-0.024]

91 Without initializing by phrases extracted from existing alignments (Cherry and Lin, 2007) or using complicated block features (Haghighi et al. [sent-206, score-0.098]

92 , 2006) score over 5 test sets for a typical phrase-based translation system, Moses (Koehn et al. [sent-213, score-0.024]

93 5 Conclusion Great efforts have been made in reducing spurious ambiguities in parsing combinatory categorial grammar (Karttunen, 1986; Eisner, 1996). [sent-215, score-0.57]

94 However, to our knowledge, we give the first detailed analysis of spurious ambiguity in word alignment. [sent-216, score-0.514]

95 Empirical comparisons between different grammars also validate our analysis. [sent-217, score-0.09]

96 This paper also contributes by demonstrating that spurious ambiguity has a negative impact on discriminative learning. [sent-218, score-0.051]

97 We will continue working on this line of research and improve our discriminative learning model in the future, for example, by adding more phrase-level features. [sent-219, score-0.051]

98 It is worth noting that the definition of spurious ambiguity actually varies for different tasks. [sent-220, score-0.558]

99 It will also be interesting to explore spurious ambiguity and its effects in those different tasks. [sent-224, score-0.514]

100 Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. [sent-288, score-0.335]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lgfn', 0.426), ('itg', 0.371), ('spurious', 0.336), ('lg', 0.324), ('hag', 0.24), ('derivations', 0.186), ('alignment', 0.184), ('ambiguity', 0.178), ('monotone', 0.146), ('btg', 0.108), ('haai', 0.107), ('liug', 0.107), ('derivation', 0.102), ('alignments', 0.098), ('grammar', 0.095), ('branching', 0.094), ('grammars', 0.09), ('transduction', 0.086), ('haghighi', 0.08), ('theorem', 0.078), ('family', 0.078), ('aa', 0.078), ('inverted', 0.073), ('parsing', 0.07), ('aer', 0.066), ('aligned', 0.066), ('inversion', 0.066), ('skeleton', 0.065), ('concatenation', 0.062), ('unique', 0.06), ('stay', 0.058), ('haci', 0.053), ('hbai', 0.053), ('hbci', 0.053), ('hcai', 0.053), ('hcci', 0.053), ('discriminative', 0.051), ('stroudsburg', 0.051), ('nanjing', 0.047), ('synthetic', 0.046), ('attachment', 0.046), ('attached', 0.045), ('definition', 0.044), ('csk', 0.043), ('ba', 0.043), ('proof', 0.043), ('unaligned', 0.042), ('cherry', 0.042), ('wu', 0.041), ('synchronous', 0.041), ('matchings', 0.041), ('combinatory', 0.039), ('fbis', 0.039), ('bb', 0.036), ('nbest', 0.033), ('causes', 0.033), ('cb', 0.032), ('rules', 0.031), ('liu', 0.03), ('categorial', 0.03), ('hao', 0.03), ('bc', 0.029), ('pa', 0.028), ('transition', 0.027), ('variant', 0.027), ('exponentially', 0.026), ('dashed', 0.026), ('heavy', 0.026), ('increased', 0.026), ('snover', 0.025), ('lemma', 0.025), ('splitting', 0.025), ('cn', 0.025), ('links', 0.025), ('exists', 0.024), ('crammer', 0.024), ('right', 0.024), ('zhang', 0.024), ('translation', 0.024), ('restrictions', 0.024), ('radical', 0.023), ('voge', 0.023), ('karttunen', 0.023), ('nullaligned', 0.023), ('rightbranching', 0.023), ('shujie', 0.023), ('association', 0.023), ('efficiently', 0.023), ('ct', 0.023), ('ab', 0.023), ('rule', 0.023), ('bilingual', 0.023), ('och', 0.022), ('iterations', 0.022), ('contradiction', 0.022), ('contradicts', 0.022), ('jiajun', 0.022), ('saving', 0.022), ('gildea', 0.021), ('cc', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

2 0.17771283 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

3 0.15604578 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

Author: Graham Neubig ; Taro Watanabe ; Eiichiro Sumita ; Shinsuke Mori ; Tatsuya Kawahara

Abstract: We present an unsupervised model for joint phrase alignment and extraction using nonparametric Bayesian methods and inversion transduction grammars (ITGs). The key contribution is that phrases of many granularities are included directly in the model through the use of a novel formulation that memorizes phrases generated not only by terminal, but also non-terminal symbols. This allows for a completely probabilistic model that is able to create a phrase table that achieves competitive accuracy on phrase-based machine translation tasks directly from unaligned sentence pairs. Experiments on several language pairs demonstrate that the proposed model matches the accuracy of traditional two-step word alignment/phrase extraction approach while reducing the phrase table to a fraction of the original size.

4 0.12866496 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

5 0.12106571 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

Author: Ning Xi ; Guangchao Tang ; Boyuan Li ; Yinggong Zhao

Abstract: In this paper, we present a new word alignment combination approach for language pairs where one language has no explicit word boundaries. Instead of combining word alignments of different models (Xiang et al., 2010), we try to combine word alignments over multiple monolingually motivated word segmentations. Our approach is based on a link confidence score defined over multiple segmentations, so the combined alignment is more robust to inappropriate word segmentation. Our combination algorithm is simple, efficient, and easy to implement. In the Chinese-English experiment, our approach effectively improved word alignment quality as well as translation performance on all segmentations simultaneously, which showed that word alignment can benefit from complementary knowledge due to the diversity of multiple and monolingually motivated segmentations.

6 0.11600646 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

7 0.11467126 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

8 0.10774905 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

9 0.097998105 141 acl-2011-Gappy Phrasal Alignment By Agreement

10 0.090822123 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

11 0.090384349 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

12 0.089887157 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar

13 0.089098848 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

14 0.084427431 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

15 0.083199956 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

16 0.083067738 30 acl-2011-Adjoining Tree-to-String Translation

17 0.08042907 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

18 0.080325738 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations

19 0.079970099 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

20 0.077322759 61 acl-2011-Binarized Forest to String Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.167), (1, -0.156), (2, 0.071), (3, -0.009), (4, 0.029), (5, 0.001), (6, -0.07), (7, 0.016), (8, -0.049), (9, 0.011), (10, 0.08), (11, 0.114), (12, 0.017), (13, 0.077), (14, -0.084), (15, 0.034), (16, 0.12), (17, -0.013), (18, -0.092), (19, 0.02), (20, -0.032), (21, -0.009), (22, -0.096), (23, 0.003), (24, -0.006), (25, 0.053), (26, 0.027), (27, 0.037), (28, -0.017), (29, -0.07), (30, 0.024), (31, 0.045), (32, -0.023), (33, 0.019), (34, 0.014), (35, -0.016), (36, 0.002), (37, -0.008), (38, 0.005), (39, 0.059), (40, -0.049), (41, 0.05), (42, -0.051), (43, 0.031), (44, -0.053), (45, 0.015), (46, 0.102), (47, 0.002), (48, 0.084), (49, 0.081)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95621598 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

2 0.76488316 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition

Author: John DeNero ; Klaus Macherey

Abstract: Unsupervised word alignment is most often modeled as a Markov process that generates a sentence f conditioned on its translation e. A similar model generating e from f will make different alignment predictions. Statistical machine translation systems combine the predictions of two directional models, typically using heuristic combination procedures like grow-diag-final. This paper presents a graphical model that embeds two directional aligners into a single model. Inference can be performed via dual decomposition, which reuses the efficient inference algorithms of the directional models. Our bidirectional model enforces a one-to-one phrase constraint while accounting for the uncertainty in the underlying directional models. The resulting alignments improve upon baseline combination heuristics in word-level and phrase-level evaluations.

3 0.76339746 141 acl-2011-Gappy Phrasal Alignment By Agreement

Author: Mohit Bansal ; Chris Quirk ; Robert Moore

Abstract: We propose a principled and efficient phrase-to-phrase alignment model, useful in machine translation as well as other related natural language processing problems. In a hidden semi-Markov model, word-to-phrase and phrase-to-word translations are modeled directly by the system. Agreement between two directional models encourages the selection of parsimonious phrasal alignments, avoiding the overfitting commonly encountered in unsupervised training with multi-word units. Expanding the state space to include “gappy phrases” (such as French ne … pas) makes the alignment space more symmetric; thus, it allows agreement between discontinuous alignments. The resulting system shows substantial improvements in both alignment quality and translation quality over word-based Hidden Markov Models, while maintaining asymptotically equivalent runtime.

4 0.71296167 265 acl-2011-Reordering Modeling using Weighted Alignment Matrices

Author: Wang Ling ; Tiago Luis ; Joao Graca ; Isabel Trancoso ; Luisa Coheur

Abstract: In most statistical machine translation systems, the phrase/rule extraction algorithm uses alignments in the 1-best form, which might contain spurious alignment points. The usage of weighted alignment matrices that encode all possible alignments has been shown to generate better phrase tables for phrase-based systems. We propose two algorithms to generate the well-known MSD reordering model using weighted alignment matrices. Experiments on the IWSLT 2010 evaluation datasets for two language pairs with different alignment algorithms show that our methods produce more accurate reordering models, as can be shown by an increase over the regular MSD models of 0.4 BLEU points in the BTEC French to English test set, and of 1.5 BLEU points in the DIALOG Chinese to English test set.

5 0.71017128 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

Author: Jinxi Xu ; Jinying Chen

Abstract: Word alignment is a central problem in statistical machine translation (SMT). In recent years, supervised alignment algorithms, which improve alignment accuracy by mimicking human alignment, have attracted a great deal of attention. The objective of this work is to explore the performance limit of supervised alignment under the current SMT paradigm. Our experiments used a manually aligned Chinese-English corpus with 280K words recently released by the Linguistic Data Consortium (LDC). We treated the human alignment as the oracle of supervised alignment. The result is surprising: the gain of human alignment over a state-of-the-art unsupervised method (GIZA++) is less than 1 point in BLEU. Furthermore, we showed the benefit of improved alignment becomes smaller with more training data, implying the above limit also holds for large training conditions.

6 0.70624655 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

7 0.69234729 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction

8 0.67749172 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features

9 0.6412462 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation

10 0.61215633 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

11 0.54253358 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

12 0.52351516 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

13 0.52097017 154 acl-2011-How to train your multi bottom-up tree transducer

14 0.51642287 340 acl-2011-Word Alignment via Submodular Maximization over Matroids

15 0.50801736 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

16 0.50059795 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

17 0.49839076 180 acl-2011-Issues Concerning Decoding with Synchronous Context-free Grammar

18 0.48525023 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

19 0.48506162 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars

20 0.48294732 219 acl-2011-Metagrammar engineering: Towards systematic exploration of implemented grammars


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.028), (6, 0.274), (17, 0.064), (26, 0.022), (37, 0.081), (39, 0.04), (41, 0.08), (53, 0.016), (55, 0.034), (59, 0.039), (72, 0.053), (91, 0.037), (96, 0.136), (98, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.81601191 229 acl-2011-NULEX: An Open-License Broad Coverage Lexicon

Author: Clifton McFate ; Kenneth Forbus

Abstract: Broad coverage lexicons for the English language have traditionally been handmade. This approach, while accurate, requires too much human labor. Furthermore, resources contain gaps in coverage, contain specific types of information, or are incompatible with other resources. We believe that the state of open-license technology is such that a comprehensive syntactic lexicon can be automatically compiled. This paper describes the creation of such a lexicon, NU-LEX, an open-license feature-based lexicon for general purpose parsing that combines WordNet, VerbNet, and Wiktionary and contains over 100,000 words. NU-LEX was integrated into a bottom-up chart parser. We ran the parser through three sets of sentences, 50 sentences total, from the Simple English Wikipedia and compared its performance to the same parser using Comlex. Both parsers performed almost equally, with NU-LEX finding all lex-items for 50% of the sentences and Comlex succeeding for 52%. Furthermore, NU-LEX’s shortcomings primarily fell into two categories, suggesting future research directions.

same-paper 2 0.74057984 93 acl-2011-Dealing with Spurious Ambiguity in Learning ITG-based Word Alignment

Author: Shujian Huang ; Stephan Vogel ; Jiajun Chen

Abstract: Word alignment has an exponentially large search space, which often makes exact inference infeasible. Recent studies have shown that inversion transduction grammars are reasonable constraints for word alignment, and that the constrained space could be efficiently searched using synchronous parsing algorithms. However, spurious ambiguity may occur in synchronous parsing and cause problems in both search efficiency and accuracy. In this paper, we conduct a detailed study of the causes of spurious ambiguity and how it affects parsing and discriminative learning. We also propose a variant of the grammar which eliminates those ambiguities. Our grammar shows advantages over previous grammars in both synthetic and real-world experiments.

3 0.70913696 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

Author: Amjad Abu-Jbara ; Dragomir Radev

Abstract: In this paper we present Clairlib, an opensource toolkit for Natural Language Processing, Information Retrieval, and Network Analysis. Clairlib provides an integrated framework intended to simplify a number of generic tasks within and across those three areas. It has a command-line interface, a graphical interface, and a documented API. Clairlib is compatible with all the common platforms and operating systems. In addition to its own functionality, it provides interfaces to external software and corpora. Clairlib comes with a comprehensive documentation and a rich set of tutorials and visual demos.

4 0.59323472 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Author: Shasha Liao ; Ralph Grishman

Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference.

5 0.59210062 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

Author: Yee Seng Chan ; Dan Roth

Abstract: In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation. In the real RE problem where relation arguments need to be identified, exploiting these structures also allows reducing pipelined propagated errors. We show that this RE framework provides significant improvement in RE performance.

6 0.59034479 196 acl-2011-Large-Scale Cross-Document Coreference Using Distributed Inference and Hierarchical Models

7 0.5893814 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

8 0.58933049 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

9 0.58906633 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

10 0.58755147 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

11 0.58607626 40 acl-2011-An Error Analysis of Relation Extraction in Social Media Documents

12 0.58416355 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

13 0.58413994 119 acl-2011-Evaluating the Impact of Coder Errors on Active Learning

14 0.5841378 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

15 0.58374619 311 acl-2011-Translationese and Its Dialects

16 0.58354998 58 acl-2011-Beam-Width Prediction for Efficient Context-Free Parsing

17 0.5822168 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment

18 0.58134252 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding

19 0.581285 244 acl-2011-Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts

20 0.58121383 37 acl-2011-An Empirical Evaluation of Data-Driven Paraphrase Generation Techniques