emnlp emnlp2013 emnlp2013-201 knowledge-graph by maker-knowledge-mining

201 emnlp-2013-What is Hidden among Translation Rules


Source: pdf

Author: Libin Shen ; Bowen Zhou

Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 com Abstract Most of the machine translation systems rely on a large set of translation rules. [sent-3, score-0.406]

2 These rules are treated as discrete and independent events. [sent-4, score-0.098]

3 In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. [sent-5, score-0.296]

4 We present a preliminary generative model to test this idea. [sent-6, score-0.135]

5 , 2008), employ a large rule set that may contain tens of millions of translation rules or even more. [sent-11, score-0.434]

6 In these systems, each translation rule has about 20 dense features, which represent key statistics collected from the training data, such as word translation probability, phrase translation probability etc. [sent-12, score-0.82]

7 Except for these common features, there is no connection among the translation rules. [sent-13, score-0.232]

8 The use of sparse features as in (Arun and Koehn, 2007; Watanabe et al. [sent-15, score-0.08]

9 In their work, there are as many as 10,000 features defined on the appearance of certain frequent words and Part of Speech (POS) tags in rules. [sent-18, score-0.032]

10 However, these sparse features fire quite randomly. [sent-20, score-0.08]

11 Thus, there is still plenty of space to better model translation rules. [sent-25, score-0.203]

12 In this paper, we will explore the relationship among translation rules. [sent-26, score-0.203]

13 We no longer view rules as discrete or unrelated events. [sent-27, score-0.098]

14 Instead, we view rules, which are observed from training data, as random variables generated by a hidden model. [sent-28, score-0.116]

15 All possible generative processes can be represented with factorized structures such as weighted hypergraphs and finite state machines. [sent-30, score-0.157]

16 This approach leads to a compact model that has better generalization capability and allows for translation rules not explicitly observed in the training data. [sent-31, score-0.417]

17 This paper reports work-in-progress to exploit hidden relations among rules. [sent-32, score-0.082]

18 2 Hidden Models. Let G = {(r, f)} be a grammar observed from parallel training data, where f is the frequency of a bilingual translation rule r. [sent-34, score-0.444]

19 Let M be a hidden model that generates every translation rule r. [sent-35, score-0.082]

20 For example, M could be modeled with a weighted hypergraph or finite state machine. [sent-36, score-0.113]

21 For the sake of convenience, in this section we assume M is a meta-grammar M = {m}, where each m represents a meta-rule. [sent-37, score-0.062]

22 For each translation rule r, there exists a hypergraph Hr that represents all possible derivations Dr = {d} that can generate rule r. [sent-38, score-0.315]

23 Here, each derivation d is a hyperpath using meta-rules Md, where Md ⊆ M. [sent-39, score-0.063]

24 Thus, we can use hypergraph Hr to characterize r. [sent-40, score-0.085]

25–26 Translation rules in G can share nodes and meta-rules in their hypergraphs, so that M is a more compact model than G. [sent-41, score-0.334] [sent-43, score-0.082]

27 We introduce three methods to quantify Hr as features of rule r. [sent-45, score-0.133]

28 It should be noted that there are more ways to exploit the compact model of M than these three. [sent-46, score-0.125]

29 2.1 Type 1: A Generative Model. Let θ be the parameters of a statistical model Pr(m; θ) for meta-rules m in meta-grammar M, estimated from the observed translation grammar G. [sent-48, score-0.159]

30 The probability of a translation rule r can be calculated as follows. [sent-49, score-0.032]

31 Pr(r; θ) ∝ Pr(Hr; θ) = Σ_{d ∈ Dr} Pr(d; θ) (1). By assuming separability, Pr(d; θ) = Π_{m ∈ Md} Pr(m; θ) (2), we can further decompose rule probability Pr(r; θ) as below. [sent-51, score-0.2]

32 Pr(r; θ) = Σ_{d ∈ Dr} Π_{m ∈ Md} Pr(m; θ) (3). In practice, Pr(r; θ) in (3) can be calculated through bottom-up dynamic programming on hypergraph Hr. [sent-52, score-0.085]
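
As a minimal illustration of this dynamic program, here is a sketch in Python. The Hyperedge record, the incoming adjacency map, and the pr_m table are hypothetical representations assumed for this sketch (the paper does not specify data structures); the memoized recursion computes each node's inside score, and the root's score equals Pr(r; θ) in equation (3).

```python
from collections import namedtuple

# A hyperedge derives its head node from a sequence of tail nodes by
# applying one meta-rule m; the edge weight is Pr(m; theta).
Hyperedge = namedtuple("Hyperedge", ["head", "tails", "meta_rule"])

def inside_probability(root, incoming, pr_m):
    """Pr(r; theta) = sum over derivations d in D_r of the product over
    m in M_d of Pr(m; theta), computed bottom-up on the hypergraph H_r.

    root     -- goal node representing the whole rule r
    incoming -- dict: node -> list of Hyperedges whose head is that node
    pr_m     -- dict: meta-rule -> Pr(m; theta)
    """
    memo = {}

    def inside(node):
        if node in memo:
            return memo[node]
        edges = incoming.get(node, [])
        if not edges:                 # leaf: empty product of meta-rules
            memo[node] = 1.0
            return 1.0
        total = 0.0
        for e in edges:               # sum over alternative derivation steps
            score = pr_m[e.meta_rule]
            for t in e.tails:         # product over sub-derivations
                score *= inside(t)
            total += score
        memo[node] = total
        return total

    return inside(root)
```

The log of the returned value is then the Type 1 dense feature, log Pr(r; θ).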

33 Hypergraphs of different rules can share nodes and meta-rules. [sent-53, score-0.098]

34 This reveals the underlying relationship among translation rules. [sent-54, score-0.263]

35 As a by-product of this generative model, we use the log-likelihood of a translation rule, log Pr(r; θ), as a new dense feature. [sent-55, score-0.351]

36 2.2 Type 2: Meta-Rules as Sparse Features. As given in (3), the likelihood of a translation rule is a function of Pr(m; θ), in which θ is estimated from the training data with a generative model. [sent-58, score-0.438]

37 Following this practice, we treat each meta-rule m as a sparse feature. [sent-61, score-0.08]

38 Feature value f(m) = 1 if and only if m is used in hypergraph Hr. [sent-62, score-0.085]
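
A tiny sketch of the Type 2 features under the same hypothetical representation: every meta-rule appearing anywhere in Hr becomes a binary indicator.

```python
def type2_features(meta_rules_in_Hr):
    """Type 2 binary sparse features: f(m) = 1 if and only if
    meta-rule m is used somewhere in the hypergraph H_r of rule r."""
    return {m: 1.0 for m in set(meta_rules_in_Hr)}

# Hypothetical example with a few meta-rules of the rule in (5):
print(type2_features([("NR VV", "NNP VBZ"), ("NR", "IN NNP"),
                      ("NR", "NNP"), ("VV", "VBZ IN")]))
```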

39 2.3 Type 3: Posterior as Feature Values. A natural question about the binary sparse features defined above is why all the active features have the same value of 1. [sent-67, score-0.08]

40 We use these meta-rules to represent a translation rule in feature space. [sent-68, score-0.336]

41 Intuitively, for meta-rules with closer connection to the translation rules, we hope to use relatively larger feature values to increase their effect. [sent-69, score-0.232]

42 We formalize this intuition with the posterior probability that a meta-rule m is used to generate r, as below. [sent-70, score-0.102]

43 f(m) ≡ Pr(m|r; θ) = Pr(m, r; θ) / Pr(r; θ) = ( Σ_{d ∈ Dr, m ∈ Md} Pr(d; θ) ) / Pr(r; θ) (4). The posterior in (4) could be too sharp. [sent-71, score-0.07]

44 Following the common practice, we smooth the posterior features with a scaling factor α. [sent-72, score-0.07]

45 f(m) ≡ Pr(m|r)^α. We use Type 3(α) to represent the posterior model with a scaling factor of α in experiments. [sent-73, score-0.07]
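
A sketch of the Type 3 computation, assuming the derivations in Dr have been enumerated explicitly; in practice one would compute the same posteriors by inside-outside on Hr rather than by enumeration. All names are hypothetical, and the default α below is arbitrary.

```python
def type3_features(derivations, pr_m, alpha=0.5):
    """Posterior feature values of eq. (4), smoothed with exponent alpha.

    derivations -- list of derivations d, each a list of meta-rules M_d
    pr_m        -- dict: meta-rule -> Pr(m; theta)
    """
    def pr_d(d):                       # eq. (2): Pr(d) = prod_m Pr(m)
        p = 1.0
        for m in d:
            p *= pr_m[m]
        return p

    pr_r = sum(pr_d(d) for d in derivations)   # eq. (3): Pr(r; theta)
    numer = {}
    for d in derivations:              # numerator of eq. (4): mass of the
        p = pr_d(d)                    # derivations that use meta-rule m
        for m in set(d):
            numer[m] = numer.get(m, 0.0) + p
    return {m: (p / pr_r) ** alpha for m, p in numer.items()}
```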

46 With a proper definition of the underlying model M, we can estimate θ with the traditional EM algorithm or Bayesian methods. [sent-77, score-0.107]

47 In the next section, we will present an example of the hidden model. [sent-78, score-0.082]

48 Here, translation rules and their frequencies in G are observed data, and the derivations for each rule r are hidden. [sent-80, score-0.335]

49 At the Expectation step, we search all derivations d in Dr of each rule r and calculate their probabilities according to equation (2). [sent-81, score-0.199]

50 At the Maximization step, we re-estimate θ on all derivations in proportion to their posterior probability. [sent-82, score-0.136]
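
One EM iteration can be sketched as below, again with hypothetical data structures. For clarity it renormalizes all expected meta-rule counts into a single distribution, which is a simplification of whatever conditional structure the actual model factorizes over.

```python
from collections import defaultdict

def em_step(grammar, derivations_of, pr_m):
    """One EM iteration over the observed grammar G = {(r, f)}.

    grammar        -- iterable of (r, f): rule and its observed frequency
    derivations_of -- function r -> list of derivations (lists of meta-rules)
    pr_m           -- current dict: meta-rule -> Pr(m; theta)
    """
    counts = defaultdict(float)
    for r, f in grammar:
        ds = derivations_of(r)
        # E-step: score each derivation by eq. (2), normalize over D_r.
        scores = [1.0] * len(ds)
        for i, d in enumerate(ds):
            for m in d:
                scores[i] *= pr_m[m]
        z = sum(scores)
        if z == 0.0:
            continue
        # Accumulate expected meta-rule counts, weighted by rule frequency f.
        for d, s in zip(ds, scores):
            for m in d:
                counts[m] += f * s / z
    # M-step: renormalize expected counts into a new distribution.
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}
```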

51 3 Case Study. In Section 2, we explored the use of meta-grammars as the underlying model M and developed three methods to define features. [sent-83, score-0.06]

52 Similar techniques can be applied to finite state machines and other underlying models. [sent-84, score-0.06]

53 Now, we introduce a POS-based underlying model to illustrate the generic model proposed in Section 2. [sent-85, score-0.06]

54 3.1 Meta-rules on POS tags. Let r ∈ G be a translation rule composed of a pair of source and target word strings (Fw, Ew). [sent-88, score-0.463]

55 Let Fp and Ep be the POS tags for the source and target sides, respectively. [sent-89, score-0.159]

56 For the sake of simplicity, as a first attempt, we treat non-terminals as a special word X with POS tag X. [sent-90, score-0.075]

57 Suppose we have a Chinese-to-English translation rule as below. [sent-91, score-0.336]

58 yuehan qu zhijiage ⇒ john leaves for chicago. We call NR VV NR ⇒ NNP VBZ IN NNP (5) a translation rule in POS tags. [sent-92, score-0.336]

59 We will propose an underlying model M to generate translation rules in POS tags instead of translation rules themselves. [sent-93, score-0.293]

60 For the rest of this section, we take translation rules in POS tags as the target of our generative model. [sent-94, score-0.476]

61 We define meta-rules on pairs of POS tag strings. [sent-95, score-0.044]

62 We can decompose the translation rule in (5) into a product of meta-rule probabilities via various derivations, such as Pr(NR VV, NNP VBZ) · Pr(NR, IN NNP), and Pr(NR, NNP) · Pr(VV, VBZ IN) · Pr(NR, NNP). [sent-100, score-0.133]
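
These derivations can be enumerated with a small recursive sketch. It assumes monotone co-segmentation into aligned chunk pairs; the paper additionally requires the target split to respect word alignments, which this sketch ignores.

```python
def co_segmentations(src, tgt):
    """Enumerate monotone co-segmentations of a POS rule (src, tgt) into
    aligned meta-rule pairs; e.g. for NR VV NR => NNP VBZ IN NNP one
    derivation is [(('NR','VV'), ('NNP','VBZ')), (('NR',), ('IN','NNP'))]."""
    if not src and not tgt:
        yield []
        return
    if not src or not tgt:
        return                            # one side exhausted early
    for i in range(1, len(src) + 1):      # first source chunk
        for j in range(1, len(tgt) + 1):  # first target chunk
            meta_rule = (tuple(src[:i]), tuple(tgt[:j]))
            for rest in co_segmentations(src[i:], tgt[j:]):
                yield [meta_rule] + rest

# All derivations of the POS rule in (5):
for d in co_segmentations(["NR", "VV", "NR"], ["NNP", "VBZ", "IN", "NNP"]):
    print(d)
```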

63 3.2 The Underlying Model and Features. Now, we introduce a generative model M for translation rules in POS tags. [sent-102, score-0.403]

64 We still use the example in (5) as shown in Figure 1, where the top box represents the source side and the bottom box represents the target side. [sent-103, score-0.257]

65 Figure 1: An example. We first generate the number of source tokens of a translation rule with a uniform distribution over up to, for example, 7 tokens. [sent-105, score-0.39]

66 Then we split the source side into chunks with a binomial distribution: a Bernoulli variable at the gap between each two consecutive words splits them into two chunks with probability p. [sent-106, score-0.289]

67 For example, the probability of obtaining two chunks NR VV and NR is (1 − p)p, as shown in Figure 1. [sent-107, score-0.091]

68 Suppose we split the target side into two parts, NNP VBZ and IN NNP, which respects the word alignments. [sent-108, score-0.079]

69 The probability for the first meta-rule is Pr(|E| = 2 | |F| = 2) · Pr(NR VV, NNP VBZ | |F| = 2, |E| = 2), where |F| represents the number of source tokens and |E| the number of target tokens. [sent-111, score-0.158]

70 Similarly, the probability of the second one is computed in the same way. [sent-112, score-0.06]
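
Putting the pieces together, the probability of one derivation under this model can be sketched as follows. The callables pr_len and pr_pair stand in for the estimated distributions Pr(|E| | |F|) and Pr(F chunk, E chunk | |F|, |E|); all names are hypothetical, and the cap of 7 source tokens follows the running example.

```python
def derivation_probability(src_chunks, tgt_chunks, p, pr_len, pr_pair,
                           max_src_len=7):
    """Probability of one derivation under the POS-based generative model.

    src_chunks -- source-side POS chunks, e.g. [("NR", "VV"), ("NR",)]
    tgt_chunks -- aligned target chunks, e.g. [("NNP", "VBZ"), ("IN", "NNP")]
    p          -- Bernoulli split probability at each source-side gap
    pr_len     -- callable (|E|, |F|) -> Pr(|E| given |F|) per chunk pair
    pr_pair    -- callable (src, tgt) -> Pr(src, tgt given |F|, |E|)
    """
    n_tokens = sum(len(c) for c in src_chunks)
    prob = 1.0 / max_src_len            # uniform prior on source length
    # Binomial chunking: split (prob p) at the gaps between chunks, no
    # split (prob 1 - p) at gaps inside chunks; NR VV | NR gives (1 - p)p.
    n_splits = len(src_chunks) - 1
    n_gaps = n_tokens - 1
    prob *= (p ** n_splits) * ((1 - p) ** (n_gaps - n_splits))
    # Per-chunk meta-rule probabilities.
    for s, t in zip(src_chunks, tgt_chunks):
        prob *= pr_len(len(t), len(s)) * pr_pair(s, t)
    return prob
```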

71 As for sparse features, we will obtain 7 meta-rule features as below. [sent-117, score-0.08]

72 3.3 Implementation Details. Even though the set of all possible meta-rules is much smaller than the space of translation rules, it is still too large for existing optimization methods for sparse features in MT. [sent-120, score-0.283]

73 Specifically, we first divide all the meta-rules into 100 bins (|F|, |E|), where |F| is the number of words on the source side and |E| on the target side, with 0 < |F|, |E| ≤ 10. [sent-127, score-0.123]
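
A sketch of this binning, with an assumed per-bin frequency cutoff; the extract names the binning but not the exact filtering threshold, so top_k here is hypothetical.

```python
from collections import Counter, defaultdict

def filter_meta_rules(meta_rule_counts, top_k=1000, max_len=10):
    """Bin meta-rules by (|F|, |E|), 0 < |F|, |E| <= 10 (100 bins), and
    keep only the most frequent meta-rules in each bin as sparse features.

    meta_rule_counts -- dict: (src_tuple, tgt_tuple) -> training count
    """
    bins = defaultdict(Counter)
    for (src, tgt), count in meta_rule_counts.items():
        if 0 < len(src) <= max_len and 0 < len(tgt) <= max_len:
            bins[(len(src), len(tgt))][(src, tgt)] = count
    kept = set()
    for counter in bins.values():       # independent cutoff per bin
        kept.update(m for m, _ in counter.most_common(top_k))
    return kept
```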

74 A shortcoming of this filtering method is that all these features are positive indicators, while low-frequency negative indicators are discarded. [sent-129, score-0.06]

75 Test-1 is from a similar source as the tune set, and it contains 1239 sentences. [sent-140, score-0.054]

76 The baseline rule set contains about 17 million rules. [sent-143, score-0.133]

77 It contains about 40 dense features, including a 6-gram LM. [sent-144, score-0.046]

78 The sparse feature optimization algorithm is similar to the MIRA recipe described in (Chiang et al. [sent-145, score-0.08]

79 It should be noted that, even though our tuning metric is T-B, the baseline system already provides a very competitive BLEU score on MT08-WB compared to the best system in the evaluation, thanks to comprehensive features in the baseline system and more training data. [sent-151, score-0.043]

80 When we use meta-rules as binary sparse features in Type 2, we obtain about one point improvement on T-B on both sets. [sent-156, score-0.08]

81 This shows the advantage of tuning individual meta-rule weights over a generative model. [sent-157, score-0.102]

82 5 Discussion. In the case study of Section 3, we use POS-based rules as hidden states. [sent-161, score-0.18]

83 However, it should be noted that the hidden structures surely do not have to be POS tags. [sent-162, score-0.125]

84 For example, an alternative could be unsupervised NT splitting similar to (Huang et al. [sent-163, score-0.039]

85 The meta-grammar based approach was also motivated by insights from monolingual linguistic grammar generation, especially in TAG-related research (Xia, 2001; Prolo, 2002). [sent-165, score-0.104]

86 (DeNeefe and Knight, 2009) revisited the use of the adjoining operation in the context of statistical MT, and reported encouraging results. [sent-169, score-0.095]

87 On the other hand, (Dras, 1999) showed how a meta-level grammar could help in modeling parallel operations in (Shieber and Schabes, 1990). [sent-173, score-0.074]

88 Our work is another effort at statistically modeling well-recognized linguistic insights in NLP and MT. [sent-174, score-0.064]

89 6 Conclusions and Future Work. In this paper, we introduced a novel method to model translation rules as the observed generation output of a compact hidden model. [sent-175, score-0.499]

90 As a case study to capitalize on this model, we presented three methods to enrich rule modeling with features defined on a hidden model. [sent-176, score-0.215]

91 • To try other prior distributions to generate the number of source tokens. [sent-179, score-0.054]

92 • To incorporate rich models into the generative process. [sent-181, score-0.102]

93 • To improve the posterior model with better parameter estimation. [sent-184, score-0.07]

94 • To replace the exhaustive translation rule set with a compact meta-grammar that can create and parameterize new translation rules dynamically, which is the ultimate goal of this line of work. [sent-187, score-0.793]

95 Online learning methods for discriminative training of phrase based statistical machine translation. [sent-195, score-0.034]

96 A meta-level grammar: redefining synchronous TAG for translation and paraphrase. [sent-214, score-0.321]

97 Soft syntactic constraints for hierarchical phrase-based translation using latent syntactic distributions. [sent-218, score-0.203]

98 SPMT: Statistical machine translation with syntactified target language phrases. [sent-241, score-0.272]

99 A new string-to-dependency machine translation algorithm with a target dependency language model. [sent-262, score-0.244]

100 A study of translation edit rate with targeted human annotation. [sent-272, score-0.203]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('nnp', 0.426), ('pr', 0.387), ('nr', 0.334), ('vbz', 0.307), ('vv', 0.244), ('translation', 0.203), ('rule', 0.133), ('ymd', 0.128), ('generative', 0.102), ('rules', 0.098), ('shieber', 0.096), ('adjoining', 0.095), ('hypergraph', 0.085), ('chiang', 0.084), ('hidden', 0.082), ('compact', 0.082), ('sparse', 0.08), ('schabes', 0.076), ('grammar', 0.074), ('hr', 0.073), ('type', 0.072), ('posterior', 0.07), ('derivations', 0.066), ('cmejrek', 0.064), ('joshi', 0.061), ('underlying', 0.06), ('em', 0.059), ('chunks', 0.059), ('pos', 0.055), ('fm', 0.055), ('hypergraphs', 0.055), ('source', 0.054), ('md', 0.051), ('matsoukas', 0.051), ('deneefe', 0.051), ('snover', 0.051), ('shen', 0.048), ('bowen', 0.047), ('etr', 0.047), ('binomial', 0.047), ('mt', 0.047), ('dense', 0.046), ('synchronous', 0.046), ('arun', 0.045), ('fth', 0.045), ('tag', 0.044), ('noted', 0.043), ('target', 0.041), ('ibm', 0.04), ('splitting', 0.039), ('side', 0.038), ('libin', 0.038), ('mira', 0.038), ('dr', 0.038), ('yves', 0.036), ('derivation', 0.035), ('watanabe', 0.035), ('decompose', 0.035), ('observed', 0.034), ('statistical', 0.034), ('preliminary', 0.033), ('categorical', 0.033), ('och', 0.033), ('bleu', 0.033), ('probability', 0.032), ('sides', 0.032), ('indicators', 0.032), ('fs', 0.032), ('tags', 0.032), ('alignment', 0.032), ('practice', 0.032), ('aravind', 0.031), ('koehn', 0.031), ('marcu', 0.031), ('box', 0.031), ('sake', 0.031), ('represents', 0.031), ('insight', 0.03), ('connection', 0.029), ('papineni', 0.029), ('knight', 0.028), ('spmt', 0.028), ('syntactified', 0.028), ('nfu', 0.028), ('charac', 0.028), ('lowfrequency', 0.028), ('metagrammar', 0.028), ('ahnedre', 0.028), ('tnhse', 0.028), ('pioneer', 0.028), ('eoft', 0.028), ('redefining', 0.028), ('americas', 0.028), ('micciulla', 0.028), ('helsinki', 0.028), ('atht', 0.028), ('uinn', 0.028), ('otra', 0.028), ('hyperpath', 0.028), ('bse', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 201 emnlp-2013-What is Hidden among Translation Rules

Author: Libin Shen ; Bowen Zhou

Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.

2 0.22223173 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

3 0.20308894 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

Author: Martin Cmejrek ; Haitao Mi ; Bowen Zhou

Abstract: Machine translation benefits from system combination. We propose flexible interaction of hypergraphs as a novel technique combining different translation models within one decoder. We introduce features controlling the interactions between the two systems and explore three interaction schemes of hiero and forest-to-string models—specification, generalization, and interchange. The experiments are carried out on large training data with strong baselines utilizing rich sets of dense and sparse features. All three schemes significantly improve results of any single system on four testsets. We find that specification—a more constrained scheme that almost entirely uses forest-to-string rules, but optionally uses hiero rules for shorter spans—comes out as the strongest, yielding improvement up to 0.9 (T -B )/2 points. We also provide a detailed experimental and qualitative analysis of the results.

4 0.1809127 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

5 0.16029817 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resourceimpoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and namedentity segmentation.

6 0.14931226 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

7 0.1492521 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

8 0.12378244 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

9 0.11017966 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

10 0.10547532 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

11 0.1019611 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

12 0.10140709 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

13 0.092606962 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

14 0.082288906 2 emnlp-2013-A Convex Alternative to IBM Model 2

15 0.081985228 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

16 0.07662005 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

17 0.07547047 145 emnlp-2013-Optimal Beam Search for Machine Translation

18 0.075432539 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

19 0.075022385 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

20 0.073910482 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.227), (1, -0.254), (2, 0.101), (3, 0.061), (4, 0.081), (5, -0.061), (6, -0.018), (7, 0.01), (8, 0.075), (9, 0.051), (10, -0.058), (11, -0.039), (12, -0.015), (13, 0.07), (14, -0.026), (15, 0.004), (16, 0.043), (17, 0.042), (18, 0.053), (19, -0.065), (20, -0.049), (21, 0.079), (22, -0.037), (23, -0.148), (24, -0.048), (25, -0.04), (26, -0.04), (27, -0.151), (28, -0.019), (29, -0.027), (30, -0.104), (31, -0.11), (32, -0.012), (33, -0.008), (34, 0.199), (35, -0.137), (36, 0.109), (37, 0.146), (38, -0.013), (39, 0.061), (40, 0.033), (41, 0.015), (42, 0.066), (43, 0.055), (44, -0.102), (45, -0.039), (46, -0.019), (47, -0.045), (48, 0.118), (49, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94900584 201 emnlp-2013-What is Hidden among Translation Rules

Author: Libin Shen ; Bowen Zhou

Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.

2 0.83118874 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

Author: Martin Cmejrek ; Haitao Mi ; Bowen Zhou

Abstract: Machine translation benefits from system combination. We propose flexible interaction of hypergraphs as a novel technique combining different translation models within one decoder. We introduce features controlling the interactions between the two systems and explore three interaction schemes of hiero and forest-to-string models—specification, generalization, and interchange. The experiments are carried out on large training data with strong baselines utilizing rich sets of dense and sparse features. All three schemes significantly improve results of any single system on four testsets. We find that specification—a more constrained scheme that almost entirely uses forest-to-string rules, but optionally uses hiero rules for shorter spans—comes out as the strongest, yielding improvement up to 0.9 (T -B )/2 points. We also provide a detailed experimental and qualitative analysis of the results.

3 0.77253407 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

Author: Jesus Gonzalez-Rubio ; Daniel Ortiz-Martinez ; Jose-Miguel Benedi ; Francisco Casacuberta

Abstract: Current automatic machine translation systems are not able to generate error-free translations and human intervention is often required to correct their output. Alternatively, an interactive framework that integrates the human knowledge into the translation process has been presented in previous works. Here, we describe a new interactive machine translation approach that is able to work with phrase-based and hierarchical translation models, and integrates error-correction all in a unified statistical framework. In our experiments, our approach outperforms previous interactive translation systems, and achieves estimated effort reductions of as much as 48% relative over a traditional post-edition system.

4 0.72665274 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

5 0.6634382 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation

Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib

Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.

6 0.62944919 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

7 0.55038726 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

8 0.54053801 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

9 0.52493733 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation

10 0.49261621 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

11 0.47791323 145 emnlp-2013-Optimal Beam Search for Machine Translation

12 0.47636709 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

13 0.45101327 2 emnlp-2013-A Convex Alternative to IBM Model 2

14 0.4291935 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

15 0.42582422 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

16 0.42274562 57 emnlp-2013-Dependency-Based Decipherment for Resource-Limited Machine Translation

17 0.40303749 156 emnlp-2013-Recurrent Continuous Translation Models

18 0.38792923 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases

19 0.38482612 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge

20 0.37694201 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.027), (18, 0.017), (22, 0.032), (26, 0.016), (30, 0.103), (50, 0.011), (51, 0.138), (66, 0.483), (71, 0.011), (75, 0.012), (77, 0.044)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92623514 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

Author: Shi Feng ; Le Zhang ; Binyang Li ; Daling Wang ; Ge Yu ; Kam-Fai Wong

Abstract: Extensive experiments have validated the effectiveness of the corpus-based method for classifying the word’s sentiment polarity. However, no work is done for comparing different corpora in the polarity classification task. Nowadays, Twitter has aggregated huge amount of data that are full of people’s sentiments. In this paper, we empirically evaluate the performance of different corpora in sentiment similarity measurement, which is the fundamental task for word polarity classification. Experiment results show that the Twitter data can achieve a much better performance than the Google, Web1T and Wikipedia based methods.

2 0.87099832 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases

Author: Victor Chahuneau ; Eva Schlinger ; Noah A. Smith ; Chris Dyer

Abstract: Translation into morphologically rich languages is an important but recalcitrant problem in MT. We present a simple and effective approach that deals with the problem in two phases. First, a discriminative model is learned to predict inflections of target words from rich source-side annotations. Then, this model is used to create additional sentencespecific word- and phrase-level translations that are added to a standard translation model as “synthetic” phrases. Our approach relies on morphological analysis of the target language, but we show that an unsupervised Bayesian model of morphology can successfully be used in place of a supervised analyzer. We report significant improvements in translation quality when translating from English to Russian, Hebrew and Swahili.

same-paper 3 0.85601729 201 emnlp-2013-What is Hidden among Translation Rules

Author: Libin Shen ; Bowen Zhou

Abstract: Most of the machine translation systems rely on a large set of translation rules. These rules are treated as discrete and independent events. In this short paper, we propose a novel method to model rules as observed generation output of a compact hidden model, which leads to better generalization capability. We present a preliminary generative model to test this idea. Experimental results show about one point improvement on TER-BLEU over a strong baseline in Chinese-to-English translation.

4 0.6801939 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification

Author: Jun-Ping Ng ; Min-Yen Kan ; Ziheng Lin ; Wei Feng ; Bin Chen ; Jian Su ; Chew Lim Tan

Abstract: In this paper we classify the temporal relations between pairs of events on an article-wide basis. This is in contrast to much of the existing literature which focuses on just event pairs which are found within the same or adjacent sentences. To achieve this, we leverage on discourse analysis as we believe that it provides more useful semantic information than typical lexico-syntactic features. We propose the use of several discourse analysis frameworks, including 1) Rhetorical Structure Theory (RST), 2) PDTB-styled discourse relations, and 3) topical text segmentation. We explain how features derived from these frameworks can be effectively used with support vector machines (SVM) paired with convolution kernels. Experiments show that our proposal is effective in improving on the state-of-the-art significantly by as much as 16% in terms of F1, even if we only adopt less-than-perfect automatic discourse analyzers and parsers. Making use of more accurate discourse analysis can further boost gains to 35%.

5 0.56527191 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

Author: Svitlana Volkova ; Theresa Wilson ; David Yarowsky

Abstract: Theresa Wilson Human Language Technology Center of Excellence Johns Hopkins University Baltimore, MD t aw@ j hu .edu differences may Different demographics, e.g., gender or age, can demonstrate substantial variation in their language use, particularly in informal contexts such as social media. In this paper we focus on learning gender differences in the use of subjective language in English, Spanish, and Russian Twitter data, and explore cross-cultural differences in emoticon and hashtag use for male and female users. We show that gender differences in subjective language can effectively be used to improve sentiment analysis, and in particular, polarity classification for Spanish and Russian. Our results show statistically significant relative F-measure improvement over the gender-independent baseline 1.5% and 1% for Russian, 2% and 0.5% for Spanish, and 2.5% and 5% for English for polarity and subjectivity classification.

6 0.5590266 143 emnlp-2013-Open Domain Targeted Sentiment

7 0.52334553 83 emnlp-2013-Exploring the Utility of Joint Morphological and Syntactic Learning from Child-directed Speech

8 0.5166328 19 emnlp-2013-Adaptor Grammars for Learning Non-Concatenative Morphology

9 0.51607549 30 emnlp-2013-Automatic Extraction of Morphological Lexicons from Morphologically Annotated Corpora

10 0.50527704 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

11 0.5032385 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

12 0.50109243 99 emnlp-2013-Implicit Feature Detection via a Constrained Topic Model and SVM

13 0.50103813 181 emnlp-2013-The Effects of Syntactic Features in Automatic Prediction of Morphology

14 0.49850249 8 emnlp-2013-A Joint Learning Model of Word Segmentation, Lexical Acquisition, and Phonetic Variability

15 0.49772397 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

16 0.49346745 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

17 0.49102485 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models

18 0.4854086 156 emnlp-2013-Recurrent Continuous Translation Models

19 0.48368919 113 emnlp-2013-Joint Language and Translation Modeling with Recurrent Neural Networks

20 0.48317167 187 emnlp-2013-Translation with Source Constituency and Dependency Trees