emnlp emnlp2013 emnlp2013-101 knowledge-graph by maker-knowledge-mining

101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization


Source: pdf

Author: Tian Xia ; Zongcheng Ji ; Shaodan Zhai ; Yidong Chen ; Qun Liu ; Shaojun Wang

Abstract: This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. In this area, most techniques utilize confusion networks (CN) as their central data structure to compactly encode an exponential number of potential hypotheses, and because better hypothesis alignment may benefit the construction of better-quality confusion networks, it is natural to add more useful information to improve alignment results. However, this information may be heterogeneous, so the widely-used Viterbi algorithm for searching for the best alignment may not apply here. In the multi-objective optimization framework, each information source is viewed as an independent objective, and the new goal of improving all objectives can be searched by mature algorithms. The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong Indirect Hidden Markov Model-based (IHMM) system, and 4.75 and 3.53 points over the best single machine translation systems.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. [sent-13, score-0.561]

2 However, this information may be heterogeneous, so the widely-used Viterbi algorithm for searching for the best alignment may not apply here. [sent-15, score-0.356]

3 In the multi-objective optimization framework, each information source is viewed as an independent objective, and a new goal of improving all objectives can be searched by mature algorithms. [sent-16, score-0.303]

4 The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. [sent-17, score-0.343]

5 Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong IHMM-based system, and 4.75 and 3.53 points over the best single machine translation systems. [sent-18, score-0.047]

6 1 Introduction System combination (SC) techniques have the power of boosting translation quality in BLEU by several percent over the best among all input machine translation systems (Bangalore et al. [sent-23, score-0.121]

7 A central data structure in SC is the confusion network, and its quality greatly affects the final performance. [sent-34, score-0.187]

8 He et al. (2008) proposed a new hypothesis alignment algorithm for constructing high-quality confusion networks, called the Indirect Hidden Markov Model (IHMM), which does better at synonym matching than the classic translation edit rate (TER) based algorithm (Rosti et al. [sent-36, score-0.76]

9 Current state-of-the-art SC systems mostly use IHMM or its variants in their alignment algorithms (Li et al. [sent-40, score-0.295]

10 Our motivation derives from an observation that in an ideal alignment of a pair of sentences, many-to-many alignments often exist. [sent-43, score-0.381]

11 Yet popular alignment models are one-to-many, e.g., IHMM for system combination, and the HMM in the GIZA++ software for statistical machine translation (SMT) (Och and Ney, 2000; Koehn et al. [sent-47, score-0.075]

12 However, it appears to be intractable in an IHMM model to search for the optimal solution by simply defining a new goal as a product of probabilities. [sent-49, score-0.113]

13 Liang et al. (2006) adopt a simple and effective variational inference algorithm. [sent-53, score-0.033]

14 Further, different alignment algorithms capture different information and linguistic phenomena for a pair of sentences; hence, more information would be expected to benefit the final alignment. [sent-54, score-0.316]

15 Liang’s method may not be suitable for this expected outcome. [sent-55, score-0.045]

16 We propose to adopt a multi-objective optimization framework to support heterogeneous information sources, which may induce difficulties in a conventional search algorithm. [sent-56, score-0.167]

17 In this framework, there exists a variety of mature multi-objective optimization algorithms, e.g. [sent-57, score-0.118]

18 In this work, we select the multi-objective evolutionary algorithm because of its publicly available open-source software (http://www. [sent-62, score-0.152]

19 On the other hand, this framework is also totally unsupervised. [sent-67, score-0.03]

20 This framework views any useful information benefiting alignment as an independent objective, and researchers need only write short code for the objective definitions. [sent-69, score-0.325]
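
As a minimal illustration of this objective-definition step, the sketch below (Python, with hypothetical names such as ihmm_model.score; not the paper's actual code) wraps two of the information sources used here as independent objective functions over a candidate alignment:

```python
import math

def make_objectives(ihmm_model, giza_table, hyp1, hyp2):
    """Wrap heterogeneous information sources as independent objectives.
    Each objective maps a candidate many-to-many alignment -- a set of
    (i, j) word-index pairs -- to a scalar score; larger is better."""
    def ihmm_score(alignment):
        # IHMM alignment probability of the candidate (one objective);
        # ihmm_model.score is a hypothetical interface.
        return ihmm_model.score(alignment, hyp1, hyp2)

    def giza_score(alignment):
        # GIZA++ lexical score: sum of log translation probabilities,
        # with a small floor for unseen word pairs.
        return sum(math.log(giza_table.get((hyp1[i], hyp2[j]), 1e-9))
                   for (i, j) in alignment)

    # Adding a new information source means appending one more function.
    return [ihmm_score, giza_score]
```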

21 The search algorithm seeks potentially better solutions which are no worse than the current solution set. [sent-70, score-0.147]

22 The output from multi-objective optimization algorithms includes a set of solutions, called Pareto optimal solutions, each one being a many-to-many alignment. [sent-71, score-0.204]

23 We then combine and normalize them into a unique one-to-one alignment to perform confusion network construction (Section 3. [sent-72, score-0.542]

24 Our work is conducted on the classic pipeline, which has three modules: pair-wise hypothesis alignment, confusion network construction, and training. [sent-74, score-0.365]

25 Much recent work integrates neighboring modules to avoid propagating errors and to gain improved performance. [sent-75, score-0.088]

26 Feng et al. (2009) combine the first and the second modules, and He and Toutanova (2009) combine all modules into one directly. [sent-78, score-0.061]

27 Because of the independence between modules, a system is relatively simple to maintain, and improvements on each module might contribute to the final performance additively. [sent-80, score-0.076]

28 Feng et al. (2009), in the second module, adopt a different data structure called a lattice, which could directly use our better many-to-many alignment for construction. [sent-84, score-0.404]

29 Experiments on the Chinese-to-English task on two datasets use four objectives, IHMM probability (Section 3. [sent-85, score-0.053]

30 Results show that the multi-objective optimization framework efficiently integrates different information to gain approximately a 1 BLEU point improvement over a strong baseline. [sent-90, score-0.153]

31 2 Background We briefly introduce confusion networks, and because IHMM-based alignment is an important objective in our multi-objective framework, we also provide detailed definitions of its formulas for completeness. [sent-91, score-0.537]

32 2.1 Confusion Network Table 1 shows hypotheses h1 and h2 aligned to the selected backbone h0. [sent-93, score-0.283]

33 When the alignment algorithm obtains good enough results, the expected output "he prefers apples" is included in the corresponding confusion network in Figure 1. [sent-94, score-0.7]

34 This suggests that developing a better alignment algorithm may help create high-quality confusion networks. [sent-95, score-0.51]

35 This also motivates us to use the BLEU of oracle hypotheses to approximately measure the quality of a set of CNs. [sent-96, score-0.091]

36 Table 1 (a toy example of hypothesis alignment, where h0 is the backbone hypothesis): h0: he feels like apples; h1: he prefer ε apples; h2: him prefers to apples. [sent-100, score-0.347]

37 A confusion network G = (V, E) is a directed acyclic graph with a unique source and a unique sink vertex. [sent-103, score-0.297]

38 Figure 1: A classic confusion network; the bold path is the expected output. [sent-108, score-0.263]

39 Compared with TER-based alignment, which performs literal matching, IHMM supports synonym comparison by redefining the emission probabilities in an IHMM model. [sent-115, score-0.445]

40 Let e = (e1, . . . , eJ) be a hypothesis aligned to the backbone, both being English sentences in our experiments. [sent-122, score-0.105]

41 Suppose the alignment is a = (a1, . . . , aJ), where aj = i means the jth word in e is aligned to the ith word in the backbone. [sent-127, score-0.069]

42 The distortion parameters are grouped into 11 buckets, c(≤ −4), c(−3), . . . , c(5), c(≥ 6). [sent-132, score-0.066]

43 Because all the hypotheses in system combination are in the same language, the IHMM model would support more monotonic alignments, and non-monotonic alignments will be penalized. [sent-133, score-0.053]

44 psem(e|f) ≈ Σ_{c ∈ src} pdic(c|f) · pdic(e|c) (6). Note that psem(e|f) has been updated with different source sentences. [sent-139, score-0.028]

45 The surface similarity psur(e|f) is measured by the literal matching rate: psur(e, f) = exp{ρ [LMP(f, e) / max(|f|, |e|) − 1]} (7), where LMP(f, e) is the length of the longest matched prefix, and ρ is a smoothing parameter. [sent-140, score-0.126]
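
A direct transcription of Formulas (6) and (7) as a sketch; pdic is assumed to be a dictionary-probability lookup, and the default value of ρ is an arbitrary placeholder, not the paper's tuned setting:

```python
import math

def p_sem(e, f, p_dic, src_words):
    """Formula (6): semantic similarity via source words c,
    p_sem(e|f) ~ sum_{c in src} p_dic(c|f) * p_dic(e|c).
    Convention here: p_dic(x, y) returns p(x|y)."""
    return sum(p_dic(c, f) * p_dic(e, c) for c in src_words)

def longest_matched_prefix(f, e):
    """LMP(f, e): length of the longest common prefix of the two words."""
    n = 0
    for cf, ce in zip(f, e):
        if cf != ce:
            break
        n += 1
    return n

def p_sur(e, f, rho=3.0):  # rho: smoothing parameter, placeholder value
    """Formula (7): p_sur(e, f) = exp{rho * [LMP(f, e) / max(|f|, |e|) - 1]}."""
    ratio = longest_matched_prefix(f, e) / max(len(f), len(e))
    return math.exp(rho * (ratio - 1.0))
```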

46 One natural way is to scalarize multiple objectives into one by assigning them a weight vector. [sent-142, score-0.157]

47 This method allows a simple optimization algorithm in many cases, but in system combination it would cause problems. [sent-143, score-0.124]

48 In the first module, extra labeled data is needed in order to train suitable weights for the objectives; besides that, the efficient Viterbi algorithm for searching for the optimal alignment would not work for the alignment objectives in this work. [sent-144, score-0.896]

49 Moreover, the parameter training in the third module relies on the CNs constructed from the output of the first module, which increases the instability of the whole system. [sent-145, score-0.098]

50 Therefore, an unsupervised multi-objective algorithm may be a good choice that allows for more alignment information. [sent-146, score-0.323]

51 There exist other alternative optimization algorithms in the multi-objective optimization framework; though the evolutionary algorithm is adopted here, we only introduce some general concepts. [sent-147, score-0.363]

52 3.1 Pareto Optimal Solutions A general multi-objective optimization problem consists of a number of objectives and is associated with a number of constraints. [sent-149, score-0.253]

53 All the functions fi, gj , hk map a solution x into a scalar. [sent-162, score-0.104]
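
Reconstructed from these definitions, the standard form of such a problem is (the inequality directions below are the conventional ones and may differ cosmetically from the paper's):

```latex
\begin{aligned}
\max_{x}\quad & f_i(x), && i = 1, \dots, M \\
\text{s.t.}\quad & g_j(x) \ge 0, && j = 1, \dots, J \\
                 & h_k(x) = 0, && k = 1, \dots, K
\end{aligned}
```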

54 In this work, we refer to x = {xi,j | xi,j ∈ {0, 1}} as a potential alignment of a pair of hypotheses, where xi,j is a boolean value denoting whether the ith word in the first hypothesis is aligned to the jth word in the second hypothesis. [sent-164, score-0.427]

55 Here the definition of x seems different from that of a in Formula 1, but the two can be converted into each other. [sent-165, score-0.022]

56 Using a line-based access style, a matrix can be unfolded as a vector. [sent-166, score-0.05]

57 We refer to f as IHMM alignment probability (He et al. [sent-167, score-0.348]

58 , 2009), for a total of four objectives from two directions; the larger the objectives, the better. [sent-169, score-0.157]

59 If fi(x) ≥ fi(x′) holds for all i, we say the alignment x dominates the alignment x′. [sent-174, score-0.59]

60 Figure 2: Sample solutions with only two objectives; the X axis is the reversed IHMM probability (1e-8). [sent-175, score-0.092]

61 Other points p2 , p4, p6 are dominated by at least one point in the Pareto optimal solutions. [sent-177, score-0.093]

62 If there does not exist any alignment x′′ that dominates x, we call the alignment x non-dominated. [sent-178, score-0.644]

63 An alignment x is said to be Pareto optimal if no other alignment x′ is found to dominate x. [sent-180, score-0.686]

64 In Figure 2, p1 dominates p2, and p2 dominates p4. [sent-181, score-0.058]

65 To summarize, a point is dominated by the ones on its upper and right side (ties included). [sent-182, score-0.065]

66 In some cases, Pareto optimal solutions can be used as good candidate solutions. [sent-184, score-0.156]

67 Considering the IHMM model alone, i.e., maximizing the Y axis, the top-4 best alignments are p1, p2, p3, p4. [sent-185, score-0.053]

68 But from the view of Pareto optimality, the top-4 alignments would be p1, p3, p5, p7 (unordered), which covers a greater range than a single-objective optimization model. [sent-186, score-0.149]
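
The dominance test and the Pareto filter are easy to state in code. The sketch below uses the usual strict form of dominance (no worse in every objective, strictly better in at least one) and invented coordinates chosen only to be consistent with the Figure 2 discussion above:

```python
def dominates(fx, fy):
    """fx dominates fy: no worse in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))

def pareto_front(solutions, objective_fn):
    """Keep the non-dominated (Pareto optimal) solutions."""
    scored = [(s, objective_fn(s)) for s in solutions]
    return [s for s, fs in scored
            if not any(dominates(ft, fs) for t, ft in scored if t is not s)]

# Invented coordinates matching the text: p1 dominates p2, p2 dominates
# p4, the top-4 by Y alone are p1..p4, and the front is {p1, p3, p5, p7}.
points = {"p1": (1, 9), "p2": (1, 8), "p3": (3, 7), "p4": (1, 6),
          "p5": (5, 5), "p6": (4, 3), "p7": (7, 2)}
print(pareto_front(list(points), lambda name: points[name]))
# -> ['p1', 'p3', 'p5', 'p7']
```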

69 In our method, we just combine these Pareto optimal solutions equally into a unique alignment (Section 3. [sent-187, score-0.451]
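
The summary does not spell out the combination scheme, so the following is one plausible reading of "combine equally, then normalize to one-to-one" — equal-weight link voting followed by greedy one-to-one selection — and an assumption rather than the paper's exact procedure:

```python
from collections import Counter

def combine_pareto_alignments(pareto_alignments):
    """Combine Pareto optimal many-to-many alignments (each a set of
    (i, j) links) into a single one-to-one alignment: vote for every
    link with equal weight, then greedily keep the most-voted links
    subject to the one-to-one constraint."""
    votes = Counter(link for a in pareto_alignments for link in a)
    used_i, used_j, result = set(), set(), set()
    for (i, j), _ in votes.most_common():
        if i not in used_i and j not in used_j:
            result.add((i, j))
            used_i.add(i)
            used_j.add(j)
    return result
```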

70 Our adopted multi-objective optimization searching algorithm is the non-dominated sorting genetic algorithm II (NSGA-II) (Deb et al. [sent-189, score-0.21]

71 NSGA-II has a complexity of O(mn^2), where m is the number of objectives and n is the population size in an evolutionary algorithm. [sent-196, score-0.253]

72 3.2 Objectives in the Evolutionary Algorithm The optimization objectives in our experiments can be categorized as an IHMM alignment probability (He et al. [sent-198, score-0.601]

73 , 2008) and a GIZA++ alignment probability. [sent-199, score-0.348]

74 Figure 3: The same alignment (f1, e1), (f1, e2), (f2, e3) in two IHMM models. [sent-211, score-0.295]

75 The upper one is a typical example in IHMM; in the bottom one, because any word in the observation sequence must not correspond to two statuses, a minor problem arises. [sent-212, score-0.094]

76 3.2.1 IHMM Probability A typical IHMM alignment is demonstrated in the upper graph of Figure 3, where the backbone acts as the status sequence. [sent-218, score-0.565]

77 The unnormalized conditional alignment probability is [pt(1|null)] · [pt(1|1) pt(2|1)] · [po(e1|f1) po(e2|f1) po(e3|f2)]. [sent-219, score-0.348]

78 However, the same alignment (f1, e1), (f1, e2), (f2, e3) would be a bit different if we change the alignment direction, with the backbone now being the observations. [sent-220, score-0.765]

79 Looking at the bottom graph of Figure 3, the observation f1 has two statuses, e1 and e2, at the same time, so it becomes ambiguous whether to compute the transition probability as pt(3|1) or pt(3|2). [sent-222, score-0.136]

80 This is because the IHMM algorithm deals with one-to-many alignments, while the MOEA permits many-to-many alignments. [sent-223, score-0.028]

81 A new status is defined not as a single position, pt(j|i), but as a set of positions, pt({j}|{i}). [sent-225, score-0.089]

82 The positions in one status need not be adjacent to each other. [sent-226, score-0.081]

83 The redefined transition probability is pt({j}|{i}) = (1 / (|{j}| · |{i}|)) Σ_{i,j} pt(j|i), and the redefined emission probability is po(j|{i}) = Π_i po(j|i). We need to note that there is no guarantee that these quantities remain properly normalized probabilities, though the approximations prove effective in practice. [sent-227, score-0.284]

84 Straightforwardly, when there is only one position in a new status, the expanded IHMM degenerates to the standard IHMM. [sent-228, score-0.052]

85 The new probability becomes [pt(1|null) pt(2|null)] · [(1/2) pt(3|1) pt(3|2) · pt(null|3)] · [po(f1|e1) po(f1|e2) po(f2|e3) po(f3|null)]. [sent-230, score-0.053]
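
A direct transcription of the two redefinitions above (a sketch; pt and po stand for the standard IHMM transition and emission tables, passed in as callables):

```python
def expanded_transition(j_set, i_set, pt):
    """pt({j}|{i}) = 1 / (|{j}| * |{i}|) * sum_{i, j} pt(j|i).
    With singleton sets this reduces to the standard pt(j|i),
    matching the degenerate case noted in the text."""
    total = sum(pt(j, i) for j in j_set for i in i_set)
    return total / (len(j_set) * len(i_set))

def expanded_emission(j, i_set, po):
    """po(j|{i}) = prod_{i} po(j|i)."""
    prob = 1.0
    for i in i_set:
        prob *= po(j, i)
    return prob
```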

86 All probabilities appearing in the formulas below can be looked up in GIZA++. [sent-234, score-0.055]

87 In order to increase the coverage of words, we collect all the hypothesis pairs in both the tuning set and the test set and feed them into GIZA++. [sent-242, score-0.063]

88 This is an off-line operation, which makes it unsuitable for an online translation system. [sent-243, score-0.071]

89 In our experiments, a pure GIZA++-based system combination does not perform as well as an IHMM-based one, but it does benefit the final translation quality when combined in our multi-objective optimization framework. [sent-245, score-0.214]

90 Using a line-based access style, the matrix can be unfolded as a vector with |I| · |J| bits of length. [sent-250, score-0.05]
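
For concreteness, a row-major ("line-based") encode/decode pair for this genotype representation — a minimal sketch:

```python
def encode(matrix):
    """Unfold a boolean I x J alignment matrix into a flat vector
    of |I| * |J| bits, row by row."""
    return [bit for row in matrix for bit in row]

def decode(bits, I, J):
    """Fold a flat bit vector of length I * J back into an I x J matrix."""
    assert len(bits) == I * J
    return [bits[i * J:(i + 1) * J] for i in range(I)]
```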


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ihmm', 0.628), ('alignment', 0.295), ('pt', 0.235), ('pareto', 0.199), ('confusion', 0.187), ('backbone', 0.175), ('objectives', 0.157), ('ej', 0.14), ('po', 0.139), ('aj', 0.127), ('giza', 0.117), ('rosti', 0.111), ('deb', 0.109), ('psem', 0.109), ('psur', 0.1), ('optimization', 0.096), ('evolutionary', 0.096), ('solutions', 0.092), ('fi', 0.091), ('apples', 0.08), ('module', 0.076), ('pdic', 0.075), ('null', 0.072), ('hypotheses', 0.066), ('optimal', 0.064), ('hypothesis', 0.063), ('modules', 0.061), ('network', 0.06), ('status', 0.059), ('classic', 0.055), ('alignments', 0.053), ('probability', 0.053), ('fertility', 0.05), ('indirected', 0.05), ('redefined', 0.05), ('statuses', 0.05), ('transitional', 0.05), ('unfolded', 0.05), ('translation', 0.047), ('multiobjective', 0.044), ('gj', 0.044), ('xiamen', 0.044), ('bleu', 0.043), ('aligned', 0.042), ('heterogeneous', 0.041), ('sc', 0.039), ('upper', 0.036), ('viterbi', 0.036), ('src', 0.035), ('hk', 0.033), ('formulas', 0.033), ('adopts', 0.033), ('observation', 0.033), ('searching', 0.033), ('dominate', 0.032), ('framework', 0.03), ('networks', 0.03), ('position', 0.03), ('prefers', 0.029), ('dominated', 0.029), ('dominates', 0.029), ('feng', 0.029), ('yj', 0.028), ('source', 0.028), ('algorithm', 0.028), ('ei', 0.028), ('software', 0.028), ('emission', 0.028), ('combination', 0.027), ('integrates', 0.027), ('synonym', 0.027), ('jth', 0.027), ('solution', 0.027), ('literal', 0.026), ('oracle', 0.025), ('interpolation', 0.025), ('supports', 0.025), ('markov', 0.025), ('minor', 0.025), ('adopted', 0.025), ('suitable', 0.024), ('hidden', 0.023), ('sim', 0.022), ('exist', 0.022), ('probabilities', 0.022), ('hwy', 0.022), ('zij', 0.022), ('tabu', 0.022), ('degenerates', 0.022), ('lik', 0.022), ('ehaec', 0.022), ('fsoro', 0.022), ('instability', 0.022), ('mature', 0.022), ('redefining', 0.022), ('sink', 0.022), ('zongcheng', 0.022), ('definition', 0.022), ('expected', 0.021), ('smt', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization

Author: Tian Xia ; Zongcheng Ji ; Shaodan Zhai ; Yidong Chen ; Qun Liu ; Shaojun Wang

Abstract: This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. In this area, most techniques utilize confusion networks (CN) as their central data structure to compactly encode an exponential number of potential hypotheses, and because better hypothesis alignment may benefit the construction of better-quality confusion networks, it is natural to add more useful information to improve alignment results. However, this information may be heterogeneous, so the widely-used Viterbi algorithm for searching for the best alignment may not apply here. In the multi-objective optimization framework, each information source is viewed as an independent objective, and the new goal of improving all objectives can be searched by mature algorithms. The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong Indirect Hidden Markov Model-based (IHMM) system, and 4.75 and 3.53 points over the best single machine translation systems.

2 0.15201946 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark

Abstract: We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art.

3 0.09216889 2 emnlp-2013-A Convex Alternative to IBM Model 2

Author: Andrei Simion ; Michael Collins ; Cliff Stein

Abstract: The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.

4 0.082966626 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

Author: Katsuhito Sudoh ; Shinsuke Mori ; Masaaki Nagata

Abstract: This paper proposes a novel noise-aware character alignment method for bootstrapping statistical machine transliteration from automatically extracted phrase pairs. The model is an extension of a Bayesian many-to-many alignment method for distinguishing nontransliteration (noise) parts in phrase pairs. It worked effectively in the experiments of bootstrapping Japanese-to-English statistical machine transliteration in patent domain using patent bilingual corpora.

5 0.078989506 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

Author: Xinyan Xiao ; Deyi Xiong

Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.

6 0.066714928 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

7 0.065627903 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

8 0.06282115 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

9 0.061986662 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

10 0.060140524 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

11 0.053910889 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

12 0.05310056 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

13 0.051520951 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

14 0.0481354 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

15 0.047848258 145 emnlp-2013-Optimal Beam Search for Machine Translation

16 0.045554321 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

17 0.044508174 201 emnlp-2013-What is Hidden among Translation Rules

18 0.04407404 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

19 0.042814147 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

20 0.039970074 179 emnlp-2013-Summarizing Complex Events: a Cross-Modal Solution of Storylines Extraction and Reconstruction


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.139), (1, -0.105), (2, 0.025), (3, 0.036), (4, 0.023), (5, 0.004), (6, 0.022), (7, 0.041), (8, -0.005), (9, 0.003), (10, -0.007), (11, 0.036), (12, 0.11), (13, 0.08), (14, 0.048), (15, -0.061), (16, 0.065), (17, 0.055), (18, 0.113), (19, -0.052), (20, -0.0), (21, 0.011), (22, -0.014), (23, 0.086), (24, -0.002), (25, 0.091), (26, -0.049), (27, -0.082), (28, 0.131), (29, -0.118), (30, -0.071), (31, 0.098), (32, 0.041), (33, 0.163), (34, -0.007), (35, 0.138), (36, 0.022), (37, -0.092), (38, -0.121), (39, -0.038), (40, 0.065), (41, 0.09), (42, 0.108), (43, -0.095), (44, 0.178), (45, 0.013), (46, 0.013), (47, -0.128), (48, 0.108), (49, 0.05)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96449095 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization

Author: Tian Xia ; Zongcheng Ji ; Shaodan Zhai ; Yidong Chen ; Qun Liu ; Shaojun Wang

Abstract: This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. In this area, most techniques utilize confusion networks (CN) as their central data structure to compactly encode an exponential number of potential hypotheses, and because better hypothesis alignment may benefit the construction of better-quality confusion networks, it is natural to add more useful information to improve alignment results. However, this information may be heterogeneous, so the widely-used Viterbi algorithm for searching for the best alignment may not apply here. In the multi-objective optimization framework, each information source is viewed as an independent objective, and the new goal of improving all objectives can be searched by mature algorithms. The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong Indirect Hidden Markov Model-based (IHMM) system, and 4.75 and 3.53 points over the best single machine translation systems.

2 0.70010918 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

Author: Xuchen Yao ; Benjamin Van Durme ; Chris Callison-Burch ; Peter Clark

Abstract: We introduce a novel discriminative model for phrase-based monolingual alignment using a semi-Markov CRF. Our model achieves state-of-the-art alignment accuracy on two phrase-based alignment datasets (RTE and paraphrase), while doing significantly better than other strong baselines in both non-identical alignment and phrase-only alignment. Additional experiments highlight the potential benefit of our alignment model to RTE, paraphrase identification and question answering, where even a naive application of our model's alignment score approaches the state of the art.

3 0.69883341 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

Author: Katsuhito Sudoh ; Shinsuke Mori ; Masaaki Nagata

Abstract: This paper proposes a novel noise-aware character alignment method for bootstrapping statistical machine transliteration from automatically extracted phrase pairs. The model is an extension of a Bayesian many-to-many alignment method for distinguishing nontransliteration (noise) parts in phrase pairs. It worked effectively in the experiments of bootstrapping Japanese-to-English statistical machine transliteration in patent domain using patent bilingual corpora.

4 0.671143 2 emnlp-2013-A Convex Alternative to IBM Model 2

Author: Andrei Simion ; Michael Collins ; Cliff Stein

Abstract: The IBM translation models have been hugely influential in statistical machine translation; they are the basis of the alignment models used in modern translation systems. Excluding IBM Model 1, the IBM translation models, and practically all variants proposed in the literature, have relied on the optimization of likelihood functions or similar functions that are non-convex, and hence have multiple local optima. In this paper we introduce a convex relaxation of IBM Model 2, and describe an optimization algorithm for the relaxation based on a subgradient method combined with exponentiated-gradient updates. Our approach gives the same level of alignment accuracy as IBM Model 2.

5 0.39420736 33 emnlp-2013-Automatic Knowledge Acquisition for Case Alternation between the Passive and Active Voices in Japanese

Author: Ryohei Sasano ; Daisuke Kawahara ; Sadao Kurohashi ; Manabu Okumura

Abstract: We present a method for automatically acquiring knowledge for case alternation between the passive and active voices in Japanese. By leveraging several linguistic constraints on alternation patterns and lexical case frames obtained from a large Web corpus, our method aligns a case frame in the passive voice to a corresponding case frame in the active voice and finds an alignment between their cases. We then apply the acquired knowledge to a case alternation task and prove its usefulness.

6 0.33176085 3 emnlp-2013-A Corpus Level MIRA Tuning Strategy for Machine Translation

7 0.33045629 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation

8 0.31458545 151 emnlp-2013-Paraphrasing 4 Microblog Normalization

9 0.31121948 40 emnlp-2013-Breaking Out of Local Optima with Count Transforms and Model Recombination: A Study in Grammar Induction

10 0.30795416 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

11 0.30737862 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

12 0.29492721 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

13 0.29274598 161 emnlp-2013-Rule-Based Information Extraction is Dead! Long Live Rule-Based Information Extraction Systems!

14 0.28649417 195 emnlp-2013-Unsupervised Spectral Learning of WCFG as Low-rank Matrix Completion

15 0.28325501 129 emnlp-2013-Measuring Ideological Proportions in Political Speeches

16 0.27956244 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

17 0.27644753 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

18 0.2609812 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

19 0.25272393 159 emnlp-2013-Regularized Minimum Error Rate Training

20 0.23817983 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.369), (3, 0.021), (18, 0.041), (22, 0.052), (30, 0.084), (43, 0.012), (50, 0.022), (51, 0.172), (66, 0.042), (71, 0.026), (75, 0.02), (77, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.82779866 101 emnlp-2013-Improving Alignment of System Combination by Using Multi-objective Optimization

Author: Tian Xia ; Zongcheng Ji ; Shaodan Zhai ; Yidong Chen ; Qun Liu ; Shaojun Wang

Abstract: This paper proposes a multi-objective optimization framework which supports heterogeneous information sources to improve alignment in machine translation system combination techniques. In this area, most techniques utilize confusion networks (CN) as their central data structure to compactly encode an exponential number of potential hypotheses, and because better hypothesis alignment may benefit the construction of better-quality confusion networks, it is natural to add more useful information to improve alignment results. However, this information may be heterogeneous, so the widely-used Viterbi algorithm for searching for the best alignment may not apply here. In the multi-objective optimization framework, each information source is viewed as an independent objective, and the new goal of improving all objectives can be searched by mature algorithms. The solutions from this framework, termed Pareto optimal solutions, are then combined to construct confusion networks. Experiments on two Chinese-to-English translation datasets show significant improvements, 0.97 and 1.06 BLEU points over a strong Indirect Hidden Markov Model-based (IHMM) system, and 4.75 and 3.53 points over the best single machine translation systems.

2 0.69489169 128 emnlp-2013-Max-Violation Perceptron and Forced Decoding for Scalable MT Training

Author: Heng Yu ; Liang Huang ; Haitao Mi ; Kai Zhao

Abstract: While large-scale discriminative training has triumphed in many NLP problems, its definite success on machine translation has been largely elusive. Most recent efforts along this line are not scalable (training on the small dev set with features from the top ∼100 most frequent words) and overly complicated. We instead present a very simple yet theoretically motivated approach by extending the recent framework of "violation-fixing perceptron", using forced decoding to compute the target derivations. Extensive phrase-based translation experiments on both Chinese-to-English and Spanish-to-English tasks show substantial gains in BLEU by up to +2.3/+2.0 on dev/test over MERT, thanks to 20M+ sparse features. This is the first successful effort of large-scale online discriminative training for MT.

3 0.48750395 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging

Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu

Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-the-art performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to the maximum-likelihood method, to speed up the training process and make the learning algorithm easier to implement.

4 0.48680425 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

Author: Kuzman Ganchev ; Dipanjan Das

Abstract: We present a framework for cross-lingual transfer of sequence information from a resource-rich source language to a resource-impoverished target language that incorporates soft constraints via posterior regularization. To this end, we use automatically word-aligned bitext between the source and target language pair, and learn a discriminative conditional random field model on the target side. Our posterior regularization constraints are derived from simple intuitions about the task at hand and from cross-lingual alignment information. We show improvements over strong baselines for two tasks: part-of-speech tagging and named-entity segmentation.

5 0.48586789 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation

Author: Longkai Zhang ; Houfeng Wang ; Xu Sun ; Mairgup Mansur

Abstract: Nowadays, supervised sequence labeling models can reach competitive performance on the task of Chinese word segmentation. However, the ability of these models is restricted by the availability of annotated data and the design of features. We propose a scalable semi-supervised feature engineering approach. In contrast to previous works using pre-defined task-specific features with fixed values, we dynamically extract representations of label distributions from both an in-domain corpus and an out-of-domain corpus. We update the representation values with a semi-supervised approach. Experiments on the benchmark datasets show that our approach achieves good results and reaches an f-score of 0.961. The feature engineering approach proposed here is a general iterative semi-supervised method and is not limited to the word segmentation task.

6 0.48436919 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction

7 0.48298529 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs

8 0.48219919 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

9 0.48185283 143 emnlp-2013-Open Domain Targeted Sentiment

10 0.48153195 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation

11 0.48064476 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

12 0.48061234 167 emnlp-2013-Semi-Markov Phrase-Based Monolingual Alignment

13 0.48015624 132 emnlp-2013-Mining Scientific Terms and their Definitions: A Study of the ACL Anthology

14 0.4799948 21 emnlp-2013-An Empirical Study Of Semi-Supervised Chinese Word Segmentation Using Co-Training

15 0.47990659 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

16 0.4798013 111 emnlp-2013-Joint Chinese Word Segmentation and POS Tagging on Heterogeneous Annotated Corpora with Multiple Task Learning

17 0.47972062 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation

18 0.47920588 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

19 0.47917321 168 emnlp-2013-Semi-Supervised Feature Transformation for Dependency Parsing

20 0.47913635 154 emnlp-2013-Prior Disambiguation of Word Tensors for Constructing Sentence Vectors