emnlp emnlp2013 emnlp2013-125 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Deyi Xiong ; Yang Ding ; Min Zhang ; Chew Lim Tan
Abstract: Lexical chains provide a representation of the lexical cohesion structure of a text. In this paper, we propose two lexical chain based cohesion models to incorporate lexical cohesion into document-level statistical machine translation: 1) a count cohesion model that rewards a hypothesis whenever a chain word occurs in the hypothesis, 2) and a probability cohesion model that further takes chain word translation probabilities into account. We compute lexical chains for each source document to be translated and generate target lexical chains based on the computed source chains via maximum entropy classifiers. We then use the generated target chains to provide constraints for word selection in document-level machine translation through the two proposed lexical chain based cohesion models. We verify the effectiveness of the two models using a hierarchical phrase-based translation system. Ex- periments on large-scale training data show that they can substantially improve translation quality in terms of BLEU and that the probability cohesion model outperforms previous models based on lexical cohesion devices.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Lexical chains provide a representation of the lexical cohesion structure of a text. [sent-8, score-1.246]
2 We compute lexical chains for each source document to be translated and generate target lexical chains based on the computed source chains via maximum entropy classifiers. [sent-10, score-1.534]
3 We then use the generated target chains to provide constraints for word selection in document-level machine translation through the two proposed lexical chain based cohesion models. [sent-11, score-1.763]
4 Ex- periments on large-scale training data show that they can substantially improve translation quality in terms of BLEU and that the probability cohesion model outperforms previous models based on lexical cohesion devices. [sent-13, score-1.827]
5 This linguistic phenomenon is called textual cohesion (Halliday and Hasan, 1976). [sent-17, score-0.771]
6 It deals with five categories of relationships between text units, namely co-reference, ellipsis, substitution, conjunction and lexical cohesion, the last of which is realized via semantically related words. [sent-19, score-0.95]
7 The former four cohesion relations can be grouped as grammatical cohesion. [sent-20, score-0.793]
8 Generally speaking, grammatical cohesion is less common and harder to identify than lexical cohesion (Barzilay and Elhadad, 1997). [sent-21, score-1.721]
9 As most SMT systems translate a text in a sentence-by-sentence fashion, they tend to build less lexical cohesion than human translators (Wong and Kit, 2012). [sent-22, score-0.989]
10 We use lexical chains (Morris and Hirst, 1991) to capture lexical cohesion in a text. [sent-24, score-1.425]
11 Lexical chains are connected graphs that represent the lexical cohesion structure of a text. [sent-25, score-1.246]
12 In this paper, we investigate how lexical chains can be used to incorporate lexical cohesion into document-level translation. [sent-27, score-1.442]
13 Our basic assumption is that the lexical chains of a target document are direct correspondences of the lexical chains of its counterpart source document. [sent-28, score-1.144]
14 We propose to incorporate lexical cohesion into target document translation via lexical chains, which works as follows. [sent-32, score-1.364]
15 We build two lexical chain based cohesion models. [sent-34, score-1.319]
16 The first model is a count model that rewards a hypothesis whenever a word in the projected target lexical chains occurs in the hypothesis. [sent-35, score-0.659]
17 As a source chain word may be translated into many different target words, we further extend the count model to a second cohesion model: a probability model that takes chain word translation probabilities into account. [sent-36, score-1.744]
18 We test the two lexical chain based cohesion models on a hierarchical phrase-based SMT system that is trained with large-scale Chinese-English bilingual data. [sent-37, score-1.335]
19 Experiment results show that our lexical chain based cohesion models can achieve substantial improvements over the baseline. [sent-38, score-1.299]
20 Furthermore, the probability cohesion model is better than the count model and it also outperforms previous cohesion models based on lexical cohesion devices (Xiong et al. [sent-39, score-2.61]
21 Section 3 briefly introduces lexical chains and algorithms that compute lexical chains. [sent-44, score-0.654]
22 Section 4 elaborates the proposed lexical chain based framework, including details on source lexical chain computation, target lexical chain generation and the two lexical chain based cohesion models. [sent-45, score-3.011]
23 Using Lexical Cohesion Devices in Document-Level SMT Lexical cohesion devices are semantically related words, including word repetition, synonyms/near-synonyms, hyponyms and so on. [sent-62, score-0.844]
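As a hedged illustration of what counts as a cohesion device here (not the paper's implementation), the following Python sketch tests whether two words stand in a repetition, synonymy, or direct hyponymy relation via NLTK's WordNet interface; the function name and the restriction to direct hyponyms are our own assumptions.

```python
# Illustrative check for lexical cohesion devices; requires nltk with
# the "wordnet" corpus downloaded (nltk.download("wordnet")).
from nltk.corpus import wordnet as wn

def cohesion_device(w1, w2):
    """Return the cohesion device relating w1 and w2, if any."""
    if w1 == w2:
        return "repetition"
    syns1, syns2 = set(wn.synsets(w1)), set(wn.synsets(w2))
    if syns1 & syns2:
        return "synonym"  # the words share at least one synset
    for a in syns1:
        for b in syns2:
            # direct hyponymy only; the real metrics are broader
            if b in a.hyponyms() or a in b.hyponyms():
                return "hyponym"
    return None

print(cohesion_device("car", "automobile"))  # -> synonym
print(cohesion_device("dog", "puppy"))       # -> hyponym
```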
24 Wong and Kit (2012) use lexical cohesion device based metrics to improve machine translation evaluation at the document level. [sent-64, score-1.154]
25 These metrics measure the proportion of content words that are used as lexical cohesion devices in machine-generated translations. [sent-65, score-1.023]
26 They argue that their semantic language model can capture lexical cohesion by exploring n-grams that cross sentence boundaries. [sent-68, score-0.95]
27 (2013) integrate three categories of lexical cohesion devices into document-level machine translation. [sent-70, score-1.039]
28 They define three cohesion models based on lexical cohesion devices: a direct reward model, a conditional probability model and a mutual information trigger model. [sent-71, score-1.845]
29 The latter two models measure the strength of the lexical cohesion relation between two lexical items. [sent-72, score-0.95]
30 They are incorporated into SMT to calculate how appropriately lexical cohesion devices are used in document translation. [sent-73, score-1.089]
31 Modeling Coherence in Document-Level SMT In discourse analysis, cohesion is often studied together with coherence, which is another dimension of the linguistic structure of a text (Barzilay and Elhadad, 1997). [sent-76, score-0.871]
32 Our lexical chain based cohesion models are also related to previous work on using word and phrase sense disambiguation for lexical choice in SMT (Carpuat and Wu, 2007b; Carpuat and Wu, 2007a; Chan et al. [sent-81, score-1.556]
33 The difference is that we use document-wide lexical chains to build our cohesion models rather than sentence-level context features. [sent-83, score-1.266]
34 In our framework, lexical choice is performed to make the selected words consistent with the lexical cohesion structure of a document. [sent-84, score-1.129]
35 Words in these lexical chains have lexical cohesion relations such as repetition and synonymy, which may range over the entire text. [sent-92, score-1.447]
36 Generally, a text can have many different lexical chains, each of which represents a thread of cohesion through the text. [sent-95, score-0.95]
37 Several lexical chaining algorithms have been proposed to compute lexical chains from texts. [sent-96, score-0.678]
38 4 Translating Documents Using Lexical Chains In this section, we describe how we incorporate lexical cohesion into document-level machine translation using lexical chains. [sent-108, score-1.246]
39 1 Source Lexical Chains Computation We follow the chain computation algorithm introduced by Galley and McKeown (2003) to build lexical chains on source (Chinese) documents. [sent-112, score-0.92]
40 In the algorithm, the chaining process includes three steps: choosing candidate words to build a disambiguation graph (Galley and McKeown, 2003) for each document, disambiguating the candidate words and finally building lexical chains over the disambiguated candidate words. [sent-113, score-0.644]
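The chaining step can be sketched as follows, assuming a toy relatedness test; the actual Galley and McKeown algorithm first builds a WordNet-based disambiguation graph and fixes one sense per candidate word before linking, which this sketch omits.

```python
STOPWORDS = {"the", "a", "an", "of", "and"}        # illustrative filter
TOY_SYNONYMS = {frozenset(("car", "automobile"))}  # illustrative lexicon

def related(w1, w2):
    """Toy cohesion test: repetition or listed synonymy."""
    return w1 == w2 or frozenset((w1, w2)) in TOY_SYNONYMS

def build_chains(sentences):
    """Greedily attach each candidate word to the first chain that
    already holds a related word; otherwise open a new chain.
    Chain elements are (word, sentence_index) pairs."""
    chains = []
    for j, sent in enumerate(sentences):
        for word in sent:
            if word in STOPWORDS:
                continue  # only content words are chain candidates
            for chain in chains:
                if any(related(word, w) for w, _ in chain):
                    chain.append((word, j))
                    break
            else:
                chains.append([(word, j)])
    return [c for c in chains if len(c) > 1]  # drop singletons

doc = [["the", "car", "sped", "off"], ["the", "automobile", "crashed"]]
print(build_chains(doc))  # -> [[('car', 0), ('automobile', 1)]]
```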
41 2 Target Lexical Chains Generation Since a faithful target document translation should follow the same cohesion structure as that in its corresponding source document, we generate target lexical chains from the computed source lexical chains. [sent-138, score-1.867]
42 Given a source lexical chain LCs = where the ith chain word is from the jth sentence of the source document Ds, we generate a target lexical chain LCt = using maximum entropy (MaxEnt) classifiers. [sent-139, score-1.686]
43 Particularly, we translate a word in the source lexical chain into a target word in the target lexical chain using a corresponding MaxEnt classifier as follows1. [sent-140, score-1.271]
44 We train one MaxEnt classifier per unique source chain word. [sent-142, score-0.789]
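A rough sketch of this step is given below, standing in for the MaxEnt toolkit with scikit-learn's LogisticRegression (a maximum-entropy model); the context feature templates are our own illustrative assumption rather than the paper's feature set.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(sent, i):
    """Illustrative features from the sentential context of position i."""
    return {"word": sent[i],
            "prev": sent[i - 1] if i > 0 else "<s>",
            "next": sent[i + 1] if i + 1 < len(sent) else "</s>"}

class ChainWordTranslator:
    """One classifier per source chain word: maps the word's context
    to a distribution over its target translations."""
    def __init__(self):
        self.vec = DictVectorizer()
        self.clf = LogisticRegression(max_iter=1000)

    def train(self, instances):
        # instances: (source_sentence, position, target_word) triples,
        # collected from word-aligned bilingual data.
        X = self.vec.fit_transform(
            [context_features(s, i) for s, i, _ in instances])
        self.clf.fit(X, [t for _, _, t in instances])

    def predict_proba(self, sent, i):
        X = self.vec.transform([context_features(sent, i)])
        return dict(zip(self.clf.classes_, self.clf.predict_proba(X)[0]))
```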
45 Given a source document Ds and its N lexical chains {LCsk}kN=1 computed from the document. (Footnote 1: We collect training instances from word-aligned bilingual data to train the MaxEnt classifier.) [sent-150, score-0.999]
46 Each target word in the target lexical chain LCkt is the translation of its corresponding source word in the source lexical chain LCks with the highest probability according to Eq. [sent-153, score-1.418]
47 In order to incorporate these multiple chain word translations, we can generate a super target lexical chain. [sent-156, score-1.061]
48 For example, given a source lexical chain LCs = {a, b, c}, we can have the corresponding super target lexical chain. [sent-159, score-0.755]
49 Our experiments also confirm that the super target lexical chains with multiple translation options for each chain word are better than the target lexical chains with only one translation per chain word. [sent-182, score-2.082]
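A minimal sketch of super chain generation, assuming the per-word classifiers above and a hypothetical probability cutoff t: every candidate translation that clears the cutoff is kept, rather than only the single best one.

```python
def super_target_chain(source_chain, sentences, translators, t=0.1):
    """source_chain: list of (chain_word, sentence_idx, position);
    translators: dict mapping each source chain word to its
    ChainWordTranslator; t: cutoff on translation probability."""
    chain = []
    for word, j, i in source_chain:
        probs = translators[word].predict_proba(sentences[j], i)
        kept = {tgt: p for tgt, p in probs.items() if p >= t}
        chain.append((j, kept))  # several translation options per word
    return chain
```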
50 3 Lexical Chain Based Cohesion Models Once we generate the super target lexical chains {LCtk}kN=1. [sent-185, score-0.642]
51 We therefore propose lexical chain based cohesion models to measure the cohesion of the target document translation. [sent-188, score-2.204]
52 The basic idea is to reward a translation hypothesis if a word from the super target lexical chains occurs in the hypothesis. [sent-189, score-0.795]
53 According to the difference in the reward strategy, we have two cohesion models: a count cohesion model and a probability cohesion model. [sent-190, score-2.387]
54 Count cohesion model Mc(T, {LCtk}kN=1): This model rewards a translation hypothesis of the jth sentence in the document whenever a lexical chain word occurs in the hypothesis. [sent-192, score-0.788]
55 It is factorized into the sentence cohesion metric Mc(Tj, {LCtk}kN=1). [sent-194, score-0.786]
56 The model is also factorized into the sentence cohesion metric Mp(Tj, {LCtk}kN=1). [sent-206, score-0.786]
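The two sentence-level metrics can be sketched as below, under the assumption (suggested by the surrounding description, not copied from the paper's equations) that Mc counts chain-word matches in the hypothesis and Mp weights each match by its chain word translation probability.

```python
import math

def count_metric(hyp_words, chain_words):
    """Mc sketch: number of hypothesis words that occur in the
    target chain words associated with this sentence (a set)."""
    return sum(1 for w in hyp_words if w in chain_words)

def prob_metric(hyp_words, chain_probs):
    """Mp sketch: accumulate log translation probabilities of the
    matched chain words (chain_probs: word -> probability)."""
    return sum(math.log(chain_probs[w])
               for w in hyp_words if w in chain_probs)
```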
57 4 Decoding The proposed lexical chain based cohesion models are integrated into the log-linear translation framework of SMT as a cohesion feature. [sent-212, score-2.154]
58 Before translating a source document, we compute lexical chains for the source document as described in Section 4. [sent-213, score-0.661]
59 In order to efficiently calculate our lexical chain based cohesion models, we reorganize words in the super target lexical chains into vectors. [sent-216, score-1.941]
60 We associate with each source sentence Sj a vector that stores the target lexical chain words that are to occur in the corresponding target sentence Tj. [sent-217, score-0.724]
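A sketch of this reorganization, assuming super chains in the format produced above: chain words are regrouped into one lookup table per source sentence, so the cohesion feature can be computed with constant-time membership tests while that sentence is being translated; its weight would be tuned together with the other log-linear feature weights.

```python
from collections import defaultdict

def chain_words_by_sentence(super_chains):
    """Map sentence index -> {target chain word: probability}."""
    table = defaultdict(dict)
    for chain in super_chains:
        for j, options in chain:  # as produced by super_target_chain
            table[j].update(options)
    return table

def cohesion_feature(hyp_words, j, table):
    """Count-style cohesion feature for a hypothesis of sentence j."""
    return sum(1 for w in hyp_words if w in table[j])
```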
61 Although we still translate a source document sentence by sentence, we capture the global cohesion structure of the document via lexical chains and use the lexical chain based cohesion models to constrain word selection in document translation. [sent-218, score-2.822]
62 Figure 4 shows the architecture of an SMT system with the lexical chain based cohesion model. [sent-219, score-1.299]
63 Figure 4: Architecture of an SMT system with the lexical chain based cohesion model. [sent-220, score-1.299]
64 5 Experiments In this section, we conduct a series of experiments to validate the effectiveness of the proposed lexical chain based cohesion models for Chinese-to-English document-level machine translation. [sent-221, score-1.315]
65 • Comparing our lexical chain based cohesion models against the previous lexical cohesion device based models (Xiong et al. [sent-226, score-1.357]
66 Statistics of the development and test sets, which show the number of documents (#Doc) and sentences (#Sent), the number of lexical chains extracted from the source documents (#Chain), the average number of lexical chains per document (#AvgC) and the average number of words per lexical chain (#AvgW). [sent-239, score-1.666]
67 In order to build the lexical chain based cohesion models, we selected corpora with document boundaries explicitly provided from the bilingual training data together with the whole Hong Kong parallel text corpus as the cohesion model training data2. [sent-240, score-2.176]
68 We used the off-the-shelf MaxEnt toolkit3 to train one MaxEnt classifier per unique source lexical chain word (61,121 different source chain words in total). [sent-251, score-1.028]
69 As the two lexical chain based cohesion models are built on the super target lexical chains that are associated with a threshold parameter. [sent-277, score-1.941]
70 We conducted a group of experiments using the probability cohesion model defined in Eq. [sent-280, score-0.793]
71 If the threshold is set too low (e.g., 0.05), the super target lexical chains may contain too many noisy words that are not the translations of source lexical chain words, which may jeopardise the quality of the super target lexical chains. [sent-286, score-1.594]
72 The cohesion model built on these noisy super target lexical chains may select incorrect words rather than the proper lexical chain words. [sent-287, score-1.941]
73 If the threshold is set too high (e.g., 0.4), we may take the risk of not selecting the appropriate chain word translations into the super target lexical chains. [sent-292, score-0.713]
74 3 Effect of the Count and Probability Cohesion Model After we found the best threshold, we carried out experiments to test the effect of the two lexical chain based cohesion models: the count and probability cohesion model. [sent-300, score-2.115]
75 We also compared the count cohesion model (LexChainCount(top1)) built on the target lexical chains where each target chain word is the best translation of its corresponding source lexical chain word according to Eq. [sent-310, score-2.426]
76 From Table 3, we can observe that • Our lexical chain based cohesion models are able to substantially improve the translation quality in terms of BLEU score. [sent-313, score-1.299]
77 • The count cohesion model built on the super target lexical chains is better than that built on the target lexical chains with only the top one translation per chain word (27. [sent-316, score-1.522]
78 This shows the advantage of the super target lexical chains. [sent-320, score-0.642]
79 4 Finally, the probability cohesion model is much better than the count cohesion model (28. [sent-323, score-0.813]
80 This suggests that we should take into account chain word translation probabilities when we reward hypotheses where target lexical chain words occur. [sent-327, score-1.058]
81 Lexical Cohesion Devices As we have mentioned in Section 2, lexical cohesion devices can also be used to build lexical cohesion models to capture lexical cohesion relations in a text. [sent-329, score-2.965]
82 We therefore want to compare our lexical chain based cohesion models with the lexical cohesion device based cohesion models. [sent-330, score-3.058]
83 We re-implemented the mutual information trigger model, the best of the three lexical cohesion device based models proposed by Xiong et al. [sent-336, score-2.046]
84 The mutual information trigger model measures the association strength of two lexical cohesion items x and y in a lexical cohesion relation xRy. [sent-338, score-1.973]
85 In the model, it is required that x occurs in a sentence preceding the sentence where y occurs and that the two items have a lexical cohesion relation such as word repetition or synonymy. [sent-339, score-0.995]
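The association strength can be sketched as pointwise mutual information over such trigger pairs; the count tables below would be collected from training documents and are assumptions of this sketch, not the paper's exact estimation.

```python
import math

def pmi(x, y, count_x, count_y, count_xy, n_pairs):
    """PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ), where x occurs in
    an earlier sentence and y in a later one of the same document."""
    p_x = count_x[x] / n_pairs
    p_y = count_y[y] / n_pairs
    p_xy = count_xy[(x, y)] / n_pairs
    return math.log(p_xy / (p_x * p_y))
```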
86 Our lexical chain based probability cohesion model outperforms the lexical cohesion device based trigger model by 0. [sent-343, score-2.367]
87 The reason for this superiority of our cohesion model over the trigger model may be that the former model captures lexical cohesion relations among sequences of words through lexical chains, while the latter model captures lexical cohesion relations only between two related words. [sent-345, score-3.248]
88 6 Conclusions We have presented two lexical chain based cohesion models that incorporate the lexical cohesion structure of a text into document-level machine translation. [sent-346, score-2.282]
89 We project the lexical chains of a source document to the corresponding target document by translating each word in each source lexical chain into its target counterparts via MaxEnt classifiers. [sent-347, score-1.323]
90 The projected target lexical chains provide a representation of the lexical cohesion structure of the target document that is to be generated. [sent-348, score-1.652]
91 These two cohesion models are used to constrain word selection for document translation so that the generated document is consistent with the projected lexical cohesion structure. [sent-350, score-1.962]
92 We have integrated the two proposed cohesion models into a hierarchical phrase-based SMT system. [sent-351, score-0.787]
93 Experiment results on large-scale data validate that: • The lexical chain based cohesion models are able to substantially improve translation quality in terms of BLEU. [sent-352, score-1.299]
94 • The probability cohesion model is better than the count cohesion model. [sent-353, score-0.793]
95 • The lexical chain based probability cohesion model is better than the previous mutual information trigger model that adopts lexical cohesion devices to capture lexical cohesion relations between two related words. [sent-354, score-2.603]
96 As we mentioned in Section 2, cohesion is closely connected to coherence. [sent-355, score-0.771]
97 In the future, we would like to use lexical chains to identify coherence and incorporate both cohesion and coherence into document-level machine translation. [sent-357, score-1.409]
98 Lexical cohesion computed by thesaural relations as an indicator of the structure of text. [sent-443, score-0.809]
99 A computational analysis of lexical cohesion with applications in information retrieval. [sent-455, score-0.95]
100 Extend- ing machine translation evaluation metrics with lexical cohesion to document level. [sent-481, score-1.116]
wordName wordTfidf (topN-words)
[('cohesion', 0.771), ('chain', 0.349), ('chains', 0.296), ('lexical', 0.179), ('lctk', 0.13), ('sij', 0.111), ('super', 0.099), ('translation', 0.084), ('devices', 0.073), ('kn', 0.071), ('target', 0.068), ('document', 0.066), ('coherence', 0.065), ('maxent', 0.062), ('source', 0.06), ('tij', 0.06), ('trigger', 0.058), ('smt', 0.054), ('lcs', 0.05), ('xiong', 0.047), ('wsd', 0.044), ('cilin', 0.043), ('rewards', 0.043), ('carpuat', 0.043), ('morris', 0.043), ('tj', 0.042), ('sense', 0.04), ('barzilay', 0.039), ('disambiguation', 0.038), ('device', 0.038), ('elhadad', 0.038), ('discourse', 0.035), ('tdt', 0.032), ('senses', 0.031), ('per', 0.031), ('lct', 0.03), ('reward', 0.029), ('threshold', 0.028), ('mc', 0.028), ('galley', 0.027), ('jth', 0.027), ('marine', 0.026), ('hypothesis', 0.025), ('projected', 0.025), ('mckeown', 0.024), ('bleu', 0.024), ('nist', 0.024), ('chaining', 0.024), ('succeeding', 0.024), ('hirst', 0.023), ('count', 0.023), ('candidate', 0.023), ('probability', 0.022), ('relations', 0.022), ('mp', 0.022), ('deyi', 0.021), ('chinese', 0.02), ('bilingual', 0.02), ('build', 0.02), ('beigman', 0.02), ('ccoohheessiioonn', 0.02), ('faithful', 0.02), ('klebanov', 0.02), ('lexchaincount', 0.02), ('lexchainprob', 0.02), ('tji', 0.02), ('tyij', 0.02), ('xti', 0.02), ('repetition', 0.02), ('consistency', 0.02), ('translate', 0.019), ('translated', 0.018), ('translations', 0.018), ('disambiguated', 0.018), ('ltp', 0.017), ('gong', 0.017), ('hardmeier', 0.017), ('sji', 0.017), ('incorporate', 0.017), ('ontology', 0.017), ('min', 0.017), ('atomic', 0.016), ('wong', 0.016), ('computation', 0.016), ('computed', 0.016), ('synonym', 0.016), ('hierarchical', 0.016), ('halliday', 0.016), ('machine', 0.016), ('preceding', 0.015), ('occurs', 0.015), ('constraint', 0.015), ('factorized', 0.015), ('wy', 0.015), ('cutoff', 0.015), ('triggered', 0.015), ('gale', 0.015), ('hong', 0.015), ('kong', 0.015), ('mutual', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 125 emnlp-2013-Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
Author: Deyi Xiong ; Yang Ding ; Min Zhang ; Chew Lim Tan
Abstract: Lexical chains provide a representation of the lexical cohesion structure of a text. In this paper, we propose two lexical chain based cohesion models to incorporate lexical cohesion into document-level statistical machine translation: 1) a count cohesion model that rewards a hypothesis whenever a chain word occurs in the hypothesis, 2) and a probability cohesion model that further takes chain word translation probabilities into account. We compute lexical chains for each source document to be translated and generate target lexical chains based on the computed source chains via maximum entropy classifiers. We then use the generated target chains to provide constraints for word selection in document-level machine translation through the two proposed lexical chain based cohesion models. We verify the effectiveness of the two models using a hierarchical phrase-based translation system. Ex- periments on large-scale training data show that they can substantially improve translation quality in terms of BLEU and that the probability cohesion model outperforms previous models based on lexical cohesion devices.
2 0.25267991 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
Author: Anca-Roxana Simon ; Guillaume Gravier ; Pascale Sebillot
Abstract: Topic segmentation classically relies on one of two criteria, either finding areas with coherent vocabulary use or detecting discontinuities. In this paper, we propose a segmentation criterion combining both lexical cohesion and disruption, enabling a trade-off between the two. We provide the mathematical formulation of the criterion and an efficient graph based decoding algorithm for topic segmentation. Experimental results on standard textual data sets and on a more challenging corpus of automatically transcribed broadcast news shows demonstrate the benefit of such a combination. Gains were observed in all conditions, with segments of either regular or varying length and abrupt or smooth topic shifts. Long segments benefit more than short segments. However the algorithm has proven robust on automatic transcripts with short segments and limited vocabulary reoccurrences.
3 0.084768683 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
Author: Zhongqiang Huang ; Jacob Devlin ; Rabih Zbib
Abstract: This paper describes a factored approach to incorporating soft source syntactic constraints into a hierarchical phrase-based translation system. In contrast to traditional approaches that directly introduce syntactic constraints to translation rules by explicitly decorating them with syntactic annotations, which often exacerbate the data sparsity problem and cause other problems, our approach keeps translation rules intact and factorizes the use of syntactic constraints through two separate models: 1) a syntax mismatch model that associates each nonterminal of a translation rule with a distribution of tags that is used to measure the degree of syntactic compatibility of the translation rule on source spans; 2) a syntax-based reordering model that predicts whether a pair of sibling constituents in the constituent parse tree of the source sentence should be reordered or not when translated to the target language. The features produced by both models are used as soft constraints to guide the translation process. Experiments on Chinese-English translation show that the proposed approach significantly improves a strong string-to-dependency translation system on multiple evaluation sets.
4 0.072081812 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
Author: Xinyan Xiao ; Deyi Xiong
Abstract: Traditional synchronous grammar induction estimates parameters by maximizing likelihood, which only has a loose relation to translation quality. Alternatively, we propose a max-margin estimation approach to discriminatively inducing synchronous grammars for machine translation, which directly optimizes translation quality measured by BLEU. In the max-margin estimation of parameters, we only need to calculate Viterbi translations. This further facilitates the incorporation of various non-local features that are defined on the target side. We test the effectiveness of our max-margin estimation framework on a competitive hierarchical phrase-based system. Experiments show that our max-margin method significantly outperforms the traditional twostep pipeline for synchronous rule extraction by 1.3 BLEU points and is also better than previous max-likelihood estimation method.
5 0.059807763 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou
Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.
6 0.058234096 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
7 0.053579714 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
8 0.05333687 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation
9 0.050714981 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
10 0.050009046 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification
11 0.049103387 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
12 0.048751134 123 emnlp-2013-Learning to Rank Lexical Substitutions
13 0.048632782 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
14 0.047026012 5 emnlp-2013-A Discourse-Driven Content Model for Summarising Scientific Articles Evaluated in a Complex Question Answering Task
15 0.040481474 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
16 0.037292585 39 emnlp-2013-Boosting Cross-Language Retrieval by Learning Bilingual Phrase Associations from Relevance Rankings
17 0.036760461 138 emnlp-2013-Naive Bayes Word Sense Induction
18 0.03574967 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
19 0.035409026 186 emnlp-2013-Translating into Morphologically Rich Languages with Synthetic Phrases
20 0.034850325 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
topicId topicWeight
[(0, -0.134), (1, -0.066), (2, 0.018), (3, 0.047), (4, 0.028), (5, -0.042), (6, 0.005), (7, 0.025), (8, -0.052), (9, -0.041), (10, 0.011), (11, -0.022), (12, 0.031), (13, 0.038), (14, 0.007), (15, 0.131), (16, -0.0), (17, -0.032), (18, 0.023), (19, 0.139), (20, -0.038), (21, -0.023), (22, -0.144), (23, -0.009), (24, -0.07), (25, -0.243), (26, 0.078), (27, -0.082), (28, -0.114), (29, -0.086), (30, 0.285), (31, 0.13), (32, 0.299), (33, -0.023), (34, -0.189), (35, 0.119), (36, 0.065), (37, 0.282), (38, 0.053), (39, -0.024), (40, -0.016), (41, -0.042), (42, -0.035), (43, -0.057), (44, 0.004), (45, -0.123), (46, 0.104), (47, -0.121), (48, 0.17), (49, -0.047)]
simIndex simValue paperId paperTitle
same-paper 1 0.96291685 125 emnlp-2013-Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
Author: Deyi Xiong ; Yang Ding ; Min Zhang ; Chew Lim Tan
Abstract: Lexical chains provide a representation of the lexical cohesion structure of a text. In this paper, we propose two lexical chain based cohesion models to incorporate lexical cohesion into document-level statistical machine translation: 1) a count cohesion model that rewards a hypothesis whenever a chain word occurs in the hypothesis, 2) and a probability cohesion model that further takes chain word translation probabilities into account. We compute lexical chains for each source document to be translated and generate target lexical chains based on the computed source chains via maximum entropy classifiers. We then use the generated target chains to provide constraints for word selection in document-level machine translation through the two proposed lexical chain based cohesion models. We verify the effectiveness of the two models using a hierarchical phrase-based translation system. Ex- periments on large-scale training data show that they can substantially improve translation quality in terms of BLEU and that the probability cohesion model outperforms previous models based on lexical cohesion devices.
2 0.81347692 124 emnlp-2013-Leveraging Lexical Cohesion and Disruption for Topic Segmentation
Author: Anca-Roxana Simon ; Guillaume Gravier ; Pascale Sebillot
Abstract: Topic segmentation classically relies on one of two criteria, either finding areas with coherent vocabulary use or detecting discontinuities. In this paper, we propose a segmentation criterion combining both lexical cohesion and disruption, enabling a trade-off between the two. We provide the mathematical formulation of the criterion and an efficient graph based decoding algorithm for topic segmentation. Experimental results on standard textual data sets and on a more challenging corpus of automatically transcribed broadcast news shows demonstrate the benefit of such a combination. Gains were observed in all conditions, with segments of either regular or varying length and abrupt or smooth topic shifts. Long segments benefit more than short segments. However the algorithm has proven robust on automatic transcripts with short segments and limited vocabulary reoccurrences.
3 0.30426937 123 emnlp-2013-Learning to Rank Lexical Substitutions
Author: Gyorgy Szarvas ; Robert Busa-Fekete ; Eyke Hullermeier
Abstract: The problem to replace a word with a synonym that fits well in its sentential context is known as the lexical substitution task. In this paper, we tackle this task as a supervised ranking problem. Given a dataset of target words, their sentential contexts and the potential substitutions for the target words, the goal is to train a model that accurately ranks the candidate substitutions based on their contextual fitness. As a key contribution, we customize and evaluate several learning-to-rank models to the lexical substitution task, including classification-based and regression-based approaches. On two datasets widely used for lexical substitution, our best models signifi- cantly advance the state-of-the-art.
4 0.24697267 26 emnlp-2013-Assembling the Kazakh Language Corpus
Author: Olzhas Makhambetov ; Aibek Makazhanov ; Zhandos Yessenbayev ; Bakhyt Matkarimov ; Islam Sabyrgaliyev ; Anuar Sharafudinov
Abstract: This paper presents the Kazakh Language Corpus (KLC), which is one of the first attempts made within a local research community to assemble a Kazakh corpus. KLC is designed to be a large scale corpus containing over 135 million words and conveying five stylistic genres: literary, publicistic, official, scientific and informal. Along with its primary part KLC comprises such parts as: (i) annotated sub-corpus, containing segmented documents encoded in the eXtensible Markup Language (XML) that marks complete morphological, syntactic, and structural characteristics of texts; (ii) as well as a sub-corpus with the annotated speech data. KLC has a web-based corpus management system that helps to navigate the data and retrieve necessary information. KLC is also open for contributors, who are willing to make suggestions, donate texts and help with annotation of existing materials.
5 0.24380469 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
Author: Hendra Setiawan ; Bowen Zhou ; Bing Xiang
Abstract: Reordering poses one of the greatest challenges in Statistical Machine Translation research as the key contextual information may well be beyond the confine oftranslation units. We present the “Anchor Graph” (AG) model where we use a graph structure to model global contextual information that is crucial for reordering. The key ingredient of our AG model is the edges that capture the relationship between the reordering around a set of selected translation units, which we refer to as anchors. As the edges link anchors that may span multiple translation units at decoding time, our AG model effectively encodes global contextual information that is previously absent. We integrate our proposed model into a state-of-the-art translation system and demonstrate the efficacy of our proposal in a largescale Chinese-to-English translation task.
6 0.23708616 182 emnlp-2013-The Topology of Semantic Knowledge
7 0.23528968 190 emnlp-2013-Ubertagging: Joint Segmentation and Supertagging for English
8 0.21912169 84 emnlp-2013-Factored Soft Source Syntactic Constraints for Hierarchical Machine Translation
9 0.21156955 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
10 0.20885666 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models
11 0.2014789 138 emnlp-2013-Naive Bayes Word Sense Induction
12 0.2001171 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
13 0.19995996 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation
15 0.19377165 43 emnlp-2013-Cascading Collective Classification for Bridging Anaphora Recognition using a Rich Linguistic Feature Set
16 0.19021855 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)
17 0.18801746 45 emnlp-2013-Chinese Zero Pronoun Resolution: Some Recent Advances
19 0.18050036 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge
20 0.18022263 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning
topicId topicWeight
[(3, 0.034), (18, 0.041), (19, 0.319), (22, 0.073), (30, 0.084), (51, 0.133), (66, 0.046), (71, 0.019), (75, 0.04), (77, 0.046), (96, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.75325948 125 emnlp-2013-Lexical Chain Based Cohesion Models for Document-Level Statistical Machine Translation
Author: Deyi Xiong ; Yang Ding ; Min Zhang ; Chew Lim Tan
Abstract: Lexical chains provide a representation of the lexical cohesion structure of a text. In this paper, we propose two lexical chain based cohesion models to incorporate lexical cohesion into document-level statistical machine translation: 1) a count cohesion model that rewards a hypothesis whenever a chain word occurs in the hypothesis, 2) and a probability cohesion model that further takes chain word translation probabilities into account. We compute lexical chains for each source document to be translated and generate target lexical chains based on the computed source chains via maximum entropy classifiers. We then use the generated target chains to provide constraints for word selection in document-level machine translation through the two proposed lexical chain based cohesion models. We verify the effectiveness of the two models using a hierarchical phrase-based translation system. Ex- periments on large-scale training data show that they can substantially improve translation quality in terms of BLEU and that the probability cohesion model outperforms previous models based on lexical cohesion devices.
2 0.50429302 56 emnlp-2013-Deep Learning for Chinese Word Segmentation and POS Tagging
Author: Xiaoqing Zheng ; Hanyang Chen ; Tianyu Xu
Abstract: This study explores the feasibility of performing Chinese word segmentation (CWS) and POS tagging by deep learning. We try to avoid task-specific feature engineering, and use deep layers of neural networks to discover relevant features to the tasks. We leverage large-scale unlabeled data to improve internal representation of Chinese characters, and use these improved representations to enhance supervised word segmentation and POS tagging models. Our networks achieved close to state-of-theart performance with minimal computational cost. We also describe a perceptron-style algorithm for training the neural networks, as an alternative to maximum-likelihood method, to speed up the training process and make the learning algorithm easier to be implemented.
3 0.50332385 187 emnlp-2013-Translation with Source Constituency and Dependency Trees
Author: Fandong Meng ; Jun Xie ; Linfeng Song ; Yajuan Lu ; Qun Liu
Abstract: We present a novel translation model, which simultaneously exploits the constituency and dependency trees on the source side, to combine the advantages of two types of trees. We take head-dependents relations of dependency trees as backbone and incorporate phrasal nodes of constituency trees as the source side of our translation rules, and the target side as strings. Our rules hold the property of long distance reorderings and the compatibility with phrases. Large-scale experimental results show that our model achieves significantly improvements over the constituency-to-string (+2.45 BLEU on average) and dependencyto-string (+0.91 BLEU on average) models, which only employ single type of trees, and significantly outperforms the state-of-theart hierarchical phrase-based model (+1.12 BLEU on average), on three Chinese-English NIST test sets.
4 0.5006156 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation
Author: Peng Li ; Yang Liu ; Maosong Sun
Abstract: While inversion transduction grammar (ITG) is well suited for modeling ordering shifts between languages, how to make applying the two reordering rules (i.e., straight and inverted) dependent on actual blocks being merged remains a challenge. Unlike previous work that only uses boundary words, we propose to use recursive autoencoders to make full use of the entire merging blocks alternatively. The recursive autoencoders are capable of generating vector space representations for variable-sized phrases, which enable predicting orders to exploit syntactic and semantic information from a neural language modeling’s perspective. Experiments on the NIST 2008 dataset show that our system significantly improves over the MaxEnt classifier by 1.07 BLEU points.
5 0.49972281 175 emnlp-2013-Source-Side Classifier Preordering for Machine Translation
Author: Uri Lerner ; Slav Petrov
Abstract: We present a simple and novel classifier-based preordering approach. Unlike existing preordering models, we train feature-rich discriminative classifiers that directly predict the target-side word order. Our approach combines the strengths of lexical reordering and syntactic preordering models by performing long-distance reorderings using the structure of the parse tree, while utilizing a discriminative model with a rich set of features, including lexical features. We present extensive experiments on 22 language pairs, including preordering into English from 7 other languages. We obtain improvements of up to 1.4 BLEU on language pairs in the WMT 2010 shared task. For languages from different families the improvements often exceed 2 BLEU. Many of these gains are also significant in human evaluations.
6 0.49767411 107 emnlp-2013-Interactive Machine Translation using Hierarchical Translation Models
7 0.49678975 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction
8 0.49586233 114 emnlp-2013-Joint Learning and Inference for Grammatical Error Correction
9 0.49500775 48 emnlp-2013-Collective Personal Profile Summarization with Social Networks
10 0.49411246 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation
11 0.49319503 15 emnlp-2013-A Systematic Exploration of Diversity in Machine Translation
12 0.49316975 47 emnlp-2013-Collective Opinion Target Extraction in Chinese Microblogs
13 0.49098283 143 emnlp-2013-Open Domain Targeted Sentiment
14 0.49068254 22 emnlp-2013-Anchor Graph: Global Reordering Contexts for Statistical Machine Translation
15 0.49043706 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk
16 0.48895225 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation
17 0.48874953 76 emnlp-2013-Exploiting Discourse Analysis for Article-Wide Temporal Classification
18 0.48749241 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization
19 0.48706374 194 emnlp-2013-Unsupervised Relation Extraction with General Domain Knowledge
20 0.48697388 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)