acl acl2013 acl2013-69 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Guosheng Ben ; Deyi Xiong ; Zhiyang Teng ; Yajuan Lu ; Qun Liu
Abstract: In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. [sent-10, score-2.198]
2 We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets. [sent-11, score-0.137] [sent-12, score-0.022]
4 1 Introduction Current statistical machine translation (SMT) systems are mostly sentence-based. [sent-13, score-0.082]
5 The major drawback of such sentence-based translation is that it neglects inter-sentential dependencies. [sent-14, score-0.078]
6 As a linguistic means to establish inter-sentential links, lexical cohesion ties sentences together into a meaningfully interwoven structure through words with the same or related meanings (Wong and Kit, 2012). [sent-15, score-0.865]
7 This paper studies lexical cohesion devices and incorporates them into document-level machine translation. [sent-16, score-1.126]
8 We propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level SMT. [sent-17, score-2.175]
9 We consider a lexical cohesion item in the source language and its corresponding counterpart in the target language as a trigger pair, in which we treat the source language lexical cohesion item as the trigger and its target language counterpart as the triggered item. [sent-18, score-2.726]
10 Then we use mutual information to measure the strength of the dependency between the trigger and triggered item. [sent-19, score-0.482]
11 We integrate this model into a hierarchical phrase-based SMT system. [sent-20, score-0.055]
12 Section 3 elaborates the proposed bilingual lexical cohesion trigger model, including the details of identifying lexical cohesion devices, measuring the dependency strength of bilingual lexical cohesion triggers, and integrating the model into SMT. [sent-23, score-3.189]
13 2 Related Work As a linguistic means to establish inter-sentential links, cohesion has been explored in the literature of both linguistics and computational linguistics. [sent-26, score-0.756]
14 Halliday and Hasan (1976) define cohesion as relations of meaning that exist within a text, and divide it into grammatical cohesion, which refers to syntactic links between text items, and lexical cohesion, which is achieved through word choices in a text. [sent-27, score-1.685]
15 In order to improve the quality of machine translation output, cohesion has served as a high-level quality criterion in post-editing (Vasconcellos, 1989). [sent-28, score-0.816]
16 As part of the COMTIS project, grammatical cohesion is integrated into machine translation models to capture inter-sentential links (Cartoni et al. [sent-29, score-0.897]
17 Wong and Kit (2012) incorporate lexical cohesion into machine translation evaluation metrics to evaluate document-level machine translation quality. [sent-31, score-1.031]
18 Xiong et al. (2013) integrate various target-side lexical cohesion devices into document-level machine translation. [sent-33, score-1.123]
19 Lexical cohesion is also partially explored in the cache-based translation models of Gong et al. [sent-34, score-0.793]
20 (2011) and translation consistency constraints of Xiao et al. [sent-35, score-0.08]
21 All previous methods on lexical cohesion for document-level machine translation mentioned above share one limitation: they do not use any source language information. [sent-39, score-0.957]
22 Our work is mostly related to the mutual information trigger based lexical cohesion model proposed by Xiong et al. [sent-40, score-1.204]
23 However, we significantly extend their model to a bilingual lexical cohesion trigger model that captures both source and target-side lexical cohesion items to improve target word selection in document-level machine translation. [sent-42, score-2.277]
24 3.1 Identification of Lexical Cohesion Devices Lexical cohesion can be divided into reiteration and collocation (Wong and Kit, 2012). [sent-44, score-0.988]
25 Reiteration is a form of lexical cohesion which involves the repetition of a lexical item. [sent-45, score-0.952]
26 Collocation is a pair of lexical items that have semantic relations, such as synonym, near-synonym, superordinate, subordinate, antonym, meronym and so on. [sent-46, score-0.14]
27 In the collocation, we focus on the synonym/near-synonym and super-subordinate semantic relations.1 [sent-47, score-0.022]
28 We define lexical cohesion devices as content words that have lexical cohesion relations, namely reiteration, synonym/near-synonym and super-subordinate. [sent-48, score-1.943]
29 Take the following two sentences extracted from a document for example (Halliday and Hasan, 1976). [sent-50, score-0.02]
30 We see that the word elm in the first sentence is repeated in the second sentence. [sent-55, score-0.048]
31 Such reiteration devices are easy to identify in texts. [sent-56, score-0.453]
32 WordNet is a lexical resource that clusters words with the same sense into a semantic group called a synset. [sent-59, score-0.109]
33 Let s(w) denote a function that defines all synonym words of w grouped in the same synset in WordNet. [sent-61, score-0.034]
34 We can use the function to compute all synonyms and near-synonyms for word w. [sent-62, score-0.029]
35 For convenience of representation, s0 denotes the set of synonyms of w. (Footnote 1: other collocations, such as antonyms, are not frequently used.) [sent-63, score-0.029]
36 The near-synonym set s1 is defined as the union of all synsets that are defined by the function s(w) where w ∈ s0. [sent-66, score-0.023]
37 s_1 = \bigcup_{w \in s_0} s(w)   (1)
s_2 = \bigcup_{w \in s_1} s(w)   (2)
s_3 = \bigcup_{w \in s_2} s(w)   (3)
Similarly, s_m can be defined recursively as follows. [sent-68, score-0.035]
38 s_m = \bigcup_{w \in s_{m-1}} s(w)   (4)
Obviously, we can find synonyms and near-synonyms for word w according to formula (4). [sent-69, score-0.084]
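To make formulas (1)-(4) concrete, the following is a minimal sketch of the recursive synonym/near-synonym expansion, assuming NLTK's WordNet interface; the paper does not specify its WordNet access layer, so the function names here are illustrative.

```python
from nltk.corpus import wordnet as wn

def s(word):
    """s(w): all lemma names that share a WordNet synset with `word`."""
    return {lemma.name() for synset in wn.synsets(word)
            for lemma in synset.lemmas()}

def s_m(word, m):
    """Compute s_m per formula (4): m recursive unions of s(w) over the
    previous level, starting from s_0 = s(word)."""
    current = s(word)                      # s_0: direct synonyms of w
    for _ in range(m):
        current = set().union(*(s(w) for w in current))
    return current
```

For example, s_m('elm', 1) returns the union of synsets over all direct synonyms of "elm", i.e. the near-synonym set s_1.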
39 Superordinate and subordinate are formed by words with an is-a semantic relation in WordNet. [sent-70, score-0.059]
40 As the super-subordinate relation is also encoded in WordNet, we can define a function similar to s(w) to identify hypernyms and hyponyms. [sent-71, score-0.025]
41 Hereafter, for convenience, we use rep, syn and hyp to represent the lexical cohesion devices reiteration, synonym/near-synonym and super-subordinate, respectively. [sent-72, score-1.218]
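As an illustration of how the three devices might be identified for a candidate word pair, here is a hedged sketch that reuses s_m from above and NLTK's hypernym/hyponym accessors; the one-level is-a closure and the lack of lemmatization are simplifying assumptions.

```python
from nltk.corpus import wordnet as wn

def is_a_neighbors(word):
    """One-level hypernym and hyponym lemmas of `word` (the is-a relation)."""
    related = set()
    for synset in wn.synsets(word):
        for rel in synset.hypernyms() + synset.hyponyms():
            related.update(lemma.name() for lemma in rel.lemmas())
    return related

def cohesion_relation(t, y, m=1):
    """Classify the pair (t, y) as 'rep', 'syn', 'hyp', or None."""
    if t == y:
        return "rep"                  # reiteration: repeated lexical item
    if y in s_m(t, m):
        return "syn"                  # synonym/near-synonym
    if y in is_a_neighbors(t):
        return "hyp"                  # superordinate/subordinate
    return None
```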
42 3.2 Bilingual Lexical Cohesion Trigger Model In a bilingual text, lexical cohesion is present in the source and target languages in a synchronous fashion. [sent-74, score-1.083]
43 We use a trigger model to capture such a bilingual lexical cohesion relation. [sent-75, score-1.332]
44 We define xRy (R ∈ {rep, syn, hyp}) as a trigger pair where x is the trigger in the source language and y the triggered item in the target language. [sent-76, score-0.881]
45 In order to capture these synchronous relations between lexical cohesion items in the source language and their counterparts in the target language, we use word alignments. [sent-77, score-1.018]
46 First, we identify a monolingual lexical cohesion relation in the target language in the form of tRy where t is the trigger, y the triggered item that occurs in a sentence succeeding the sentence of t, and R∈{rep, syn, hyp}. [sent-78, score-1.091]
47 Second, we find the word x in the source language that is aligned to t. [sent-79, score-0.072]
48 We may find multiple words x_1^k in the source language that are aligned to t. [sent-81, score-0.051]
49 We use all of them, x_i R t (1 ≤ i ≤ k), to define bilingual lexical cohesion relations. [sent-82, score-0.987]
50 In this way, we can create bilingual lexical cohesion relations xRy (R ∈ {rep, syn, hyp}): x being the trigger and y the triggered item. [sent-83, score-1.424]
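A sketch of this extraction step: for each target-side relation t R y across adjacent sentences, every source word aligned to t is taken as a trigger for y. The one-sentence window and the data structures are assumptions; the paper does not give pseudocode.

```python
from collections import Counter

def count_bilingual_triggers(src_doc, tgt_doc, alignments):
    """
    src_doc, tgt_doc: lists of tokenized sentences for one document pair.
    alignments[i]: set of (src_pos, tgt_pos) links for sentence pair i.
    Returns C(x, y, R): counts of source trigger x and target triggered
    item y in a following sentence under relation R.
    """
    counts = Counter()
    for i, tgt_sent in enumerate(tgt_doc[:-1]):
        for y in tgt_doc[i + 1]:              # triggered item in next sentence
            for t_pos, t in enumerate(tgt_sent):
                R = cohesion_relation(t, y)   # target-side relation t R y
                if R is None:
                    continue
                for s_pos, a_pos in alignments[i]:
                    if a_pos == t_pos:        # source words x_1..x_k aligned to t
                        counts[(src_doc[i][s_pos], y, R)] += 1
    return counts
```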
51 h 383 The possibility that y will occur given x is equal to the chance that x triggers y. [sent-84, score-0.028]
52 Therefore we measure the strength of dependency between the trigger and triggered item according to pointwise mutual information (PMI) (Church and Hanks, 1990; Xiong et al. [sent-85, score-0.534]
53 The PMI for the trigger pair xRy, where x is the trigger, y the triggered item that occurs in a target sentence succeeding the target sentence that aligns to the source sentence of x, and R ∈ {rep, syn, hyp}, is calculated as follows. [sent-87, score-0.616]
54 p(x, R) = \sum_y C(x, y, R)   (7)
p(y, R) = \sum_x C(x, y, R)   (8)
Given a target sentence y_1^m, our bilingual lexical cohesion trigger model is defined as follows. [sent-90, score-1.348]
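The extract omits the paper's equations (5)-(6), so the PMI form in this sketch is the standard pointwise mutual information of Church and Hanks (1990), which the paper cites; treating it as PMI(xRy) = log(p(x,y,R) / (p(x,R)·p(y,R))) is therefore an assumption consistent with equations (7)-(8).

```python
import math
from collections import Counter

def pmi_table(counts):
    """PMI(xRy) from co-occurrence counts C(x, y, R)."""
    total = sum(counts.values())
    c_x = Counter()                    # numerator of p(x, R), eq. (7)
    c_y = Counter()                    # numerator of p(y, R), eq. (8)
    for (x, y, R), c in counts.items():
        c_x[(x, R)] += c
        c_y[(y, R)] += c
    # log((c/total) / ((c_x/total) * (c_y/total))) = log(c*total / (c_x*c_y))
    return {(x, y, R): math.log(c * total / (c_x[(x, R)] * c_y[(y, R)]))
            for (x, y, R), c in counts.items()}
```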
55 MI_R(y_1^m) = \prod_{y_i} \exp(PMI(\cdot R y_i))   (9)
where y_i are content words in the sentence y_1^m and PMI(·Ry_i) is the maximum PMI value among all trigger words x_1^q from source sentences that have been recently translated, where the trigger words x_1^q have an R relation with word y_i. [sent-91, score-0.716]
56 PMI(\cdot R y_i) = \max_{1 \le j \le q} PMI(x_j R y_i)   (10)
Three models MI_rep(y_1^m), MI_syn(y_1^m), MI_hyp(y_1^m) for the reiteration device, the synonym/near-synonym device and the super-subordinate device can be formulated as above. [sent-92, score-0.457]
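A sketch of equations (9)-(10): the score of a hypothesis multiplies exp(PMI) over its content words, taking for each word the maximum PMI against trigger candidates from recently translated source sentences. `pmi` is the table from the previous sketch; the content-word filter is an assumed helper passed in by the caller.

```python
import math

def mi_score(target_words, cached_source_words, pmi, R, is_content_word):
    """MI_R(y_1^m) per eq. (9), with the max over triggers per eq. (10)."""
    score = 1.0
    for y in target_words:
        if not is_content_word(y):
            continue
        best = max((pmi[(x, y, R)] for x in cached_source_words
                    if (x, y, R) in pmi), default=None)
        if best is not None:           # skip words with no known trigger
            score *= math.exp(best)
    return score
```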
57 3.3 Decoding We incorporate our bilingual lexical cohesion trigger model into a hierarchical phrase-based system (Chiang, 2007). [sent-95, score-1.364]
58 We add three features: MI_rep(y_1^m), MI_syn(y_1^m) and MI_hyp(y_1^m). In order to quickly calculate the score of each feature, we calculate PMI for each trigger pair before decoding. [sent-97, score-0.319]
59 During translation, we maintain a cache to store the source language sentences of recently translated target sentences, and three sets Srep, Ssyn, Shyp to store source language words that have the relation {rep, syn, hyp} with content words generated in the target language. [sent-99, score-0.331]
60 During decoding, we update scores according to formula (9). [sent-100, score-0.02]
61 When one sentence is translated, we store the corresponding source sentence into the cache. [sent-101, score-0.062]
62 When the whole document is translated, we clear the cache for the next document. [sent-102, score-0.051]
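A minimal sketch of this cache management, with illustrative class and method names; the paper does not state a cache size, so the unbounded list here is a simplification.

```python
class CohesionCache:
    """Stores source sentences of recently translated sentences."""

    def __init__(self):
        self.source_sentences = []

    def trigger_words(self):
        """Flattened source words used as trigger candidates when scoring."""
        return [w for sent in self.source_sentences for w in sent]

    def add(self, source_sentence):
        """Called once a sentence has been translated."""
        self.source_sentences.append(source_sentence)

    def clear(self):
        """Called at each document boundary."""
        self.source_sentences = []
```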
63 4.1 Setup Our experiments were conducted on the NIST Chinese-English translation tasks with large-scale training data. [sent-104, score-0.059]
64 For the bilingual lexical cohesion trigger model, we collected data with document boundaries explicitly provided. [sent-113, score-1.326]
65 The corpora are selected from our bilingual training data and the whole Hong Kong parallel text corpus,3 which contains 103,236 documents with 2. [sent-114, score-0.144]
66 Table 1: Distributions of lexical cohesion devices in the target language. [sent-126, score-1.121]
67 In this section, we study how these lexical cohesion devices are distributed in the training data before conducting our experiments on the bilingual lexical cohesion model. [sent-127, score-2.066]
68 Here we study the distribution of lexical cohesion in the target language (English). [sent-128, score-0.885]
69 Table 1 shows the distribution percentages, which are counted over the content words in the training data. [sent-129, score-0.021]
70 From Table 1, we can see that the reiteration cohesion device is nearly a third of all content words (30. [sent-130, score-1.057]
71 Obviously, lexical cohesion devices are frequently used in real-world texts. [sent-134, score-1.079]
72 Therefore capturing lexical cohesion devices is very useful for document-level machine translation. [sent-135, score-1.102]
73 Table 2: BLEU scores with various lexical cohesion devices on the test sets MT06 and MT08. [sent-140, score-0.345]
74 “Base” is the traditional hierarchical system; “Avg” is the average BLEU score on the two test sets. [sent-141, score-0.034]
75 From the table, we can see that integrating a single lexical cohesion device into SMT, the model gains an improvement of up to 0. [sent-143, score-0.928]
76 04 BLEU points on the MT06 test set, and an average improvement of 0.85 BLEU points on the two test sets of MT06 and MT08. [sent-146, score-0.022] [sent-147, score-0.022]
78 These stable improvements strongly suggest that our bilingual lexical cohesion trigger model is able to substantially improve the translation quality. [sent-148, score-1.365]
79 5 Conclusions In this paper we have presented a bilingual lexical cohesion trigger model to incorporate three classes of lexical cohesion devices, namely the reiteration, synonym/near-synonym and super-subordinate devices, into a hierarchical phrase-based system. [sent-149, score-2.491]
80 This displays the advantage of exploiting bilingual lexical cohesion. [sent-151, score-0.253]
81 Grammatical and lexical cohesion have often been studied together in discourse analysis. [sent-152, score-0.843]
82 In the future, we plan to extend our model to capture both grammatical and lexical cohesion in document-level machine translation. [sent-153, score-0.919]
83 Improving MT coherence through text-level processing of input texts: the COMTIS project. [sent-162, score-0.093]
84 Word association norms, mutual information, and lexicography. [sent-172, score-0.042]
85 Bleu: a method for automatic evaluation of machine translation. [sent-205, score-0.023]
86 Cohesion and coherence in the presentation of machine translation products. [sent-214, score-0.103]
87 Extending machine translation evaluation metrics with lexical cohesion to document level. [sent-220, score-0.945]
88 Enhancing language models in statistical machine translation with backward n-grams and mutual information triggers. [sent-229, score-0.124]
wordName wordTfidf (topN-words)
[('cohesion', 0.734), ('trigger', 0.319), ('devices', 0.236), ('reiteration', 0.217), ('hyp', 0.149), ('bilingual', 0.144), ('rep', 0.118), ('lexical', 0.109), ('syn', 0.107), ('xry', 0.097), ('triggered', 0.096), ('pmi', 0.09), ('xiong', 0.087), ('device', 0.085), ('comtis', 0.072), ('ryi', 0.072), ('translation', 0.059), ('bleu', 0.059), ('halliday', 0.056), ('item', 0.052), ('deyi', 0.049), ('cartoni', 0.048), ('elm', 0.048), ('guosheng', 0.048), ('mihyp', 0.048), ('mirep', 0.048), ('misyn', 0.048), ('supersubordinate', 0.048), ('wong', 0.047), ('xc', 0.044), ('superordinate', 0.043), ('mutual', 0.042), ('target', 0.042), ('kit', 0.04), ('nist', 0.04), ('collocation', 0.037), ('gong', 0.037), ('sm', 0.035), ('synonym', 0.034), ('subordinate', 0.034), ('hierarchical', 0.034), ('succeeding', 0.033), ('source', 0.032), ('qun', 0.032), ('kong', 0.032), ('cache', 0.031), ('items', 0.031), ('hong', 0.03), ('hasan', 0.03), ('store', 0.03), ('och', 0.03), ('smt', 0.03), ('synonyms', 0.029), ('wordnet', 0.029), ('links', 0.028), ('triggers', 0.028), ('church', 0.028), ('translated', 0.027), ('counterpart', 0.027), ('grammatical', 0.027), ('yajuan', 0.026), ('capture', 0.026), ('relation', 0.025), ('strength', 0.025), ('incorporate', 0.024), ('min', 0.023), ('synsets', 0.023), ('machine', 0.023), ('franz', 0.023), ('obviously', 0.023), ('josef', 0.023), ('relations', 0.022), ('synchronous', 0.022), ('formulated', 0.022), ('xiao', 0.022), ('establish', 0.022), ('points', 0.022), ('billy', 0.021), ('hey', 0.021), ('georgetown', 0.021), ('oofrd', 0.021), ('rhe', 0.021), ('zhiyang', 0.021), ('integrate', 0.021), ('coherence', 0.021), ('ben', 0.021), ('consistency', 0.021), ('content', 0.021), ('document', 0.02), ('formula', 0.02), ('teng', 0.02), ('suda', 0.02), ('iuqun', 0.02), ('aligned', 0.019), ('zhengxian', 0.019), ('jacques', 0.019), ('oinf', 0.019), ('innd', 0.019), ('climbing', 0.019), ('neglect', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 69 acl-2013-Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
Author: Guosheng Ben ; Deyi Xiong ; Zhiyang Teng ; Yajuan Lu ; Qun Liu
Abstract: In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets.
2 0.17512943 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou
Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve such problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those intersentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline. 1
3 0.13815221 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
Author: Qi Li ; Heng Ji ; Liang Huang
Abstract: Traditional approaches to the task of ACE event extraction usually rely on sequential pipelines with multiple stages, which suffer from error propagation since event triggers and arguments are predicted in isolation by independent local classifiers. By contrast, we propose a joint framework based on structured prediction which extracts triggers and arguments together so that the local predictions can be mutually improved. In addition, we propose to incorporate global features which explicitly capture the dependencies of multiple triggers and arguments. Experimental results show that our joint approach with local features outperforms the pipelined baseline, and adding global features further improves the performance significantly. Our approach advances state-ofthe-art sentence-level event extraction, and even outperforms previous argument labeling methods which use external knowledge from other sentences and documents.
4 0.10440849 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
Author: Beata Beigman Klebanov ; Michael Flor
Abstract: We describe a new representation of the content vocabulary of a text we call word association profile that captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distirbution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.
5 0.10057024 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
Author: Minwei Feng ; Jan-Thorsten Peter ; Hermann Ney
Abstract: In this paper, we propose a novel reordering model based on sequence labeling techniques. Our model converts the reordering problem into a sequence labeling problem, i.e. a tagging task. Results on five Chinese-English NIST tasks show that our model improves the baseline system by 1.32 BLEU and 1.53 TER on average. Results of comparative study with other seven widely used reordering models will also be reported.
6 0.091983534 386 acl-2013-What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse
7 0.087343857 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
8 0.083942935 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
9 0.07533367 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
10 0.067077763 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory
11 0.064287275 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
12 0.058303397 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
13 0.057972006 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance
14 0.057934973 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
15 0.055793315 73 acl-2013-Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
16 0.054960951 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
17 0.054699033 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference
18 0.053475048 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
19 0.053362813 255 acl-2013-Name-aware Machine Translation
20 0.052852165 195 acl-2013-Improving machine translation by training against an automatic semantic frame based evaluation metric
topicId topicWeight
[(0, 0.129), (1, -0.049), (2, 0.07), (3, 0.0), (4, 0.006), (5, 0.069), (6, -0.044), (7, 0.073), (8, -0.024), (9, 0.041), (10, 0.045), (11, -0.008), (12, 0.009), (13, 0.051), (14, 0.075), (15, -0.036), (16, -0.019), (17, -0.064), (18, 0.014), (19, -0.044), (20, 0.034), (21, -0.072), (22, 0.012), (23, -0.027), (24, 0.017), (25, 0.008), (26, -0.039), (27, -0.016), (28, -0.023), (29, 0.024), (30, -0.03), (31, -0.068), (32, -0.14), (33, 0.04), (34, -0.095), (35, 0.03), (36, 0.03), (37, 0.135), (38, 0.068), (39, -0.028), (40, 0.11), (41, 0.104), (42, -0.106), (43, 0.003), (44, -0.02), (45, -0.014), (46, 0.044), (47, 0.034), (48, -0.083), (49, -0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.8838076 69 acl-2013-Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
Author: Guosheng Ben ; Deyi Xiong ; Zhiyang Teng ; Yajuan Lu ; Qun Liu
Abstract: In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets.
2 0.61797184 206 acl-2013-Joint Event Extraction via Structured Prediction with Global Features
Author: Qi Li ; Heng Ji ; Liang Huang
Abstract: Traditional approaches to the task of ACE event extraction usually rely on sequential pipelines with multiple stages, which suffer from error propagation since event triggers and arguments are predicted in isolation by independent local classifiers. By contrast, we propose a joint framework based on structured prediction which extracts triggers and arguments together so that the local predictions can be mutually improved. In addition, we propose to incorporate global features which explicitly capture the dependencies of multiple triggers and arguments. Experimental results show that our joint approach with local features outperforms the pipelined baseline, and adding global features further improves the performance significantly. Our approach advances state-ofthe-art sentence-level event extraction, and even outperforms previous argument labeling methods which use external knowledge from other sentences and documents.
3 0.56128454 386 acl-2013-What causes a causal relation? Detecting Causal Triggers in Biomedical Scientific Discourse
Author: Claudiu Mihaila ; Sophia Ananiadou
Abstract: Current domain-specific information extraction systems represent an important resource for biomedical researchers, who need to process vaster amounts of knowledge in short times. Automatic discourse causality recognition can further improve their workload by suggesting possible causal connections and aiding in the curation of pathway models. We here describe an approach to the automatic identification of discourse causality triggers in the biomedical domain using machine learning. We create several baselines and experiment with various parameter settings for three algorithms, i.e., Conditional Random Fields (CRF), Support Vector Machines (SVM) and Random Forests (RF). Also, we evaluate the impact of lexical, syntactic and semantic features on each of the algorithms and look at er- rors. The best performance of 79.35% F-score is achieved by CRFs when using all three feature types.
4 0.55017793 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou
Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve such problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those intersentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline. 1
5 0.47492385 389 acl-2013-Word Association Profiles and their Use for Automated Scoring of Essays
Author: Beata Beigman Klebanov ; Michael Flor
Abstract: We describe a new representation of the content vocabulary of a text we call word association profile that captures the proportions of highly associated, mildly associated, unassociated, and dis-associated pairs of words that co-exist in the given text. We illustrate the shape of the distirbution and observe variation with genre and target audience. We present a study of the relationship between quality of writing and word association profiles. For a set of essays written by college graduates on a number of general topics, we show that the higher scoring essays tend to have higher percentages of both highly associated and dis-associated pairs, and lower percentages of mildly associated pairs of words. Finally, we use word association profiles to improve a system for automated scoring of essays.
7 0.41762882 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
8 0.38596764 246 acl-2013-Modeling Thesis Clarity in Student Essays
9 0.38047042 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
10 0.37810358 255 acl-2013-Name-aware Machine Translation
11 0.37722161 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
12 0.37678576 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
13 0.37648422 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
14 0.37621954 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
15 0.37606174 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
16 0.37042087 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
17 0.36873135 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
18 0.3640452 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation
19 0.35589123 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?
20 0.3553735 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis
topicId topicWeight
[(0, 0.056), (6, 0.023), (11, 0.06), (24, 0.056), (26, 0.036), (35, 0.069), (42, 0.11), (48, 0.035), (54, 0.276), (64, 0.018), (70, 0.028), (88, 0.019), (90, 0.032), (95, 0.065)]
simIndex simValue paperId paperTitle
same-paper 1 0.77548337 69 acl-2013-Bilingual Lexical Cohesion Trigger Model for Document-Level Machine Translation
Author: Guosheng Ben ; Deyi Xiong ; Zhiyang Teng ; Yajuan Lu ; Qun Liu
Abstract: In this paper, we propose a bilingual lexical cohesion trigger model to capture lexical cohesion for document-level machine translation. We integrate the model into hierarchical phrase-based machine translation and achieve an absolute improvement of 0.85 BLEU points on average over the baseline on NIST Chinese-English test sets.
2 0.69357985 212 acl-2013-Language-Independent Discriminative Parsing of Temporal Expressions
Author: Gabor Angeli ; Jakob Uszkoreit
Abstract: Temporal resolution systems are traditionally tuned to a particular language, requiring significant human effort to translate them to new languages. We present a language independent semantic parser for learning the interpretation of temporal phrases given only a corpus of utterances and the times they reference. We make use of a latent parse that encodes a language-flexible representation of time, and extract rich features over both the parse and associated temporal semantics. The parameters of the model are learned using a weakly supervised bootstrapping approach, without the need for manually tuned parameters or any other language expertise. We achieve state-of-the-art accuracy on all languages in the TempEval2 temporal normalization task, reporting a 4% improvement in both English and Spanish accuracy, and to our knowledge the first results for four other languages.
3 0.65377921 251 acl-2013-Mr. MIRA: Open-Source Large-Margin Structured Learning on MapReduce
Author: Vladimir Eidelman ; Ke Wu ; Ferhan Ture ; Philip Resnik ; Jimmy Lin
Abstract: We present an open-source framework for large-scale online structured learning. Developed with the flexibility to handle cost-augmented inference problems such as statistical machine translation (SMT), our large-margin learner can be used with any decoder. Integration with MapReduce using Hadoop streaming allows efficient scaling with increasing size of training data. Although designed with a focus on SMT, the decoder-agnostic design of our learner allows easy future extension to other structured learning problems such as sequence labeling and parsing.
4 0.55260295 56 acl-2013-Argument Inference from Relevant Event Mentions in Chinese Argument Extraction
Author: Peifeng Li ; Qiaoming Zhu ; Guodong Zhou
Abstract: As a paratactic language, sentence-level argument extraction in Chinese suffers much from the frequent occurrence of ellipsis with regard to inter-sentence arguments. To resolve such problem, this paper proposes a novel global argument inference model to explore specific relationships, such as Coreference, Sequence and Parallel, among relevant event mentions to recover those intersentence arguments in the sentence, discourse and document layers which represent the cohesion of an event or a topic. Evaluation on the ACE 2005 Chinese corpus justifies the effectiveness of our global argument inference model over a state-of-the-art baseline. 1
5 0.54675794 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Author: Jiajun Zhang ; Chengqing Zong
Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1
6 0.54493695 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
7 0.54382056 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
8 0.54378653 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation
9 0.54204571 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
10 0.54128695 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
11 0.53660399 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search
12 0.53534132 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
13 0.5349015 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
14 0.53418863 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
15 0.5339607 166 acl-2013-Generalized Reordering Rules for Improved SMT
16 0.53353536 225 acl-2013-Learning to Order Natural Language Texts
17 0.53349495 90 acl-2013-Conditional Random Fields for Responsive Surface Realisation using Global Features
18 0.53334469 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
19 0.53308815 183 acl-2013-ICARUS - An Extensible Graphical Search Tool for Dependency Treebanks
20 0.53256655 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing