acl acl2011 acl2011-116 knowledge-graph by maker-knowledge-mining

116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers


Source: pdf

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: In this paper, with a belief that a language model that embraces a larger context provides better prediction ability, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. Our experimental results show that both models are able to significantly improve translation quality and collectively achieve up to 1 BLEU point over a competitive baseline.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. [sent-2, score-0.247]

2 The standard n-gram language model (Goodman, 2001) assigns probabilities to hypotheses in the target language conditioning on a context history of the preceding n − 1 words. [sent-6, score-0.266]

3 To some extent, these syntactically-informed language models are consistent with syntax-based translation models in capturing long-distance dependencies. [sent-23, score-0.17]

4 With a belief that a language model that embraces a larger context provides better prediction ability, we learn additional information from training data to enhance conventional n-gram language models and extend their ability to capture richer contexts and long-distance dependencies. [sent-25, score-0.205]

5 In particular, we integrate backward n-grams and mutual information (MI) triggers into language models in SMT. [sent-26, score-0.751]

6 We refer to the language model built on forward n-grams as the forward n-gram language model. [sent-30, score-0.44]

7 Similarly, backward n-grams refer to the succeeding n − 1 words plus the current word. [sent-31, score-0.536]

8 We build a backward language model on backward n-grams and integrate the forward and backward language models together into the decoder. [sent-33, score-0.783]

9 Different from the backward n-gram language model, the MI trigger model still looks at previous contexts, which however go beyond the scope of forward n-grams. [sent-35, score-1.372]

10 If the current word is indexed as wi, the farthest word that the forward n-gram includes is wi−n+1. [sent-36, score-0.241]

11 However, the MI triggers are capable of detecting dependencies between wi and words from w1 to wi−n. [sent-37, score-0.272]

12 By these triggers ({wk → wi}, 1 ≤ k ≤ i−n), we can capture long-distance dependencies that are outside the scope of forward n-grams. [sent-38, score-0.174]
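
To make the index condition concrete, the following minimal Python sketch (not from the paper; the helper name and the toy sentence are ours) enumerates, for a given sentence, the candidate long-distance trigger pairs (wk, wi) with 1 ≤ k ≤ i − n, i.e. exactly the pairs that a forward n-gram window cannot cover.

```python
def long_distance_triggers(words, n):
    """Enumerate candidate trigger pairs (w_k -> w_i) with 1 <= k <= i - n,
    i.e. triggers that lie outside the forward n-gram window of w_i."""
    pairs = []
    for i in range(1, len(words) + 1):      # 1-based position of the triggered word
        for k in range(1, i - n + 1):       # 1-based position of the trigger, k <= i - n
            pairs.append((words[k - 1], words[i - 1]))
    return pairs

# With n = 3, only words at least 3 positions back can act as triggers.
print(long_distance_triggers("we integrate backward n-grams and triggers".split(), n=3))
```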

13 We integrate the proposed backward language model and the MI trigger model into a state-ofthe-art phrase-based SMT system. [sent-39, score-1.271]

14 Compared with the baseline which only uses the forward language model, our experimental results show that the additional backward language model is able to gain about 0. [sent-41, score-0.754]

15 5 BLEU points, while the MI trigger model gains about 0. [sent-42, score-0.676]

16 Sections 3 and 4 will elaborate on the backward language model and the MI trigger model respectively in more detail, describe the training procedures and explain how the models are integrated into the phrase-based decoder. [sent-47, score-1.238]

17 Since our philosophy is fundamentally different from theirs in that we build contextually-informed language models by using backward n-grams and MI triggers, we discuss previous work that explores these two techniques (backward n-grams and MI triggers) in this section. [sent-57, score-0.5]

18 Since the context “history” in the backward language model (BLM) is actually the future words to be generated, the BLM is normally used in a postprocessing step where all words have already been generated, or in a scenario where sentences are processed from the ending to the beginning. [sent-58, score-0.549]

19 Finch and Sumita (2009) use the BLM in their reverse translation decoder, where source sentences are processed from the ending to the beginning. [sent-61, score-0.256]

20 (1994) introduce trigger pairs into a maximum entropy based language model as features. [sent-64, score-0.697]

21 The trigger pairs are selected according to their mutual information. [sent-65, score-0.68]

22 Zhou (2004) also proposes an enhanced language model (MI-Ngram) which consists of a standard forward n-gram language model and an MI trigger model. [sent-66, score-1.02]

23 The latter model measures the mutual information of distance-dependent trigger pairs. [sent-67, score-0.725]

24 Our MI trigger model is mostly inspired by the work of these two papers, especially by Zhou’s MI-Ngram model (2004). [sent-68, score-0.742]

25 (2009) use MI triggers in their confidence measures to assess the quality of translation results after decoding. [sent-71, score-0.242]

26 Our method is different from theirs in the MI calculation and trigger pair selection. [sent-72, score-0.61]

27 (2009) propose bilingual triggers where two source words trigger one target word. (Footnote 1: Language model adaptation is not very related to our work, so we ignore it.) [sent-74, score-0.868]

28 Our analysis (Section 6) shows that our monolingual triggers can also help in the selection of target words. [sent-76, score-0.193]

29 Given a sentence w1m = (w1 . . . wm), a standard forward n-gram language model assigns a probability Pf(w1m) to w1m as follows. [sent-80, score-0.341]

30 Different from the forward n-gram language model, the backward n-gram language model assigns a probability Pb(w1m) to w1m by looking at the succeeding context, according to Pb(w1m) = ∏_{i=1}^{m} P(wi | w_{i+1}^{m}) ≈ ∏_{i=1}^{m} P(wi | w_{i+1}^{i+n−1}) (2). [sent-89, score-0.83]
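
A minimal sketch of Eq. (2) in log space, assuming a user-supplied `cond_prob(word, succeeding_context)` callback in place of the trained backward model; the uniform toy model is only there to make the snippet runnable.

```python
import math

def backward_lm_logprob(words, cond_prob, n):
    """Log of Eq. (2): sum over i of log P(w_i | w_{i+1} ... w_{i+n-1}),
    where the succeeding context is truncated to at most n - 1 words."""
    total = 0.0
    for i, w in enumerate(words):
        context = tuple(words[i + 1 : i + n])   # up to n - 1 succeeding words
        total += math.log(cond_prob(w, context))
    return total

# Toy stand-in for the trained backward model: a uniform distribution.
uniform = lambda word, context: 1.0 / 10000
print(backward_lm_logprob("the backward model looks ahead".split(), uniform, n=5))
```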

31 In this way, we can use the same toolkit that we use to train a forward n-gram language model to train a backward n-gram language model without any other changes. [sent-99, score-0.793]

32 To be consistent with training, we also need to reverse the order of translation hypotheses when we access the trained backward language model. [sent-100, score-0.605]
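
The reversal itself is a one-line transformation; the sketch below (our illustration, with toy data) shows the training-side corpus reversal and the matching reversal of a hypothesis before it is scored by the trained backward model.

```python
def reverse_corpus(lines):
    """Reverse the word order of every sentence, so that a standard n-gram
    toolkit trained on the output yields a backward language model."""
    return [" ".join(line.split()[::-1]) for line in lines]

def reverse_hypothesis(hypothesis):
    """Reverse a translation hypothesis before querying the trained backward
    model, mirroring the reversal applied to the training corpus."""
    return " ".join(hypothesis.split()[::-1])

corpus = ["the cat sat on the mat", "backward n-grams look at succeeding words"]
print(reverse_corpus(corpus))
print(reverse_hypothesis("this is a partial translation hypothesis"))
```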

33 Wu (1996) introduces a dynamic programming algorithm to integrate a forward bigram language model with inversion transduction grammar. [sent-116, score-0.437]

34 His algorithm is then adapted and extended for integrating forward n-gram language models into synchronous CFGs by Chiang (2007). [sent-117, score-0.276]

35 We adopt a different way to calculate language model probabilities for partial hypotheses so that we can utilize incomplete n-grams. [sent-121, score-0.191]

36 Pb(w1k) ≈ [∏_{k−n+2 ≤ i ≤ k} P(wi | w_{i+1}^{k})]_(a) · [∏_{1 ≤ i ≤ k−n+1} P(wi | w_{i+1}^{i+n−1})]_(b) (3) This function consists of two parts: the first part (a) calculates incomplete n-gram language model probabilities for words wk to wk−n+2. [sent-136, score-0.277]

37 That means, we calculate the unigram probability for wk (P(wk)), the bigram probability for wk−1 (P(wk−1 | wk)), and so on, until we take the (n−1)-gram probability for wk−n+2 (P(wk−n+2 | wk . . . wk−n+3)). [sent-137, score-0.299]
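
A sketch of part (a) as just described: the last up-to-(n−1) words of a partial hypothesis are scored with progressively shorter (incomplete) succeeding contexts. The `cond_prob` callback again stands in for the trained backward model and is an assumption of ours.

```python
import math

def incomplete_backward_logprob(words, cond_prob, n):
    """Part (a) of the partial score: condition each of the last min(n-1, k)
    words only on the succeeding words that already exist in the hypothesis
    (unigram for the last word, bigram for the one before it, and so on)."""
    k = len(words)
    total = 0.0
    for i in range(max(0, k - (n - 1)), k):     # positions of w_{k-n+2} ... w_k
        context = tuple(words[i + 1 : k])       # incomplete succeeding context
        total += math.log(cond_prob(words[i], context))
    return total

uniform = lambda word, context: 1.0 / 10000
print(incomplete_backward_logprob("a partial translation hypothesis".split(), uniform, n=5))
```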

38 This resembles the way in which the forward language model probability in the future cost is computed in the standard phrase-based SMT (Koehn et al. [sent-141, score-0.341]

39 Since we calculate backward language model probabilities during a beginning-to-ending (left-to-right) decoding process, the succeeding context for the current word is either yet to be generated or incomplete in terms of n-grams. [sent-145, score-0.73]

40 Once the succeeding contexts are complete, we can quickly update language model probabilities in an efficient way in our algorithms. [sent-147, score-0.238]

41 (Eqs. 4 and 5, which handle the case k ≥ n and the shorter case otherwise.) The L and R functions return the leftmost and the rightmost n − 1 words from a string in reverse order, respectively. [sent-160, score-0.25]
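
A possible reading of the L and R helpers, as a sketch; the handling of strings shorter than n − 1 words is our assumption.

```python
def L(words, n):
    """Leftmost n - 1 words of a block, returned in reverse order."""
    return tuple(reversed(words[: n - 1]))

def R(words, n):
    """Rightmost n - 1 words of a block, returned in reverse order."""
    return tuple(reversed(words[-(n - 1):]))

block = "a small target phrase".split()
print(L(block, n=3), R(block, n=3))   # ('small', 'a') ('phrase', 'target')
```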

42 We first show the algorithm that integrates the backward language model into a BTG-style decoder (Xiong et al. [sent-162, score-0.561]

43 We only display the backward language model probability for each item, ignoring all other scores such as phrase translation probabilities. [sent-167, score-0.198]

44 Equation (8) in Figure 1 shows how we calculate the backward language model probability for the axiom, which applies a BTG lexicon rule to translate a source phrase c into a target phrase e. [sent-169, score-0.665]

45 Equations (9) and (10) show how we update the backward language model probabilities for two inference rules which combine two neighboring blocks in a straight and an inverted order, respectively. [sent-171, score-0.677]

46 The fundamental theories behind this update are captured by P(e1e2) = P(e1)P(e2) · P(R(e2)L(e1)) / (P(R(e2))P(L(e1))) (6). (Footnote 3: It can also be easily adapted to integrate the forward n-gram language model.) [sent-172, score-0.348]

47 These two equations guarantee that our algorithm can correctly compute the backward language model probability of a sentence stepwise in a dynamic programming framework. [sent-182, score-0.559]
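
In log space, the stepwise update implied by the reconstructed Eq. (6) adds the two block scores and re-scores only the boundary region that becomes a complete context when the blocks are joined. The sketch below is our illustration of that bookkeeping (with assumed helpers `_L`/`_R` and a toy scorer), not the decoder's actual code.

```python
def _L(words, n):
    """Assumed helper: leftmost n - 1 words in reverse order."""
    return tuple(reversed(words[: n - 1]))

def _R(words, n):
    """Assumed helper: rightmost n - 1 words in reverse order."""
    return tuple(reversed(words[-(n - 1):]))

def combine_blocks_logscore(score_e1, score_e2, e1, e2, logprob, n):
    """Reuse the backward LM scores of two blocks and correct only for the
    boundary region, following the reconstructed Eq. (6) in log space."""
    boundary = list(_R(e2, n)) + list(_L(e1, n))
    return (score_e1 + score_e2
            + logprob(boundary) - logprob(list(_R(e2, n))) - logprob(list(_L(e1, n))))

# Toy scorer: pretend every word contributes log-probability -1.
toy_logprob = lambda seq: -1.0 * len(seq)
e1, e2 = "the cat".split(), "sat down quietly".split()
print(combine_blocks_logscore(-2.0, -3.0, e1, e2, toy_logprob, n=3))
```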

48 Figure 2 shows the algorithm that integrates the backward language model into a standard phrase-based SMT (Koehn et al. [sent-186, score-0.533]

49 [V′; L(e1e2)] : P(e1)P(e2) · P(R(e2)L(e1)) / (P(R(e2))P(L(e1))) (11) Figure 2: Integrating the backward language model into a standard phrase-based decoder. [sent-194, score-0.533]

50 A trigger pair is defined as an ordered 2-tuple (x, y) where word x occurs in the preceding context of word y. [sent-202, score-0.662]

51 It can also be denoted in a more visual manner as x → y, with x being the trigger and y the triggered word. [sent-203, score-0.667]

52 (Eq. 12 defines the PMI of a trigger pair (x, y) in terms of how often x and y occur in the training data.) Zhou (2004) proposes a new language model enhanced with MI trigger pairs. [sent-205, score-0.708]

53 The second one is the MI trigger model, which multiplies all exponential PMI values for trigger pairs where the current word is the triggered word and all preceding words outside the n-gram window of the current word are triggers. [sent-208, score-1.458]

54 Note that his MI trigger model is distance-dependent since trigger pairs (wk, wi) are sensitive to their distance i−k −1 (zero distance for adjacent words). [sent-209, score-1.307]

55 With MERT (Och, 2003), we are even able to tune the weight of the MI trigger model against the weight of the standard n-gram language model, while Zhou (2004) sets equal weights for both models. [sent-212, score-0.768]

56 (Section 4.1, Training) We can use the maximum likelihood estimation method to calculate PMI for each trigger pair by taking counts from the training data. [sent-214, score-0.642]

57 Let C(x, y) be the co-occurrence count of the trigger pair (x, y) in the training data. [sent-215, score-0.61]

58 We select trigger pairs according to the following three steps. [sent-217, score-0.631]

59 Finally, we only keep trigger pairs whose PMI value is larger than 0. [sent-227, score-0.631]
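
A training-time sketch under the usual maximum-likelihood estimate PMI(x, y) = log(P(x, y) / (P(x)P(y))). Only the final PMI > 0 filter is taken from the text; the way counts are collected and normalized here is our assumption.

```python
import math
from collections import Counter
from itertools import combinations

def train_trigger_pairs(sentences):
    """Estimate PMI(x, y) = log(P(x, y) / (P(x) P(y))) for ordered pairs
    (x before y in the same sentence) and keep only pairs with PMI > 0."""
    word_count, pair_count = Counter(), Counter()
    for sent in sentences:
        words = sent.split()
        word_count.update(words)
        pair_count.update(combinations(words, 2))   # ordered co-occurrences
    n_words, n_pairs = sum(word_count.values()), sum(pair_count.values())
    pmi = {}
    for (x, y), c_xy in pair_count.items():
        value = math.log((c_xy / n_pairs) / ((word_count[x] / n_words) * (word_count[y] / n_words)))
        if value > 0:                               # final selection step from the text
            pmi[(x, y)] = value
    return pmi

print(train_trigger_pairs(["he is happy", "he is very happy", "she was sad"]))
```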

60 In the standard phrase-based decoder (Koehn et al., 2003), whenever a partial hypothesis is extended by a new target phrase, we can quickly retrieve the pre-computed PMI value for each trigger pair where the triggered word is located in the newly translated target phrase and the trigger is outside the n-word window of the triggered word. [sent-234, score-1.46]
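
A sketch of that decoding-time lookup: every word of the newly added target phrase is treated as a triggered word and paired with each preceding word outside its n-word window, accumulating pre-computed PMI values. The function name and the toy table are ours.

```python
def mi_feature_for_extension(prev_words, new_phrase, pmi_table, n):
    """Accumulate pre-computed PMI values for trigger pairs whose triggered
    word lies in the newly added phrase and whose trigger sits outside the
    n-word window of that triggered word."""
    words = prev_words + new_phrase
    score = 0.0
    for i in range(len(prev_words), len(words)):    # triggered words (new phrase)
        for k in range(0, i - n + 1):               # triggers at least n words back
            score += pmi_table.get((words[k], words[i]), 0.0)
    return score

pmi_table = {("he", "happy"): 0.8}                  # toy pre-computed table
print(mi_feature_for_extension("he is very".split(), ["happy"], pmi_table, n=3))
```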

61 It is a little more complicated to integrate the MI trigger model into the CKY-style phrase-based decoder. [sent-235, score-0.764]

62 It is defined as follows: MI(e1 → e2) = ∏_{wi ∈ e2} ∏_{wk ∈ e1, i−k ≥ n} exp(PMI(wk, wi)) (19). (Section 5, Experiments) In this section, we conduct large-scale experiments on NIST Chinese-to-English translation tasks to evaluate the effectiveness of the proposed backward language model and MI trigger model in SMT. [sent-237, score-1.285]
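
A log-space sketch of Eq. (19) for combining two blocks: the product of exp(PMI) terms becomes a sum of PMI values; the distance condition i − k ≥ n follows our reading of the constraint under the product and is therefore an assumption.

```python
def mi_block_combination(e1, e2, pmi_table, n):
    """Log of Eq. (19): sum PMI(w_k, w_i) over w_i in e2 and w_k in e1,
    restricted (by assumption) to pairs at distance i - k >= n."""
    words = e1 + e2
    score = 0.0
    for i in range(len(e1), len(words)):            # w_i ranges over e2
        for k in range(len(e1)):                    # w_k ranges over e1
            if i - k >= n:
                score += pmi_table.get((words[k], words[i]), 0.0)
    return score

pmi_table = {("he", "happy"): 0.8}
print(mi_block_combination("he is".split(), "very happy today".split(), pmi_table, n=3))
```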

63 How much improvement can we achieve by separately integrating the backward language model and the MI trigger model into our phrase-based SMT system? [sent-239, score-1.205]

64 The translation model MT consists of widely used phrase and lexical translation probabilities (Koehn et al. [sent-254, score-0.286]

65 (Footnote 6) We have discussed how to integrate the backward language model and the MI trigger model into the standard phrase-based SMT system (Koehn et al. [sent-256, score-1.297]

66 If we simultaneously integrate both the backward language model PbL and the MI trigger model MI into the system, the new log-linear model w(D) is formulated by extending the baseline model with the two additional features PbL and MI, each weighted by its own tunable exponent. [sent-263, score-1.337]
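
The combination itself is the usual log-linear score, sketched below with placeholder feature names and weights; in practice the weights would be tuned with MERT as stated above.

```python
import math

def log_linear_score(features, weights):
    """Weighted sum of log feature values: the log of the product-of-powers
    form of a log-linear SMT model."""
    return sum(weights[name] * math.log(value) for name, value in features.items())

# Placeholder feature values and weights, for illustration only.
features = {"forward_lm": 1e-12, "backward_lm": 5e-13, "mi_trigger": 2.5, "translation": 1e-6}
weights = {"forward_lm": 0.3, "backward_lm": 0.2, "mi_trigger": 0.1, "translation": 0.4}
print(log_linear_score(features, weights))
```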

67 We used all corpora to train our translation model and smaller corpora without the United Nations corpus to build a maximum entropy based reordering model (Xiong et al. [sent-270, score-0.233]

68 To train our language models and MI trigger model, we used the Xinhua section of the English Gigaword corpus (306 million words). [sent-272, score-0.644]

69 Firstly, we built a forward 5-gram language model using the SRILM toolkit (Stolcke, 2002) with modified Kneser-Ney smoothing. [sent-273, score-0.286]

70 Then we trained a backward 5-gram language model on the same monolingual corpus in the way described in Section 3. [sent-274, score-0.532]

71 Finally, we trained our MI trigger model still on this corpus according to the method in Section 4. [sent-276, score-0.676]

72 When we combine the backward language model with the forward language model ... (Footnote 7: LDC2004E12, LDC2004T08, LDC2005T10, LDC2003E14, LDC2002E18, LDC2005T06, LDC2003E07 and LDC2004T07.) [sent-286, score-0.727]

73 The MI trigger model also achieves statistically significant improvements of 0. [sent-303, score-0.676]

74 When we integrate both the backward language model and the MI trigger model into our system, we obtain improvements of 1. [sent-306, score-1.271]

75 71 BLEU points over the single forward language model on the MT-04 and MT-05 respectively. [sent-308, score-0.286]

76 These improvements are larger than those achieved by using only one model (the backward language model or the MI trigger model). [sent-309, score-1.183]

77 The italic words in the hypothesis generated by using the backward language model (F+B) exactly match the reference. [sent-313, score-0.593]

78 We calculate the forward/backward language model score (the logarithm of the language model probability) for the italic words in both the baseline and the F+B hypotheses according to the trained language models. [sent-315, score-0.28]

79 The difference in the forward language model score is only 1. [sent-316, score-0.286]

80 On the other hand, the difference in the backward language model score is 3. [sent-318, score-0.507]

81 This larger difference may guarantee that the hypothesis generated by F+B is selected. (Table caption: comparing the baseline with the backward language model.) [sent-320, score-0.514]

82 This suggests that the backward language model is able to provide useful and discriminative information which is complementary to that given by the forward language model. [sent-323, score-0.727]

83 In Table 4, we present another example to show how the MI trigger model improves translation quality. [sent-324, score-0.757]

84 The new system enhanced with the MI trigger model (F+M) selects the former while the baseline selects the latter. [sent-326, score-0.791]

85 The forward language model score for the baseline hypothesis is -26. [sent-327, score-0.359]

86 The forward 5-gram language model is hence not able to take it into account when calculating the probability of “was”. [sent-333, score-0.336]

87 However, this is not a problem for the MI trigger model. [sent-334, score-0.61]

88 Since “is” and “was” rarely co-occur in the same sentence, the PMI value of the trigger pair (is, was) is -1.03. [sent-335, score-0.61]

89 (Footnote 8) Since we remove all trigger pairs whose PMI value is negative, the PMI value of this pair (is, was) is set to 0 in practice in the decoder. [sent-336, score-0.631]

90 (Table caption: comparing the baseline with the MI trigger model.) [sent-337, score-0.637]

91 while the PMI value of the trigger pair (is, is) is as high as 0. [sent-341, score-0.61]

92 Therefore our MI trigger model selects “is” rather than “was”. [sent-343, score-0.704]

93 This example illustrates that the MI trigger model is capable of selecting correct words by using long-distance trigger pairs. [sent-344, score-1.286]

94 (Section 7, Conclusion) We have presented two models to enhance the ability of standard n-gram language models in capturing richer contexts and long-distance dependencies that go beyond the scope of forward n-gram windows. [sent-345, score-0.447]

95 The first model is the backward language model which uses backward n-grams to predict the current word. [sent-347, score-1.035]

96 We introduced algorithms that directly integrate the backward language model into a CKY-style and a standard phrase-based decoder, respectively. [sent-348, score-0.649]

97 The second model is the MI trigger model that incorporates long-distance trigger pairs into language modeling. [sent-349, score-1.373]

98 Further study of the two ... (Footnote 9: The overall MI trigger model scores (the logarithm of Eq. 19) ...) [sent-351, score-0.697]

99 models indicates that backward n-grams and long-distance triggers provide useful information to improve translation quality. [sent-355, score-0.695]

100 In future work, we would like to integrate the backward language model into a syntax-based system in a way that is similar to the proposed algorithm shown in Figure 1. [sent-356, score-0.595]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('trigger', 0.61), ('backward', 0.441), ('mi', 0.309), ('forward', 0.22), ('wk', 0.18), ('pmi', 0.146), ('triggers', 0.139), ('blm', 0.111), ('wi', 0.103), ('integrate', 0.088), ('translation', 0.081), ('succeeding', 0.074), ('xiong', 0.071), ('smt', 0.067), ('model', 0.066), ('triggered', 0.057), ('reverse', 0.055), ('decoder', 0.054), ('preceding', 0.052), ('inverted', 0.051), ('wii', 0.049), ('mutual', 0.049), ('bleu', 0.047), ('hypothesis', 0.046), ('btg', 0.045), ('straight', 0.043), ('finch', 0.042), ('proceeded', 0.042), ('italic', 0.04), ('update', 0.04), ('transduction', 0.04), ('koehn', 0.04), ('duchateau', 0.037), ('embraces', 0.037), ('hke', 0.037), ('invert', 0.037), ('mauser', 0.037), ('probabilities', 0.036), ('rosenfeld', 0.036), ('scope', 0.035), ('zhou', 0.035), ('nist', 0.034), ('models', 0.034), ('pl', 0.033), ('raybaud', 0.033), ('calculate', 0.032), ('calculates', 0.032), ('chiang', 0.032), ('enhanced', 0.032), ('decoding', 0.031), ('talbot', 0.03), ('dependencies', 0.03), ('incomplete', 0.029), ('probability', 0.029), ('target', 0.029), ('history', 0.029), ('emami', 0.028), ('pothesis', 0.028), ('selects', 0.028), ('hypotheses', 0.028), ('baseline', 0.027), ('brants', 0.027), ('standard', 0.026), ('sumita', 0.026), ('aug', 0.026), ('decoders', 0.026), ('ability', 0.025), ('monolingual', 0.025), ('nse', 0.025), ('philosophy', 0.025), ('fl', 0.024), ('source', 0.024), ('leftmost', 0.023), ('deyi', 0.023), ('ioft', 0.023), ('exp', 0.023), ('statistical', 0.023), ('dynamic', 0.023), ('contexts', 0.022), ('integrating', 0.022), ('charniak', 0.022), ('icassp', 0.022), ('confidence', 0.022), ('phrase', 0.022), ('pr', 0.022), ('logarithm', 0.021), ('rightmost', 0.021), ('singapore', 0.021), ('distributed', 0.021), ('conduct', 0.021), ('integrated', 0.021), ('collectively', 0.021), ('calculating', 0.021), ('conventional', 0.021), ('capturing', 0.021), ('pairs', 0.021), ('current', 0.021), ('association', 0.02), ('pb', 0.02), ('reordering', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: In this paper, with a belief that a language model that embraces a larger context provides better prediction ability, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. Our experimental results show that both models are able to significantly improve transla- , tion quality and collectively achieve up to 1 BLEU point over a competitive baseline.

2 0.24033235 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

Author: Shasha Liao ; Ralph Grishman

Abstract: Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not enough. In this paper, we present a novel self-training strategy, which uses Information Retrieval (IR) to collect a cluster of related documents as the resource for bootstrapping. Also, based on the particular characteristics of this corpus, global inference is applied to provide more confident and informative data selection. We compare this approach to self-training on a normal newswire corpus and show that IR can provide a better corpus for bootstrapping and that global inference can further improve instance selection. We obtain gains of 1.7% in trigger labeling and 2.3% in role labeling through IR and an additional 1.1% in trigger labeling and 1.3% in role labeling by applying global inference. 1

3 0.20419651 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding

Author: Svetlana Kiritchenko ; Colin Cherry

Abstract: The automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can be viewed as multi-label document classification, the coding problem has the interesting property that most code assignments can be supported by a single phrase found in the input document. We propose a Lexically-Triggered Hidden Markov Model (LT-HMM) that leverages these phrases to improve coding accuracy. The LT-HMM works in two stages: first, a lexical match is performed against a term dictionary to collect a set of candidate codes for a document. Next, a discriminative HMM selects the best subset of codes to assign to the document by tagging candidates as present or absent. By confirming codes proposed by a dictionary, the LT-HMM can share features across codes, enabling strong performance even on rare codes. In fact, we are able to recover codes that do not occur in the training set at all. Our approach achieves the best ever performance on the 2007 Medical NLP Challenge test set, with an F-measure of 89.84.

4 0.12977675 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

Author: Yee Seng Chan ; Dan Roth

Abstract: In this paper, we observe that there exists a second dimension to the relation extraction (RE) problem that is orthogonal to the relation type dimension. We show that most of these second dimensional structures are relatively constrained and not difficult to identify. We propose a novel algorithmic approach to RE that starts by first identifying these structures and then, within these, identifying the semantic type of the relation. In the real RE problem where relation arguments need to be identified, exploiting these structures also allows reducing pipelined propagated errors. We show that this RE framework provides significant improvement in RE performance.

5 0.11542336 328 acl-2011-Using Cross-Entity Inference to Improve Event Extraction

Author: Yu Hong ; Jianfeng Zhang ; Bin Ma ; Jianmin Yao ; Guodong Zhou ; Qiaoming Zhu

Abstract: Event extraction is the task of detecting certain specified types of events that are mentioned in the source language data. The state-of-the-art research on the task is transductive inference (e.g. cross-event inference). In this paper, we propose a new method of event extraction by well using cross-entity inference. In contrast to previous inference methods, we regard entitytype consistency as key feature to predict event mentions. We adopt this inference method to improve the traditional sentence-level event extraction system. Experiments show that we can get 8.6% gain in trigger (event) identification, and more than 11.8% gain for argument (role) classification in ACE event extraction. 1

6 0.11008333 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation

7 0.10538402 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

8 0.10157692 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

9 0.10135676 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

10 0.094917476 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

11 0.086627237 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

12 0.086564176 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

13 0.086179495 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

14 0.084405802 266 acl-2011-Reordering with Source Language Collocations

15 0.082608774 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

16 0.081546105 142 acl-2011-Generalized Interpolation in Decision Tree LM

17 0.076767437 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

18 0.073746346 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation

19 0.072501019 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

20 0.072208606 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.179), (1, -0.092), (2, 0.004), (3, 0.036), (4, 0.114), (5, 0.058), (6, -0.073), (7, -0.044), (8, 0.067), (9, 0.046), (10, -0.013), (11, -0.026), (12, 0.007), (13, -0.017), (14, 0.027), (15, 0.035), (16, -0.034), (17, 0.008), (18, -0.013), (19, -0.013), (20, 0.006), (21, -0.055), (22, 0.12), (23, -0.04), (24, -0.04), (25, -0.017), (26, 0.07), (27, 0.063), (28, -0.086), (29, 0.019), (30, 0.07), (31, -0.006), (32, 0.077), (33, -0.026), (34, -0.051), (35, -0.036), (36, 0.034), (37, -0.089), (38, 0.084), (39, 0.024), (40, 0.006), (41, 0.019), (42, 0.035), (43, 0.04), (44, 0.055), (45, -0.087), (46, -0.042), (47, -0.015), (48, -0.182), (49, 0.178)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.89529723 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: In this paper, with a belief that a language model that embraces a larger context provides better prediction ability, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. Our experimental results show that both models are able to significantly improve transla- , tion quality and collectively achieve up to 1 BLEU point over a competitive baseline.

2 0.69996953 209 acl-2011-Lexically-Triggered Hidden Markov Models for Clinical Document Coding

Author: Svetlana Kiritchenko ; Colin Cherry

Abstract: The automatic coding of clinical documents is an important task for today’s healthcare providers. Though it can be viewed as multi-label document classification, the coding problem has the interesting property that most code assignments can be supported by a single phrase found in the input document. We propose a Lexically-Triggered Hidden Markov Model (LT-HMM) that leverages these phrases to improve coding accuracy. The LT-HMM works in two stages: first, a lexical match is performed against a term dictionary to collect a set of candidate codes for a document. Next, a discriminative HMM selects the best subset of codes to assign to the document by tagging candidates as present or absent. By confirming codes proposed by a dictionary, the LT-HMM can share features across codes, enabling strong performance even on rare codes. In fact, we are able to recover codes that do not occur in the training set at all. Our approach achieves the best ever performance on the 2007 Medical NLP Challenge test set, with an F-measure of 89.84.

3 0.57524931 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

Author: Nan Duan ; Mu Li ; Ming Zhou

Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.

4 0.55622202 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

Author: Nadir Durrani ; Helmut Schmid ; Alexander Fraser

Abstract: We present a novel machine translation model which models translation by a linear sequence of operations. In contrast to the “N-gram” model, this sequence includes not only translation but also reordering operations. Key ideas of our model are (i) a new reordering approach which better restricts the position to which a word or phrase can be moved, and is able to handle short and long distance reorderings in a unified way, and (ii) a joint sequence model for the translation and reordering probabilities which is more flexible than standard phrase-based MT. We observe statistically significant improvements in BLEU over Moses for German-to-English and Spanish-to-English tasks, and comparable results for a French-to-English task.

5 0.5553996 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation

Author: Ming Tan ; Wenli Zhou ; Lei Zheng ; Shaojun Wang

Abstract: This paper presents an attempt at building a large scale distributed composite language model that simultaneously accounts for local word lexical information, mid-range sentence syntactic structure, and long-span document semantic content under a directed Markov random field paradigm. The composite language model has been trained by performing a convergent N-best list approximate EM algorithm that has linear time complexity and a followup EM algorithm to improve word prediction power on corpora with up to a billion tokens and stored on a supercomputer. The large scale distributed composite language model gives drastic perplexity reduction over ngrams and achieves significantly better translation quality measured by the BLEU score and “readability” when applied to the task of re-ranking the N-best list from a state-of-the- art parsing-based machine translation system.

6 0.53388703 266 acl-2011-Reordering with Source Language Collocations

7 0.52565193 263 acl-2011-Reordering Constraint Based on Document-Level Context

8 0.52121258 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives

9 0.51200771 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

10 0.50006986 165 acl-2011-Improving Classification of Medical Assertions in Clinical Notes

11 0.48801845 220 acl-2011-Minimum Bayes-risk System Combination

12 0.46795028 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

13 0.46405724 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

14 0.46187758 321 acl-2011-Unsupervised Discovery of Rhyme Schemes

15 0.45864496 100 acl-2011-Discriminative Feature-Tied Mixture Modeling for Statistical Machine Translation

16 0.44962245 313 acl-2011-Two Easy Improvements to Lexical Weighting

17 0.44630748 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

18 0.44589743 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

19 0.44292182 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

20 0.4347744 35 acl-2011-An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.025), (17, 0.098), (26, 0.028), (31, 0.024), (37, 0.093), (39, 0.041), (41, 0.059), (55, 0.047), (59, 0.034), (72, 0.043), (79, 0.164), (88, 0.013), (91, 0.049), (96, 0.153), (97, 0.019), (98, 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83112466 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

Author: Deyi Xiong ; Min Zhang ; Haizhou Li

Abstract: In this paper, with a belief that a language model that embraces a larger context provides better prediction ability, we present two extensions to standard n-gram language models in statistical machine translation: a backward language model that augments the conventional forward language model, and a mutual information trigger model which captures long-distance dependencies that go beyond the scope of standard n-gram language models. We integrate the two proposed models into phrase-based statistical machine translation and conduct experiments on large-scale training data to investigate their effectiveness. Our experimental results show that both models are able to significantly improve translation quality and collectively achieve up to 1 BLEU point over a competitive baseline.

2 0.80049533 198 acl-2011-Latent Semantic Word Sense Induction and Disambiguation

Author: Tim Van de Cruys ; Marianna Apidianaki

Abstract: In this paper, we present a unified model for the automatic induction of word senses from text, and the subsequent disambiguation of particular word instances using the automatically extracted sense inventory. The induction step and the disambiguation step are based on the same principle: words and contexts are mapped to a limited number of topical dimensions in a latent semantic word space. The intuition is that a particular sense is associated with a particular topic, so that different senses can be discriminated through their association with particular topical dimensions; in a similar vein, a particular instance of a word can be disambiguated by determining its most important topical dimensions. The model is evaluated on the SEMEVAL-20 10 word sense induction and disambiguation task, on which it reaches stateof-the-art results.

3 0.79482192 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

Author: Alla Rozovskaya ; Dan Roth

Abstract: We consider the problem of correcting errors made by English as a Second Language (ESL) writers and address two issues that are essential to making progress in ESL error correction - algorithm selection and model adaptation to the first language of the ESL learner. A variety of learning algorithms have been applied to correct ESL mistakes, but often comparisons were made between incomparable data sets. We conduct an extensive, fair comparison of four popular learning methods for the task, reversing conclusions from earlier evaluations. Our results hold for different training sets, genres, and feature sets. A second key issue in ESL error correction is the adaptation of a model to the first language ofthe writer. Errors made by non-native speakers exhibit certain regularities and, as we show, models perform much better when they use knowledge about error patterns of the nonnative writers. We propose a novel way to adapt a learned algorithm to the first language of the writer that is both cheaper to implement and performs better than other adaptation methods.

4 0.77975839 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.

5 0.77875489 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld

Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded ( Jobs Apple ) and CEO-o f ( Jobs Apple ) . , , This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extrac- , tion model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

6 0.77858251 311 acl-2011-Translationese and Its Dialects

7 0.77821505 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

8 0.77321512 44 acl-2011-An exponential translation model for target language morphology

9 0.77159899 141 acl-2011-Gappy Phrasal Alignment By Agreement

10 0.76948857 277 acl-2011-Semi-supervised Relation Extraction with Large-scale Word Clustering

11 0.76885355 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora

12 0.76793516 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

13 0.76755059 30 acl-2011-Adjoining Tree-to-String Translation

14 0.76723462 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification

15 0.76719832 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

16 0.76696694 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

17 0.76678431 65 acl-2011-Can Document Selection Help Semi-supervised Learning? A Case Study On Event Extraction

18 0.76643693 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition

19 0.7663871 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

20 0.76637912 193 acl-2011-Language-independent compound splitting with morphological operations