acl acl2013 acl2013-35 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada
Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Data selection is an effective approach to domain adaptation in statistical machine translation. [sent-6, score-0.326]
2 The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. [sent-7, score-0.14]
3 Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. [sent-8, score-0.072]
4 Here, we explore the use of neural language models for data selection. [sent-9, score-0.346]
5 We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. [sent-10, score-0.646]
6 In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. [sent-11, score-0.455]
7 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams. [sent-15, score-0.072]
8 1 Introduction A perennial challenge in building Statistical Machine Translation (SMT) systems is the dearth of high-quality bitext in the domain of interest. [sent-16, score-0.154]
9 An effective and practical solution is adaptation data selection: the idea is to use language models (LMs) trained on in-domain text to select similar sentences from large general-domain corpora. [sent-17, score-0.202]
10 The selected sentences are then incorporated into the SMT training data. [sent-18, score-0.099]
11 Although previous works in data selection (Axelrod et al. [sent-29, score-0.091]
12 , 2008) have shown substantial gains, we suspect that the commonly-used n-gram LMs may be sub-optimal. [sent-31, score-0.104]
13 The small size of the in-domain text implies that a large percentage of general-domain sentences will contain words not observed in the LM training data. [sent-32, score-0.201]
14 In fact, as many as 60% of general-domain sentences contain at least one unknown word in our experiments. [sent-33, score-0.096]
15 Although the LM probabilities of these sentences could still be computed by resorting to back-off and other smoothing techniques, a natural question remains: will alternative, more robust LMs do better? [sent-34, score-0.145]
16 We hypothesize that the neural language model (Bengio et al. [sent-35, score-0.415]
17 , 2003) is a viable alternative, since its continuous vector representation of words is well-suited for modeling sentences with frequent unknown words, providing smooth probability estimates of unseen but similar contexts. [sent-36, score-0.454]
18 To the best of our knowledge, this paper is the first work that examines neural LMs for adaptation data selection. [sent-40, score-0.512]
19 2 Data Selection Method We employ the data selection method of (Axelrod et al. [sent-41, score-0.091]
20 The intuition is to select general-domain sentences that are similar to in-domain text, while being dissimilar to the average general-domain text. [sent-43, score-0.136]
21 To do so, one defines the score of a general-domain sentence pair (e, f) as: [INE(e) − GENE(e)] + [INF(f) − GENF(f)] (1) where INE(e) is the length-normalized cross-entropy of e on the English in-domain LM. [sent-44, score-0.224]
22 Figure 1: Recurrent neural LM. [sent-47, score-0.346]
23 1 and those with scores lower than some empirically-chosen threshold are added to the bitext for translation model training. [sent-51, score-0.085]
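A minimal sketch of this selection criterion may help make Eq. (1) concrete. It assumes four pre-trained language models (in-domain and general-domain, for the source and target sides) behind a hypothetical logprob(sentence) interface; the names, the interface, and the threshold loop are illustrative, not the authors' actual implementation.

```python
def length_norm_xent(lm, sentence):
    # Length-normalized cross-entropy: negative average log-probability
    # per token of the sentence under the given language model.
    tokens = sentence.split()
    return -lm.logprob(sentence) / max(len(tokens), 1)

def selection_score(e, f, in_e, gen_e, in_f, gen_f):
    # Eq. (1): [INE(e) - GENE(e)] + [INF(f) - GENF(f)].
    # Lower scores mean the pair looks more in-domain on both sides.
    return ((length_norm_xent(in_e, e) - length_norm_xent(gen_e, e)) +
            (length_norm_xent(in_f, f) - length_norm_xent(gen_f, f)))

def select_pairs(general_bitext, in_e, gen_e, in_f, gen_f, threshold):
    # Keep general-domain pairs whose score falls below an empirically
    # chosen threshold; these are added to the SMT training data.
    return [(e, f) for e, f in general_bitext
            if selection_score(e, f, in_e, gen_e, in_f, gen_f) < threshold]
```

The only point where n-gram and neural LMs differ in this pipeline is the logprob call, which is what makes data selection a convenient test bed for comparing the two model families.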
24 Conventionally, each word w(t) is predicted conditioned on the context (w(t−1), w(t−2), . [sent-61, score-0.069]
25 But when the context is rare or contains unknown [sent-65, score-0.128]
26 words, n-grams are forced to back-off to lower-order models, e. [sent-69, score-0.048]
27 These backoffs are unfortunately very frequent in adaptation data selection. [sent-72, score-0.184]
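The following toy sketch shows that back-off behaviour; it uses stupid-backoff-style scoring over an invented count dictionary, not the smoothing scheme of any particular toolkit, purely to illustrate how an unseen context pushes the model down to shorter histories.

```python
import math

def ngram_logprob(counts, context, word, discount=0.4):
    # counts maps word tuples to frequencies; by convention here, the
    # empty tuple () holds the total token count (an illustrative choice).
    penalty = 0.0
    while context:
        history_count = counts.get(context, 0)
        ngram_count = counts.get(context + (word,), 0)
        if history_count > 0 and ngram_count > 0:
            return penalty + math.log(ngram_count / history_count)
        context = context[1:]           # back off: drop the oldest word
        penalty += math.log(discount)   # pay a back-off penalty each time
    # Unigram level: unseen words are floored at a count of 1.
    return penalty + math.log(max(counts.get((word,), 0), 1) / counts[()])
```

A general-domain sentence with even one unknown word triggers this fall-through repeatedly, which is exactly the situation where the continuous representations discussed next are expected to help.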
28 Neural LMs, in contrast, model word probabilities using continuous vector representations. [sent-73, score-0.225]
29 Figure 1 shows a type of neural LMs called recurrent neural networks (Mikolov et al. [sent-74, score-0.885]
30 This is a continuous vector of dimension |S| whose elements are predicted by the previous word w(t−1) and the previous state s(t−1). [sent-83, score-0.281]
31 This is robust to rare contexts because continuous [sent-84, score-0.185]
32 representations enable sharing of statistical strength between similar contexts. [sent-85, score-0.074]
33 Bengio (2009) shows that such representations are better than multinomials in alleviating sparsity issues. [sent-86, score-0.108]
34 Another major type of neural LMs are the so-called feed-forward networks (Bengio et al. [sent-87, score-0.446]
35 Both types of neural LMs have seen many improvements recently, in terms of computational scalability (Le et al. [sent-90, score-0.346]
36 We focus on recurrent networks here since there are fewer hyper-parameters and its ability to model infinite context using recursion is theoretically attractive. [sent-94, score-0.287]
37 But we note that feedforward networks are just as viable. [sent-95, score-0.169]
38 Now, given state vector s(t), we can predict the probability of the current word. [sent-96, score-0.059]
39 w(t) = [w1(t), w2(t), . . . , w|W|(t)] (2) wk(t) = g(Σ_{j=0}^{|S|} sj(t) Vkj) (3) sj(t) = f(Σ_{i=0}^{|W|} wi(t−1) Uji + Σ_{i′=0}^{|S|} si′(t−1) Aji′) (4) Here, w(t) is viewed as a vector of dimension |W| (vocabulary size) where each element wk(t) represents the probability of the k-th vocabulary item at sentence position t. [sent-103, score-0.425]
40 The function g(zk) = e^{zk} / Σ_{k} e^{zk} is a softmax function that ensures the neural LM outputs are proper probabilities, and f(z) = 1/(1 + e^{−z}) is a sigmoid activation that induces the non-linearity critical to the neural network’s expressive power. [sent-104, score-0.842]
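A small numerical sketch of equations (2)–(4), assuming a one-hot previous word of size |W|, a state of size |S|, and randomly initialized weight matrices U, A, V; the shapes and initialization are illustrative assumptions, not the training setup used in the paper.

```python
import numpy as np

def sigmoid(z):
    # f(z) = 1 / (1 + e^{-z}), the activation used in Eq. (4)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # g(z_k) = e^{z_k} / sum_k e^{z_k}, the output normalization in Eq. (3)
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_lm_step(w_prev, s_prev, U, A, V):
    # Eq. (4): new state from the previous (one-hot) word and previous state.
    s_t = sigmoid(U @ w_prev + A @ s_prev)
    # Eqs. (2)-(3): probability of each of the |W| vocabulary items at time t.
    w_t = softmax(V @ s_t)
    return w_t, s_t

# Toy dimensions: |W| = 5 vocabulary items, |S| = 3 hidden units.
rng = np.random.default_rng(0)
W, S = 5, 3
U = rng.normal(size=(S, W))
A = rng.normal(size=(S, S))
V = rng.normal(size=(W, S))
w_prev = np.eye(W)[2]      # one-hot vector for the previous word
s_prev = np.zeros(S)
probs, state = rnn_lm_step(w_prev, s_prev, U, A, V)
print(probs.sum())         # ~1.0: the softmax output is a proper distribution
```

Because similar words receive similar columns of U (their continuous representations), a rare or unseen context still maps to a nearby state s(t), which is the robustness property the authors rely on.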
wordName wordTfidf (topN-words)
[('lms', 0.453), ('neural', 0.346), ('ntt', 0.19), ('ezk', 0.155), ('generaldomain', 0.155), ('genf', 0.155), ('bengio', 0.136), ('lm', 0.133), ('wk', 0.127), ('continuous', 0.121), ('jp', 0.12), ('sudoh', 0.119), ('adaptation', 0.115), ('mikolov', 0.113), ('viable', 0.109), ('axelrod', 0.105), ('gene', 0.105), ('duh', 0.101), ('networks', 0.1), ('neubig', 0.098), ('schwenk', 0.095), ('recurrent', 0.093), ('selection', 0.091), ('ine', 0.088), ('inf', 0.087), ('bitext', 0.085), ('co', 0.074), ('gains', 0.072), ('smt', 0.071), ('hypothesize', 0.069), ('backoffs', 0.069), ('bhey', 0.069), ('crossentropy', 0.069), ('dearth', 0.069), ('eisx', 0.069), ('feedforward', 0.069), ('orfe', 0.069), ('rib', 0.069), ('tehaec', 0.069), ('timates', 0.069), ('wdi', 0.069), ('japan', 0.068), ('ikoma', 0.063), ('takayama', 0.063), ('alexandrescu', 0.063), ('dow', 0.063), ('dro', 0.063), ('hikaridai', 0.063), ('nakamura', 0.063), ('oisr', 0.063), ('tst', 0.063), ('compress', 0.06), ('haddow', 0.06), ('katsuhito', 0.06), ('sco', 0.06), ('stt', 0.06), ('twhe', 0.06), ('zk', 0.06), ('rare', 0.059), ('vector', 0.059), ('ime', 0.057), ('multinomials', 0.054), ('nara', 0.054), ('alleviating', 0.054), ('conventionally', 0.054), ('resorting', 0.054), ('softmax', 0.054), ('dimension', 0.053), ('incorporated', 0.053), ('suspect', 0.052), ('itn', 0.052), ('substantial', 0.052), ('recursion', 0.051), ('tsukada', 0.051), ('examines', 0.051), ('kirchhoff', 0.051), ('prevalent', 0.051), ('unknown', 0.05), ('indomain', 0.049), ('forced', 0.048), ('asn', 0.048), ('hajime', 0.048), ('hk', 0.048), ('sentences', 0.046), ('probabilities', 0.045), ('activation', 0.045), ('ix', 0.045), ('bn', 0.045), ('corporation', 0.045), ('sigmoid', 0.045), ('contexts', 0.045), ('koehn', 0.044), ('infinite', 0.043), ('lewis', 0.043), ('om', 0.043), ('pk', 0.043), ('induces', 0.042), ('graham', 0.042), ('select', 0.041), ('kyoto', 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada
Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
2 0.26447487 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
Author: Yuki Arase ; Ming Zhou
Abstract: We propose a method for automatically detecting low-quality Web-text translated by statistical machine translation (SMT) systems. We focus on the phrase salad phenomenon that is observed in existing SMT results and propose a set of computationally inexpensive features to effectively detect such machine-translated sentences from a large-scale Web-mined text. Unlike previous approaches that require bilingual data, our method uses only monolingual text as input; therefore it is applicable for refining data produced by a variety of Web-mining activities. Evaluation results show that the proposed method achieves an accuracy of 95.8% for sentences and 80.6% for text in noisy Web pages.
3 0.20215482 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
Author: lemao liu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Most statistical machine translation (SMT) systems are modeled using a loglinear framework. Although the log-linear model achieves success in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential. A neural network is a reasonable method to address these pitfalls. However, modeling SMT with a neural network is not trivial, especially when taking the decoding efficiency into consideration. In this paper, we propose a variant of a neural network, i.e. additive neural networks, for SMT to go beyond the log-linear translation model. In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector. Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
4 0.1721358 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
Author: Heike Adel ; Ngoc Thang Vu ; Tanja Schultz
Abstract: In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model. Index Terms: multilingual speech processing, code switching, language modeling, recurrent neural networks, factored language models
6 0.097626381 275 acl-2013-Parsing with Compositional Vector Grammars
7 0.089319229 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
8 0.088199921 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection
9 0.086625889 317 acl-2013-Sentence Level Dialect Identification in Arabic
10 0.079422385 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
11 0.074416727 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
12 0.074144669 219 acl-2013-Learning Entity Representation for Entity Disambiguation
13 0.071244255 255 acl-2013-Name-aware Machine Translation
14 0.070735805 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
15 0.067942098 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
16 0.065687977 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation
17 0.061585836 294 acl-2013-Re-embedding words
18 0.058044106 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
19 0.055106301 291 acl-2013-Question Answering Using Enhanced Lexical Semantic Models
20 0.052993692 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
topicId topicWeight
[(0, 0.139), (1, -0.064), (2, 0.088), (3, 0.041), (4, -0.002), (5, -0.029), (6, 0.028), (7, -0.024), (8, -0.02), (9, 0.05), (10, -0.007), (11, -0.04), (12, 0.039), (13, -0.069), (14, -0.059), (15, 0.09), (16, -0.107), (17, -0.026), (18, -0.025), (19, -0.144), (20, 0.098), (21, -0.067), (22, -0.028), (23, 0.014), (24, 0.098), (25, -0.021), (26, 0.127), (27, -0.1), (28, 0.114), (29, -0.069), (30, -0.115), (31, -0.02), (32, -0.067), (33, -0.117), (34, 0.072), (35, -0.039), (36, 0.063), (37, -0.039), (38, -0.024), (39, -0.054), (40, -0.076), (41, 0.03), (42, 0.06), (43, -0.005), (44, -0.036), (45, -0.02), (46, -0.127), (47, 0.073), (48, -0.051), (49, -0.081)]
simIndex simValue paperId paperTitle
same-paper 1 0.9474988 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada
Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
Author: Heike Adel ; Ngoc Thang Vu ; Tanja Schultz
Abstract: In this paper, we investigate the application of recurrent neural network language models (RNNLM) and factored language models (FLM) to the task of language modeling for Code-Switching speech. We present a way to integrate part-of-speech tags (POS) and language information (LID) into these models which leads to significant improvements in terms of perplexity. Furthermore, a comparison between RNNLMs and FLMs and a detailed analysis of perplexities on the different backoff levels are performed. Finally, we show that recurrent neural networks and factored language models can be combined using linear interpolation to achieve the best performance. The final combined language model provides 37.8% relative improvement in terms of perplexity on the SEAME development set and a relative improvement of 32.7% on the evaluation set compared to the traditional n-gram language model. Index Terms: multilingual speech processing, code switching, language modeling, recurrent neural networks, factored language models
3 0.68196017 38 acl-2013-Additive Neural Networks for Statistical Machine Translation
Author: lemao liu ; Taro Watanabe ; Eiichiro Sumita ; Tiejun Zhao
Abstract: Most statistical machine translation (SMT) systems are modeled using a loglinear framework. Although the log-linear model achieves success in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential. A neural network is a reasonable method to address these pitfalls. However, modeling SMT with a neural network is not trivial, especially when taking the decoding efficiency into consideration. In this paper, we propose a variant of a neural network, i.e. additive neural networks, for SMT to go beyond the log-linear translation model. In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector. Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.
4 0.67157996 216 acl-2013-Large tagset labeling using Feed Forward Neural Networks. Case study on Romanian Language
Author: Tiberiu Boros ; Radu Ion ; Dan Tufis
Abstract: Standard methods for part-of-speech tagging suffer from data sparseness when used on highly inflectional languages (which require large lexical tagset inventories). For this reason, a number of alternative methods have been proposed over the years. One of the most successful methods used for this task, Tiered Tagging (Tufis, 1999), exploits a reduced set of tags derived by removing several recoverable features from the lexicon morpho-syntactic descriptions. A second phase is aimed at recovering the full set of morpho-syntactic features. In this paper we present an alternative method to Tiered Tagging, based on local optimizations with Neural Networks and we show how, by properly encoding the input sequence in a general Neural Network architecture, we achieve results similar to the Tiered Tagging methodology, significantly faster and without requiring extensive linguistic knowledge as implied by the previously mentioned method.
5 0.63855064 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
Author: Nan Yang ; Shujie Liu ; Mu Li ; Ming Zhou ; Nenghai Yu
Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.
6 0.57520074 294 acl-2013-Re-embedding words
7 0.53353709 235 acl-2013-Machine Translation Detection from Monolingual Web-Text
8 0.50581396 275 acl-2013-Parsing with Compositional Vector Grammars
9 0.48940828 254 acl-2013-Multimodal DBN for Predicting High-Quality Answers in cQA portals
10 0.48405433 308 acl-2013-Scalable Modified Kneser-Ney Language Model Estimation
11 0.46771121 219 acl-2013-Learning Entity Representation for Entity Disambiguation
12 0.45132133 349 acl-2013-The mathematics of language learning
13 0.44158158 390 acl-2013-Word surprisal predicts N400 amplitude during reading
14 0.43822345 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection
16 0.40365922 325 acl-2013-Smoothed marginal distribution constraints for language modeling
17 0.38255054 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
18 0.3801448 317 acl-2013-Sentence Level Dialect Identification in Arabic
19 0.37597096 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
20 0.3747797 122 acl-2013-Discriminative Approach to Fill-in-the-Blank Quiz Generation for Language Learners
topicId topicWeight
[(0, 0.04), (6, 0.032), (11, 0.053), (24, 0.039), (26, 0.07), (29, 0.025), (35, 0.067), (42, 0.042), (48, 0.045), (51, 0.398), (70, 0.036), (88, 0.022), (95, 0.051)]
simIndex simValue paperId paperTitle
same-paper 1 0.75081438 35 acl-2013-Adaptation Data Selection using Neural Language Models: Experiments in Machine Translation
Author: Kevin Duh ; Graham Neubig ; Katsuhito Sudoh ; Hajime Tsukada
Abstract: Data selection is an effective approach to domain adaptation in statistical machine translation. The idea is to use language models trained on small in-domain text to select similar sentences from large general-domain corpora, which are then incorporated into the training data. Substantial gains have been demonstrated in previous works, which employ standard n-gram language models. Here, we explore the use of neural language models for data selection. We hypothesize that the continuous vector representation of words in neural language models makes them more effective than n-grams for modeling unknown word contexts, which are prevalent in general-domain text. In a comprehensive evaluation of 4 language pairs (English to German, French, Russian, Spanish), we found that neural language models are indeed viable tools for data selection: while the improvements are varied (i.e. 0.1 to 1.7 gains in BLEU), they are fast to train on small in-domain data and can sometimes substantially outperform conventional n-grams.
2 0.63838845 310 acl-2013-Semantic Frames to Predict Stock Price Movement
Author: Boyi Xie ; Rebecca J. Passonneau ; Leon Wu ; German G. Creamer
Abstract: Semantic frames are a rich linguistic resource. There has been much work on semantic frame parsers, but less that applies them to general NLP problems. We address a task to predict change in stock price from financial news. Semantic frames help to generalize from specific sentences to scenarios, and to detect the (positive or negative) roles of specific companies. We introduce a novel tree representation, and use it to train predictive models with tree kernels using support vector machines. Our experiments test multiple text representations on two binary classification tasks, change of price and polarity. Experiments show that features derived from semantic frame parsing have significantly better performance across years on the polarity task.
3 0.45742989 201 acl-2013-Integrating Translation Memory into Phrase-Based Machine Translation during Decoding
Author: Kun Wang ; Chengqing Zong ; Keh-Yih Su
Abstract: Since statistical machine translation (SMT) and translation memory (TM) complement each other in matched and unmatched regions, integrated models are proposed in this paper to incorporate TM information into phrase-based SMT. Unlike previous multi-stage pipeline approaches, which directly merge TM result into the final output, the proposed models refer to the corresponding TM information associated with each phrase at SMT decoding. On a Chinese–English TM database, our experiments show that the proposed integrated Model-III is significantly better than either the SMT or the TM systems when the fuzzy match score is above 0.4. Furthermore, integrated Model-III achieves overall 3.48 BLEU points improvement and 2.62 TER points reduction in comparison with the pure SMT system. Besides, the proposed models also outperform previous approaches significantly.
4 0.4512645 383 acl-2013-Vector Space Model for Adaptation in Statistical Machine Translation
Author: Boxing Chen ; Roland Kuhn ; George Foster
Abstract: This paper proposes a new approach to domain adaptation in statistical machine translation (SMT) based on a vector space model (VSM). The general idea is first to create a vector profile for the in-domain development (“dev”) set. This profile might, for instance, be a vector with a dimensionality equal to the number of training subcorpora; each entry in the vector reflects the contribution of a particular subcorpus to all the phrase pairs that can be extracted from the dev set. Then, for each phrase pair extracted from the training data, we create a vector with features defined in the same way, and calculate its similarity score with the vector representing the dev set. Thus, we obtain a decoding feature whose value represents the phrase pair’s closeness to the dev. This is a simple, computationally cheap form of instance weighting for phrase pairs. Experiments on large scale NIST evaluation data show improvements over strong baselines: +1.8 BLEU on Arabic to English and +1.4 BLEU on Chinese to English over a non-adapted baseline, and significant improvements in most circumstances over baselines with linear mixture model adaptation. An informal analysis suggests that VSM adaptation may help in making a good choice among words with the same meaning, on the basis of style and genre.
5 0.3561359 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
Author: Kai Liu ; Yajuan Lu ; Wenbin Jiang ; Qun Liu
Abstract: This paper describes a novel strategy for automatic induction of a monolingual dependency grammar under the guidance of bilingually-projected dependency. By moderately leveraging the dependency information projected from the parsed counterpart language, and simultaneously mining the underlying syntactic structure of the language considered, it effectively integrates the advantages of bilingual projection and unsupervised induction, so as to induce a monolingual grammar much better than previous models only using bilingual projection or unsupervised induction. We induced dependency grammar for five different languages under the guidance of dependency information projected from the parsed English translation, experiments show that the bilingually-guided method achieves a significant improvement of 28.5% over the unsupervised baseline and 3.0% over the best projection baseline on average.
6 0.35577866 318 acl-2013-Sentiment Relevance
7 0.35053161 285 acl-2013-Propminer: A Workflow for Interactive Information Extraction and Exploration using Dependency Trees
9 0.34897262 7 acl-2013-A Lattice-based Framework for Joint Chinese Word Segmentation, POS Tagging and Parsing
10 0.34886277 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
11 0.34751227 275 acl-2013-Parsing with Compositional Vector Grammars
12 0.34717521 144 acl-2013-Explicit and Implicit Syntactic Features for Text Classification
13 0.34700805 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
14 0.34550998 82 acl-2013-Co-regularizing character-based and word-based models for semi-supervised Chinese word segmentation
15 0.34537542 295 acl-2013-Real-World Semi-Supervised Learning of POS-Taggers for Low-Resource Languages
16 0.34491634 225 acl-2013-Learning to Order Natural Language Texts
17 0.34445709 247 acl-2013-Modeling of term-distance and term-occurrence information for improving n-gram language model performance
18 0.34431481 159 acl-2013-Filling Knowledge Base Gaps for Distant Supervision of Relation Extraction
19 0.3439469 188 acl-2013-Identifying Sentiment Words Using an Optimization-based Model without Seed Words
20 0.34370676 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search