acl acl2013 acl2013-361 knowledge-graph by maker-knowledge-mining

361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers


Source: pdf

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than phrase-based and hierarchical phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. [sent-2, score-0.361]

2 It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. [sent-3, score-0.197]

3 There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. [sent-4, score-0.184]

4 The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. [sent-5, score-0.119]

5 We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than phrase-based and hierarchical phrase-based translation. [sent-6, score-0.418]

6 As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. [sent-7, score-0.254]

7 One of the recent trends in statistical machine translation (SMT) is the popularity of models that use syntactic information to help solve problems of long-distance reordering between the source and target language text. [sent-9, score-0.438]

8 These techniques can be broadly divided into pre-ordering techniques, which first parse and reorder the source sentence into the target order before translating (Xia and McCord, 2004; Isozaki et al., 2010b). [sent-10, score-0.095]

9 The other family is tree-based decoding techniques, which take a tree or forest as input and choose the reordering and translation jointly (Yamada and Knight, 2001; Liu et al., 2006). [sent-13, score-0.599]

10 While pre-ordering is not able to consider both translation and reordering in a joint model, it is useful in that it is done before the actual translation process, so it can be performed with a conventional translation pipeline using a standard phrase-based decoder such as Moses (Koehn et al., 2007). [sent-16, score-1.133]

11 For tree-to-string systems, on the other hand, it is necessary to have available, or to create, a decoder equipped with this functionality, which becomes a bottleneck in the research and development process. [sent-18, score-0.111]

12 In this demo paper, we describe Travatar, an open-source tree-to-string or forest-to-string translation system that can be used as a tool for translation using source-side syntax, and as a platform for research into syntax-based translation methods. [sent-19, score-0.81]

13 Travatar includes a fully documented training and testing regimen that was modeled around that of Moses, making it possible for users familiar with Moses to get started with Travatar quickly. [sent-21, score-0.038]

14 The framework of the software is also designed to be extensible, so the toolkit is applicable for other tree-to-string transduction tasks. [sent-22, score-0.084]

15 In the evaluation of the decoder on English-Japanese machine translation, we perform a comparison to Moses's phrase-based, hierarchical phrase-based, and SCFG-based tree-to-string systems. [sent-23, score-0.111]

16 Figure 1: Tree-to-string translation with SCFGs and tree transducers. [sent-25, score-0.361]

17 Based on the results, we find that tree-to-string, and particularly forest-to-string, translation using Travatar provides competitive or superior accuracy to all of these techniques. [sent-27, score-0.307]

18 As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. [sent-28, score-0.254]

19 Tree-to-string translation uses syntactic information to improve translation by first parsing the source sentence, then using this source-side parse tree to decide the translation and reordering of the input. [sent-30, score-1.122]

20 This method has several advantages, including efficiency of decoding, relatively easy handling of global reordering, and an intuitive representation of de-lexicalized rules that express general differences in order between the source and target languages. [sent-31, score-0.099]

21 Within tree-to-string translation there are two major methodologies: synchronous context-free grammars (Chiang, 2007) and tree transducers (Graehl and Knight, 2004). [sent-32, score-0.443]

22 An example of tree-to-string translation rules supported by SCFGs and tree transducers is shown in Figure 1. [sent-33, score-0.5]

23 In this example, the first rule is a simple multi-word noun phrase, while the second is a delexicalized rule expressing translation from English SVO word order to Japanese SOV word order. [sent-34, score-0.59]

24 The third and fourth examples are translations of a verb, noun phrase, and prepositional phrase, where the third rule has the preposition attached to the verb, and the fourth has the preposition attached to the noun. [sent-35, score-0.16]

25 For the SCFGs, it can be seen that on the source side of the rule there are placeholders corresponding to syntactic phrases, and on the target side of the rule there are corresponding placeholders that do not have a syntactic label. [sent-36, score-0.498]

26 In the translation rules using tree transducers, on the other hand, similar rules can be expressed, but the source rules are richer than simple SCFG rules, also including the internal structure of the parse tree. [sent-37, score-0.662]
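
To make the contrast between the two rule formats concrete, here is a minimal sketch (in Python rather than Travatar's C++; the structures are illustrative, not the toolkit's actual internals):

```python
from dataclasses import dataclass

@dataclass
class ScfgRule:
    lhs: str     # single source-side label, e.g. "S"
    src: list    # flat source side: terminals and "x0:NP"-style nonterminals
    trg: list    # target side: terminals and unlabeled placeholders "x0", ...

@dataclass
class TreeFragment:
    label: str           # category or variable site, e.g. "VP" or "x1:VB"
    children: list       # empty list for a leaf

@dataclass
class TransducerRule:
    src: TreeFragment    # full source tree fragment, with internal structure
    trg: list

# Delexicalized English SVO -> Japanese SOV. The SCFG source side is flat:
scfg = ScfgRule("S", ["x0:NP", "x1:VB", "x2:NP"], ["x0", "x2", "x1"])

# The transducer rule expresses the same reordering but records that the verb
# and object hang under a VP, so rules that differ only in tree-internal
# attachment (e.g. a PP under the verb vs. under the noun) stay distinct:
t2s = TransducerRule(
    TreeFragment("S", [
        TreeFragment("x0:NP", []),
        TreeFragment("VP", [TreeFragment("x1:VB", []),
                            TreeFragment("x2:NP", [])]),
    ]),
    ["x0", "x2", "x1"],
)
print(scfg, t2s, sep="\n")
```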

27 This internal structure is important for achieving translation results faithful to the input parse. [sent-38, score-0.305]

28 In particular, the third and fourth rules show an intuitive example in which this internal structure can be important for translation. [sent-39, score-0.092]

29 Here the full tree structures demonstrate important differences in the attachment of the prepositional phrase to the verb or noun. [sent-40, score-0.132]

30 While this attachment decision is one of the most difficult and important problems in syntactic parsing, the source side of the SCFG rule is identical in both cases, losing the ability to distinguish the very information that parsers are designed to disambiguate. [sent-41, score-0.211]

31 In traditional tree-to-string translation methods, the translator uses a single one-best parse tree output by a syntactic parser, but parse errors have the potential to degrade the quality of translation. [sent-42, score-0.55]

32 An important advance in tree-to-string translation that helps ameliorate this difficulty is forest-to-string translation, which represents a large number of potential parses as a packed forest, allowing the translator to choose between these parses during the process of translation (Mi et al., 2008). [sent-43, score-0.575]
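
The packed-forest idea can be illustrated with a toy hypergraph, where ambiguity is shared rather than enumerated. A minimal sketch (illustrative structures only, not Travatar's representation):

```python
from collections import defaultdict

class Forest:
    """Packed forest: each node is a labeled span; each hyperedge is one way
    of building that span from child nodes."""
    def __init__(self):
        self.edges = defaultdict(list)   # head node -> list of child tuples

    def add_edge(self, head, tail):
        self.edges[head].append(tuple(tail))

    def count_trees(self, node, memo=None):
        """Number of distinct parse trees packed under `node`."""
        memo = {} if memo is None else memo
        if node in memo:
            return memo[node]
        if node not in self.edges:       # leaf / unexpanded node
            memo[node] = 1
            return 1
        total = 0
        for tail in self.edges[node]:
            prod = 1
            for child in tail:
                prod *= self.count_trees(child, memo)
            total += prod
        memo[node] = total
        return total

f = Forest()
# ("VP", i, j) denotes a VP over words i..j; two analyses of one span
f.add_edge(("VP", 1, 5), [("VB", 1, 2), ("NP", 2, 5)])                # PP under NP
f.add_edge(("VP", 1, 5), [("VB", 1, 2), ("NP", 2, 3), ("PP", 3, 5)])  # PP under VP
print(f.count_trees(("VP", 1, 5)))   # 2: both parses packed in one structure
```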

33 There are a number of open-source software packages that support tree-to-string translation in the SCFG framework. [sent-46, score-0.27]

34 Several such toolkits support the annotation of source-side syntactic labels, and take parse trees (or, in the case of NiuTrans, forests) as input. [sent-49, score-0.101]

35 There are also a few other decoders that support other ways of using source-side syntax to help improve translation or global reordering. [sent-50, score-0.355]

36 cdec (Dyer et al., 2010) supports the context-free reordering/finite-state translation framework described by Dyer and Resnik (2010). [sent-52, score-0.036]

37 Akamon (Wu et al., 2012) supports translation using head-driven phrase structure grammars as described by Wu et al. [sent-54, score-0.306]

38 However, to our knowledge, while there is a general-purpose tool for tree automata (May and Knight, 2006), there is no open-source toolkit implementing the SMT pipeline in the tree transducer framework, despite it being a target of active research (Graehl and Knight, 2004; Liu et al., 2006). [sent-56, score-0.442]

39 In this section, we describe the overall framework of the Travatar decoder, following the order of the training pipeline. [sent-60, score-0.038]

40 Data preprocessing consists of parsing the source-side sentences and tokenizing the target-side sentences. [sent-62, score-0.14]

41 Travatar can decode input in the bracketed format of the Penn Treebank, as well as in forest format. [sent-63, score-0.06]

42 Documentation and scripts for using Travatar with several parsers for English, Chinese, and Japanese are included with the toolkit. [sent-64, score-0.072]

43 Once the data has been pre-processed, a tree-to-string model can be trained with the training pipeline included in the toolkit. [sent-66, score-0.119]

44 Like the training pipeline for Moses, there is a single script that performs alignment, rule extraction, scoring, and parameter initialization. [sent-67, score-0.279]

45 Language model training can be performed using a separate toolkit, and instructions are provided in the documentation. [sent-68, score-0.091]

46 For word alignment, the Travatar training pipeline is integrated with GIZA++ (Och and Ney, 2003) by default, but can also use alignments from any other aligner. [sent-69, score-0.169]

47 Rule extraction is performed using the GHKM algorithm (Galley et al., 2006). [sent-70, score-0.053]

48 This includes its extension to rule extraction from forests (Mi and Huang, 2008). [sent-71, score-0.218]
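
As a rough illustration of the GHKM idea, the following toy sketch computes the frontier-node test at its core (hypothetical helper names; real extractors additionally handle null alignments, forests, and rule composition). A tree node is a frontier node if the target span of the words under it does not overlap the target words aligned to any source word outside it; minimal rules are cut at exactly these nodes.

```python
def aligned_targets(src_positions, alignment):
    """Target indices aligned to any source index in src_positions."""
    return {t for s, t in alignment if s in src_positions}

def is_frontier(node_span, sent_len, alignment):
    inside = aligned_targets(node_span, alignment)
    if not inside:
        return False                      # unaligned subtree: not a cut point
    outside = aligned_targets(
        [i for i in range(sent_len) if i not in node_span], alignment)
    closure = set(range(min(inside), max(inside) + 1))
    return not (closure & outside)        # contiguous and consistent

# he[0] saw[1] her[2]  ->  kare-wa[0] kanojo-wo[1] mita[2]
alignment = [(0, 0), (1, 2), (2, 1)]
print(is_frontier(range(0, 1), 3, alignment))  # "he"      -> True
print(is_frontier(range(1, 3), 3, alignment))  # "saw her" -> True
print(is_frontier(range(0, 2), 3, alignment))  # "he saw"  -> False (target gap)
```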

49 There are also a number of options implemented, including rule composition, attachment of null-aligned target words either at the highest point in the tree or at every possible position, and left and right binarization (Galley et al., 2006). [sent-72, score-0.346]

50 Rule scoring uses a standard set of forward and backward conditional probabilities, lexicalized translation probabilities, phrase frequency, and word and phrase counts. [sent-75, score-0.27]
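
A small sketch of what such relative-frequency scoring amounts to, assuming a hypothetical list of extracted rule instances (real scorers also derive lexical translation probabilities from the word alignments inside each rule):

```python
import math
from collections import Counter

# Hypothetical extracted (source fragment, target string) rule instances.
pairs = [("NP(x0:NN)", "x0 wa"), ("NP(x0:NN)", "x0 wa"), ("NP(x0:NN)", "x0 ga")]

joint = Counter(pairs)
src_c = Counter(s for s, _ in pairs)
trg_c = Counter(t for _, t in pairs)

def features(src, trg):
    return {
        "fge": math.log(joint[(src, trg)] / src_c[src]),  # forward  P(trg|src)
        "egf": math.log(joint[(src, trg)] / trg_c[trg]),  # backward P(src|trg)
        "count": joint[(src, trg)],                       # rule frequency
    }

print(features("NP(x0:NN)", "x0 wa"))
# {'fge': log(2/3) = -0.405..., 'egf': log(2/2) = 0.0, 'count': 2}
```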

51 Given a translation model, Travatar is able to decode parsed input sentences to generate translations. [sent-78, score-0.27]

52 The decoding itself is performed using the bottom-up forest-to-string decoding algorithm of Mi et al. (2008). [sent-79, score-0.253]

53 Beam search with cube pruning (Chiang, 2007) is used to adjust the trade-off between search speed and translation accuracy. [sent-81, score-0.345]
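
The cube-pruning trick can be shown in miniature: rather than scoring every combination of two sorted child k-best lists under a rule, pop the best corner of the grid from a heap and push its neighbors. A sketch (scores here are plain sums; once non-local features such as a language model are added, the search becomes approximate):

```python
import heapq

def cube_k_best(left, right, k):
    """left/right: score lists sorted descending; return top-k combined sums."""
    heap = [(-(left[0] + right[0]), 0, 0)]
    seen, out = {(0, 0)}, []
    while heap and len(out) < k:
        neg, i, j = heapq.heappop(heap)
        out.append(-neg)
        for di, dj in ((1, 0), (0, 1)):          # right/down grid neighbors
            ni, nj = i + di, j + dj
            if ni < len(left) and nj < len(right) and (ni, nj) not in seen:
                seen.add((ni, nj))
                heapq.heappush(heap, (-(left[ni] + right[nj]), ni, nj))
    return out

print(cube_k_best([0.0, -0.5, -2.0], [0.0, -1.0, -1.1], k=4))
# [0.0, -0.5, -1.0, -1.1] -- only 6 of the 9 grid cells are ever scored
```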

54 The source side of the translation model is stored using a space-efficient trie data structure (Yata, 2012) implemented using the marisa-trie toolkit. [sent-82, score-0.412]

55 Rule lookup is performed using left-to-right depth-first search, which can be implemented as prefix lookup in the trie for efficient search. [sent-83, score-0.18]
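
A toy version of this trie-based lookup, using a plain dict-of-dicts in place of marisa-trie, shows why the left-to-right serialization matters: a failed prefix immediately prunes every rule sharing it.

```python
def dfs_tokens(node):
    """Serialize a (label, children) tree fragment in left-to-right DFS order."""
    label, children = node
    if not children:
        return [label]
    return [label, "("] + [t for c in children for t in dfs_tokens(c)] + [")"]

class RuleTrie:
    RULES = "$rules"          # sentinel key; assumes no token looks like this

    def __init__(self):
        self.root = {}

    def add(self, fragment, target):
        node = self.root
        for tok in dfs_tokens(fragment):
            node = node.setdefault(tok, {})
        node.setdefault(self.RULES, []).append(target)

    def lookup(self, fragment):
        node = self.root
        for tok in dfs_tokens(fragment):
            if tok not in node:
                return []     # dead prefix: all rules sharing it are pruned
            node = node[tok]
        return node.get(self.RULES, [])

trie = RuleTrie()
frag = ("S", [("x0:NP", []), ("VP", [("x1:VB", []), ("x2:NP", [])])])
trie.add(frag, "x0 x2 x1")    # the delexicalized SVO -> SOV rule from above
print(trie.lookup(frag))      # ['x0 x2 x1']
```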

56 The language model storage uses the implementation in KenLM (Heafield, 2011), and particularly the implementation that maintains left and right language model states for syntax-based MT (Heafield et al., 2011). [sent-84, score-0.074]

57 For tuning the parameters of the model, Travatar natively supports minimum error rate training (MERT) (Och, 2003) and its extension to hypergraphs (Kumar et al., 2009). [sent-87, score-0.227]
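
MERT's key component is an exact line search: along a search direction, each hypothesis' model score is linear in the step size, so the 1-best can only change at line intersections. A simplified n-best sketch (assumes at least two distinct slopes; the hypergraph extension computes the same envelope over the full search space rather than an n-best list):

```python
def best_step(hyps):
    """hyps: list of (intercept a, slope b, metric); score(g) = a + b*g.
    Returns a step size g whose interval maximizes the metric."""
    # Candidate boundaries: intersections of every pair of score lines.
    cuts = sorted({(a2 - a1) / (b1 - b2)
                   for i, (a1, b1, _) in enumerate(hyps)
                   for (a2, b2, _) in hyps[i + 1:] if b1 != b2})
    # Probe one point inside each interval (plus the two open ends).
    probes = ([cuts[0] - 1.0]
              + [(x + y) / 2 for x, y in zip(cuts, cuts[1:])]
              + [cuts[-1] + 1.0])
    def metric_at(g):
        return max(hyps, key=lambda h: h[0] + h[1] * g)[2]
    return max(probes, key=metric_at)

hyps = [(1.0, -1.0, 0.20),   # model-best for small g, weak metric
        (0.0,  0.5, 0.45),   # model-best for large g, best metric
        (0.5,  0.0, 0.30)]
g = best_step(hyps)
print(g, max(hyps, key=lambda h: h[0] + h[1] * g)[2])   # picks the 0.45 region
```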

58 This tuning can be performed for evaluation measures including BLEU (Papineni et al., 2002). [sent-89, score-0.121]

59 There is also a preliminary implementation of online learning methods such as the structured perceptron algorithm (Collins, 2002), and regularized structured SVMs trained using FOBOS (Duchi and Singer, 2009). [sent-92, score-0.037]
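
A minimal sketch of the structured-perceptron update in this setting, assuming hypothetical k-best lists annotated with sentence-level metric scores (not Travatar's actual trainer): move the weights toward the features of the metric-best (oracle) hypothesis and away from the model's current 1-best.

```python
from collections import defaultdict

def perceptron_epoch(kbest_lists, weights, lr=1.0):
    for kbest in kbest_lists:  # each item: (features: dict, metric: float)
        model_best = max(kbest, key=lambda h: sum(weights[f] * v
                                                  for f, v in h[0].items()))
        oracle = max(kbest, key=lambda h: h[1])          # highest metric score
        if model_best is not oracle:
            for f, v in oracle[0].items():               # toward the oracle
                weights[f] += lr * v
            for f, v in model_best[0].items():           # away from 1-best
                weights[f] -= lr * v
    return weights

w = defaultdict(float)
kbest = [({"lm": -2.0, "tm": -1.0}, 0.30),   # hypothesis 1
         ({"lm": -1.0, "tm": -3.0}, 0.45)]   # hypothesis 2: better metric
perceptron_epoch([kbest], w)
print(dict(w))   # {'lm': 1.0, 'tm': -2.0}: nudged toward hypothesis 2
```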

60 The Travatar toolkit also provides an evaluation program that can calculate the scores of translation output according to various evaluation measures, and calculate the significance of differences between systems using bootstrap resampling (Koehn, 2004). [sent-94, score-0.389]
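
The bootstrap test itself is compact enough to sketch; here sentence-level scores stand in for recomputing corpus BLEU on each resample, which the real test (Koehn, 2004) does. Resample test sentences with replacement and count how often system A beats system B on the resampled sets.

```python
import random

def paired_bootstrap(scores_a, scores_b, samples=10000, seed=0):
    rng = random.Random(seed)
    n, wins = len(scores_a), 0
    for _ in range(samples):
        idx = [rng.randrange(n) for _ in range(n)]       # resample sentences
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / samples        # ~ P(A better than B) under resampling

a = [0.31, 0.40, 0.22, 0.35, 0.28, 0.44, 0.30, 0.38]
b = [0.29, 0.36, 0.24, 0.33, 0.27, 0.40, 0.28, 0.37]
print(paired_bootstrap(a, b))    # near 1.0 -> difference unlikely by chance
```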

61 In our experiments, we validated the performance of the translation toolkit on English-Japanese translation of Wikipedia articles, as specified by the Kyoto Free Translation Task (KFTT) (Neubig, 2011). [sent-98, score-0.624]

62 Training used the 405k sentences of training data of length under 60, tuning was performed on the development set, and testing was performed on the test set using the BLEU and RIBES measures. [sent-99, score-0.212]

63 As baseline systems we use the Moses implementation of phrase-based (MOSES-PBMT), hierarchical phrase-based (MOSES-HIER), and tree-to-string translation (MOSES-T2S). [sent-100, score-0.27]

64 Alignment for each system was performed using either GIZA++ or Nile, with main results reported for the aligner that achieved the best accuracy on the dev set, and a further comparison shown in the auxiliary experiments in Section 4. [sent-103, score-0.212]

65 Tuning was performed with minimum error rate training to maximize BLEU over 200-best lists. [sent-105, score-0.132]

66 Tokenization was performed with the Stanford tokenizer for English, and the KyTea word segmenter (Neubig et al., 2011) for Japanese. [sent-106, score-0.053]

67 Rule extraction was performed using one-best trees, which were right-binarized and lowercased post-parsing. [sent-109, score-0.053]
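
Right binarization is simple enough to sketch: any node with more than two children is rebuilt into a right-branching chain of primed intermediate nodes, keeping every rule at most binary without changing the yield.

```python
def right_binarize(tree):
    label, children = tree
    children = [right_binarize(c) if isinstance(c, tuple) else c
                for c in children]
    while len(children) > 2:
        # Fold the rightmost two children under a primed intermediate label.
        children = children[:-2] + [(label + "'", children[-2:])]
    return (label, children)

np = ("NP", ["the", "big", "red", "dog"])
print(right_binarize(np))
# ('NP', ['the', ("NP'", ['big', ("NP'", ['red', 'dog'])])])
```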

68 For Travatar, composed rules of up to size 4 and a maximum of 2 non-terminals and 7 terminals for each rule were used. [sent-110, score-0.217]

69 Decoding was performed over either one-best trees (TRAV-T2S) or over forests including all edges in the parser 200-best list (TRAV-F2S), and a pop limit of 1000 hypotheses was used for cube pruning. [sent-112, score-0.202]

70 As Nile is a supervised aligner, we trained it on the alignments provided with the KFTT. [sent-117, score-0.189]

71 Table 1 shows the accuracy, rule table size, and speed in sentences per second for each system. [sent-120, score-0.161]

72 From these results we can see that the systems utilizing source-side syntax significantly outperform PBMT and Hiero, validating the usefulness of source-side syntax on the English-to-Japanese task. [sent-126, score-0.179]

73 One reason for the slightly higher BLEU of MOSES-T2S is that Moses's rule extraction algorithm is more liberal in its attachment of null-aligned words, resulting in a much larger rule table (52. [sent-128, score-0.361]

74 When using forest based decoding in TRAV-F2S, we see significant gains in accuracy over TRAV-T2S, with BLEU slightly and RIBES greatly exceeding that of MOSES-T2S. [sent-133, score-0.197]

75 In addition, as auxiliary results, we present a comparison of Travatar's tree-to-string and forest-to-string systems using different alignment methods and syntactic parsers to examine the results on translation (Table 2). [sent-135, score-0.524]

76 While we do not have labeled data to calculate parse accuracies with, Egret is a clone of the Berkeley parser, which has been reported to achieve higher accuracy than the Stanford parser on several domains (Kummerfeld et al., 2012). [sent-137, score-0.141]

77 From the translation results, we can see that STANFORD-T2S significantly underperforms EGRET-T2S. [sent-139, score-0.27]

78 Table 2: Translation results for several translation models (PBMT, Hiero, T2S, F2S), aligners (GIZA++, Nile), and parsers (Stanford, Egret). [sent-142, score-0.342]

79 This confirms that the effectiveness of the parser has a large effect on the translation accuracy. [sent-143, score-0.321]

80 Next, we compared the unsupervised aligner GIZA++ with the supervised aligner Nile, which uses syntactic information to improve alignment accuracy (Riesa and Marcu, 2010). [sent-144, score-0.307]

81 With respect to translation accuracy, we found that for translation that does not use syntactic information, improvements in alignment do not necessarily increase translation accuracy, as has been noted by Ganchev et al. (2008). [sent-148, score-0.94]

82 However, for all tree-to-string systems, the improved alignments result in significant improvements in accuracy, showing that alignments are, in fact, important in our syntax-driven translation setup. [sent-150, score-0.37]

83 In this paper, we introduced Travatar, an open-source toolkit for forest-to-string translation using tree transducers. [sent-151, score-0.445]

84 We hope this decoder will be useful to the research community as a test-bed for forest-to-string systems. [sent-152, score-0.111]

85 First, we plan to support advanced rule extraction techniques, such as fuller support for count regularization and forest-based rule extraction (Mi and Huang, 2008), and using the EM algorithm to choose attachments for null-aligned words (Galley et al., 2006). [sent-155, score-0.32]

86 The same EM approach can also choose the direction of rule binarization (Wang et al., 2007). [sent-156, score-0.203]

87 We also plan to incorporate advances in decoding to improve search speed (Huang and Mi, 2010). [sent-158, score-0.135]

88 Finally, we will provide better support of parallelization through the entire pipeline to increase the efficiency of training and decoding. [sent-160, score-0.119]

89 Hope and fear for discriminative training of statistical translation models. [sent-169, score-0.308]

90 Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. [sent-173, score-0.038]

91 cdec: A decoder, alignment, and learning framework for finite-state and context-free translation models. [sent-187, score-0.27]

92 Scalable inference and training of context-rich syntactic translation models. [sent-191, score-0.356]

93 Left language model state for syntactic machine translation. [sent-207, score-0.048]

94 Automatic evaluation of translation quality for distant language pairs. [sent-226, score-0.27]

95 Head finalization: A simple reordering rule for SOV languages. [sent-231, score-0.238]

96 Efficient minimum error rate training and minimum Bayes-risk decoding for translation hypergraphs and lattices. [sent-246, score-0.534]

97 Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. [sent-251, score-0.051]

98 Binarizing syntax trees to improve syntax-based machine translation accuracy. [sent-306, score-0.314]

99 Akamon: An open source toolkit for tree/forest-based statistical machine translation. [sent-316, score-0.126]

100 NiuTrans: An open source toolkit for phrase-based and syntax-based machine translation. [sent-325, score-0.126]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('travatar', 0.599), ('translation', 0.27), ('rule', 0.16), ('nile', 0.139), ('egret', 0.126), ('ribes', 0.126), ('moses', 0.126), ('decoder', 0.111), ('graehl', 0.101), ('decoding', 0.1), ('neubig', 0.1), ('mi', 0.097), ('knight', 0.096), ('niutrans', 0.095), ('tree', 0.091), ('isozaki', 0.085), ('toolkit', 0.084), ('transducers', 0.082), ('alignment', 0.082), ('pipeline', 0.081), ('kevin', 0.08), ('reordering', 0.078), ('scfgs', 0.077), ('scfg', 0.075), ('parsers', 0.072), ('dyer', 0.071), ('aligner', 0.07), ('tuning', 0.068), ('bleu', 0.067), ('akamon', 0.063), ('kftt', 0.063), ('nullaligned', 0.063), ('pbmt', 0.063), ('haitao', 0.062), ('forest', 0.06), ('heafield', 0.06), ('giza', 0.059), ('forests', 0.058), ('automata', 0.058), ('huang', 0.057), ('rules', 0.057), ('riesa', 0.056), ('sov', 0.056), ('koehn', 0.053), ('parse', 0.053), ('performed', 0.053), ('auxiliary', 0.052), ('japanese', 0.052), ('jonathan', 0.052), ('graham', 0.052), ('galley', 0.052), ('placeholders', 0.051), ('trie', 0.051), ('englishjapanese', 0.051), ('cdec', 0.051), ('parser', 0.051), ('alignments', 0.05), ('side', 0.049), ('katsuhito', 0.048), ('sudoh', 0.048), ('kummerfeld', 0.048), ('xianchao', 0.048), ('syntactic', 0.048), ('demonstrations', 0.047), ('nara', 0.044), ('hypergraphs', 0.044), ('hiero', 0.044), ('syntax', 0.044), ('liang', 0.043), ('binarization', 0.043), ('zollmann', 0.043), ('kenlm', 0.043), ('duchi', 0.043), ('source', 0.042), ('decoders', 0.041), ('duh', 0.041), ('minimum', 0.041), ('attachment', 0.041), ('chiang', 0.041), ('matsuzaki', 0.04), ('cube', 0.04), ('options', 0.039), ('takuya', 0.039), ('hajime', 0.039), ('training', 0.038), ('lookup', 0.038), ('ganchev', 0.038), ('wu', 0.037), ('chris', 0.037), ('code', 0.037), ('transducer', 0.037), ('accuracy', 0.037), ('implementation', 0.037), ('stanford', 0.037), ('plans', 0.036), ('supports', 0.036), ('resampling', 0.035), ('translator', 0.035), ('internal', 0.035), ('speed', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than phrase-based and hierarchical phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

2 0.21959226 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

Author: Yang Liu

Abstract: We introduce a shift-reduce parsing algorithm for phrase-based string-to-dependency translation. As the algorithm generates dependency trees for partial translations left-to-right in decoding, it allows for efficient integration of both n-gram and dependency language models. To resolve conflicts in shift-reduce parsing, we propose a maximum entropy model trained on the derivation graph of training data. As our approach combines the merits of phrase-based and string-to-dependency models, it achieves significant improvements over the two baselines on the NIST Chinese-English datasets.

3 0.19927721 314 acl-2013-Semantic Roles for String to Tree Machine Translation

Author: Marzieh Bazrafshan ; Daniel Gildea

Abstract: We experiment with adding semantic role information to a string-to-tree machine translation system based on the rule extraction procedure of Galley et al. (2004). We compare methods based on augmenting the set of nonterminals by adding semantic role labels, and altering the rule extraction process to produce a separate set of rules for each predicate that encompass its entire predicate-argument structure. Our results demonstrate that the second approach is effective in increasing the quality of translations.

4 0.1886946 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

5 0.17268559 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Author: Fabienne Braune ; Nina Seemann ; Daniel Quernheim ; Andreas Maletti

Abstract: We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution we make the developed software and complete tool-chain publicly available for further experimentation.

6 0.16690712 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation

7 0.16586111 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

8 0.16513124 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation

9 0.16383378 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

10 0.16348886 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

11 0.15376621 312 acl-2013-Semantic Parsing as Machine Translation

12 0.15127379 40 acl-2013-Advancements in Reordering Models for Statistical Machine Translation

13 0.1512191 38 acl-2013-Additive Neural Networks for Statistical Machine Translation

14 0.14785028 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation

15 0.14726138 101 acl-2013-Cut the noise: Mutually reinforcing reordering and alignments for improved machine translation

16 0.14188181 166 acl-2013-Generalized Reordering Rules for Improved SMT

17 0.13725023 330 acl-2013-Stem Translation with Affix-Based Rule Selection for Agglutinative Languages

18 0.13271621 255 acl-2013-Name-aware Machine Translation

19 0.12996063 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment

20 0.12686531 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.266), (1, -0.259), (2, 0.162), (3, 0.146), (4, -0.075), (5, 0.077), (6, 0.044), (7, -0.016), (8, 0.036), (9, 0.07), (10, -0.018), (11, 0.097), (12, 0.024), (13, 0.023), (14, 0.085), (15, 0.0), (16, 0.069), (17, 0.034), (18, 0.032), (19, 0.012), (20, -0.052), (21, 0.032), (22, -0.027), (23, -0.005), (24, -0.025), (25, -0.017), (26, 0.001), (27, -0.037), (28, -0.04), (29, 0.017), (30, 0.052), (31, -0.008), (32, -0.022), (33, -0.051), (34, 0.004), (35, 0.033), (36, -0.073), (37, 0.027), (38, 0.041), (39, -0.033), (40, -0.091), (41, 0.094), (42, 0.01), (43, 0.05), (44, 0.096), (45, 0.039), (46, -0.089), (47, -0.001), (48, -0.002), (49, 0.079)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95929432 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than phrase-based and hierarchical phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

2 0.9088921 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

Author: Fabienne Braune ; Nina Seemann ; Daniel Quernheim ; Andreas Maletti

Abstract: We present a new translation model integrating the shallow local multi bottom-up tree transducer. We perform a large-scale empirical evaluation of our obtained system, which demonstrates that we significantly beat a realistic tree-to-tree baseline on the WMT 2009 English → German translation task. As an additional contribution we make the developed software and complete tool-chain publicly available for further experimentation.

3 0.83354759 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

Author: Wenduan Xu ; Yue Zhang ; Philip Williams ; Philipp Koehn

Abstract: We present a context-sensitive chart pruning method for CKY-style MT decoding. Source phrases that are unlikely to have aligned target constituents are identified using sequence labellers learned from the parallel corpus, and speed-up is obtained by pruning corresponding chart cells. The proposed method is easy to implement, orthogonal to cube pruning and additive to its pruning power. On a full-scale English-to-German experiment with a string-to-tree model, we obtain a speed-up of more than 60% over a strong baseline, with no loss in BLEU.

4 0.8194347 312 acl-2013-Semantic Parsing as Machine Translation

Author: Jacob Andreas ; Andreas Vlachos ; Stephen Clark

Abstract: Semantic parsing is the problem of deriving a structured meaning representation from a natural language utterance. Here we approach it as a straightforward machine translation task, and demonstrate that standard machine translation components can be adapted into a semantic parser. In experiments on the multilingual GeoQuery corpus we find that our parser is competitive with the state of the art, and in some cases achieves higher accuracy than recently proposed purpose-built systems. These results support the use of machine translation methods as an informative baseline in semantic parsing evaluations, and suggest that research in semantic parsing could benefit from advances in machine translation.

5 0.78389996 330 acl-2013-Stem Translation with Affix-Based Rule Selection for Agglutinative Languages

Author: Zhiyang Wang ; Yajuan Lu ; Meng Sun ; Qun Liu

Abstract: Current translation models are mainly designed for languages with limited morphology, which are not readily applicable to agglutinative languages as the difference in the way lexical forms are generated. In this paper, we propose a novel approach for translating agglutinative languages by treating stems and affixes differently. We employ stem as the atomic translation unit to alleviate data sparseness. In addition, we associate each stem-granularity translation rule with a distribution of related affixes, and select desirable rules according to the similarity of their affix distributions with given spans to be translated. Experimental results show that our approach significantly improves the translation performance on tasks of translating from three Turkic languages to Chinese.

6 0.74535567 314 acl-2013-Semantic Roles for String to Tree Machine Translation

7 0.74363387 19 acl-2013-A Shift-Reduce Parsing Algorithm for Phrase-based String-to-Dependency Translation

8 0.74299127 77 acl-2013-Can Markov Models Over Minimal Translation Units Help Phrase-Based SMT?

9 0.74109381 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation

10 0.7356196 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

11 0.73255605 10 acl-2013-A Markov Model of Machine Translation using Non-parametric Bayesian Inference

12 0.72151619 328 acl-2013-Stacking for Statistical Machine Translation

13 0.71797752 16 acl-2013-A Novel Translation Framework Based on Rhetorical Structure Theory

14 0.71496385 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment

15 0.71181196 127 acl-2013-Docent: A Document-Level Decoder for Phrase-Based Statistical Machine Translation

16 0.69657737 255 acl-2013-Name-aware Machine Translation

17 0.69542897 363 acl-2013-Two-Neighbor Orientation Model with Cross-Boundary Global Contexts

18 0.66424733 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation

19 0.66052604 110 acl-2013-Deepfix: Statistical Post-editing of Statistical Machine Translation Using Deep Syntactic Analysis

20 0.65063298 180 acl-2013-Handling Ambiguities of Bilingual Predicate-Argument Structures for Statistical Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.058), (6, 0.044), (11, 0.057), (14, 0.022), (15, 0.02), (24, 0.02), (26, 0.073), (35, 0.052), (42, 0.065), (48, 0.044), (51, 0.01), (70, 0.078), (85, 0.171), (88, 0.018), (90, 0.098), (95, 0.088)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84882361 324 acl-2013-Smatch: an Evaluation Metric for Semantic Feature Structures

Author: Shu Cai ; Kevin Knight

Abstract: The evaluation of whole-sentence semantic structures plays an important role in semantic parsing and large-scale semantic structure annotation. However, there is no widely-used metric to evaluate wholesentence semantic structures. In this paper, we present smatch, a metric that calculates the degree of overlap between two semantic feature structures. We give an efficient algorithm to compute the metric and show the results of an inter-annotator agreement study.

2 0.83896309 133 acl-2013-Efficient Implementation of Beam-Search Incremental Parsers

Author: Yoav Goldberg ; Kai Zhao ; Liang Huang

Abstract: Beam search incremental parsers are accurate, but not as fast as they could be. We demonstrate that, contrary to popular belief, most current implementations of beam parsers in fact run in O(n2), rather than linear time, because each state transition is actually implemented as an O(n) operation. We present an improved implementation, based on Tree Structured Stack (TSS), in which a transition is performed in O(1), resulting in a real linear-time algorithm, which is verified empirically. We further improve parsing speed by sharing feature-extraction and dot-product across beam items. Practically, our methods combined offer a speedup of ∼2x over strong baselines on Penn Treebank sentences, and are orders of magnitude faster on much longer sentences.

3 0.81889522 130 acl-2013-Domain-Specific Coreference Resolution with Lexicalized Features

Author: Nathan Gilbert ; Ellen Riloff

Abstract: Most coreference resolvers rely heavily on string matching, syntactic properties, and semantic attributes of words, but they lack the ability to make decisions based on individual words. In this paper, we explore the benefits of lexicalized features in the setting of domain-specific coreference resolution. We show that adding lexicalized features to off-the-shelf coreference resolvers yields significant performance gains on four domain-specific data sets and with two types of coreference resolution architectures.

same-paper 4 0.81715059 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers

Author: Graham Neubig

Abstract: In this paper we describe Travatar, a forest-to-string machine translation (MT) engine based on tree transducers. It provides an open-source C++ implementation for the entire forest-to-string MT pipeline, including rule extraction, tuning, decoding, and evaluation. There are a number of options for model training, and tuning includes advanced options such as hypergraph MERT, and training of sparse features through online learning. The training pipeline is modeled after that of the popular Moses decoder, so users familiar with Moses should be able to get started quickly. We perform a validation experiment of the decoder on English-Japanese machine translation, and find that it is possible to achieve greater accuracy than phrase-based and hierarchical phrase-based translation. As auxiliary results, we also compare different syntactic parsers and alignment techniques that we tested in the process of developing the decoder. Travatar is available under the LGPL at http://phontron.com/travatar

5 0.77140075 6 acl-2013-A Java Framework for Multilingual Definition and Hypernym Extraction

Author: Stefano Faralli ; Roberto Navigli

Abstract: In this paper we present a demonstration of a multilingual generalization of Word-Class Lattices (WCLs), a supervised lattice-based model used to identify textual definitions and extract hypernyms from them. Lattices are learned from a dataset of automatically-annotated definitions from Wikipedia. We release a Java API for the programmatic use of multilingual WCLs in three languages (English, French and Italian), as well as a Web application for definition and hypernym extraction from user-provided sentences.

6 0.7316975 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT

7 0.71801281 320 acl-2013-Shallow Local Multi-Bottom-up Tree Transducers in Statistical Machine Translation

8 0.7164638 250 acl-2013-Models of Translation Competitions

9 0.71606761 139 acl-2013-Entity Linking for Tweets

10 0.70935076 137 acl-2013-Enlisting the Ghost: Modeling Empty Categories for Machine Translation

11 0.70536542 343 acl-2013-The Effect of Higher-Order Dependency Features in Discriminative Phrase-Structure Parsing

12 0.70280445 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing

13 0.70250034 314 acl-2013-Semantic Roles for String to Tree Machine Translation

14 0.70040333 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation

15 0.70016903 197 acl-2013-Incremental Topic-Based Translation Model Adaptation for Conversational Spoken Language Translation

16 0.69927943 200 acl-2013-Integrating Phrase-based Reordering Features into a Chart-based Decoder for Machine Translation

17 0.69859219 390 acl-2013-Word surprisal predicts N400 amplitude during reading

18 0.69794279 80 acl-2013-Chinese Parsing Exploiting Characters

19 0.69350159 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation

20 0.69332606 132 acl-2013-Easy-First POS Tagging and Dependency Parsing with Beam Search