acl acl2011 acl2011-171 knowledge-graph by maker-knowledge-mining

171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation


Source: pdf

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. Bottom-up and top-down parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporating syntax into phrase-based translation. We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrase-based machine translation through incremental syntactic parsing. [sent-6, score-0.976]

2 We give a formal definition of one such linear-time syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. [sent-10, score-0.535]

3 We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity. [sent-11, score-0.337]

4 Drawing on earlier successes in speech recognition, research in statistical machine translation has effectively used n-gram word sequence models as language models. [sent-21, score-0.386]

5 Modern phrase-based translation using large scale n-gram language models generally performs well in terms of lexical choice, but still often produces ungrammatical output. [sent-22, score-0.298]

6 Bottom-up and top-down parsers typically require a completed string as input; this requirement makes it difficult to incorporate these parsers into phrase-based translation, which generates hypothesized translations incrementally, from left-to-right. [sent-24, score-0.224]

7 As a workaround, parsers can rerank the translated output of translation systems (Och et al. [sent-25, score-0.364]

8 On the other hand, incremental parsers (Roark, 2001 ; Henderson, 2004; Schuler et al. [sent-27, score-0.391]

9 We observe that incremental parsers, used as structured language models, provide an appropriate algorithmic match to incremental phrase-based decoding. [sent-29, score-0.65]

10 We directly integrate incremental syntactic parsing into phrase-based translation. [sent-30, score-0.551]

11 Neither phrase-based translation (Koehn et al., 2003) nor hierarchical phrase-based translation (Chiang, 2005) takes explicit advantage of the syntactic structure of either source or target language. [sent-36, score-0.549]

12 The translation models in these techniques define phrases as contiguous word sequences (with gaps allowed in the case of hierarchical phrases) which may or may not correspond to any linguistic constituent. [sent-37, score-0.337]

13 Early work in statistical phrase-based translation considered whether restricting translation models to use only syntactically well-formed constituents might improve translation quality (Koehn et al. [sent-38, score-0.982]

14 , 2003) but found such restrictions failed to improve translation quality. [sent-39, score-0.298]

15 Significant research has examined the extent to which syntax can be usefully incorporated into statistical tree-based translation models: string-to-tree (Yamada and Knight, 2001 ; Gildea, 2003; Imamura et al. [sent-40, score-0.339]

16 , 2005) techniques use syntactic information to inform the translation model. [sent-55, score-0.451]

17 Recent work has shown that parsing-based machine translation using syntax-augmented (Zollmann and Venugopal, 2006) hierarchical translation grammars with rich nonterminal sets can demonstrate substantial gains over hierarchical grammars for certain language pairs (Baker et al. [sent-56, score-0.805]

18 In contrast to the above tree-based translation models, our approach maintains a standard (non-syntactic) phrase-based translation model. [sent-58, score-0.596]

19 Traditional approaches to language models in speech recognition and statistical machine translation focus on the use of n-grams, which provide a simple finite-state model approximation of the target language. [sent-60, score-0.492]

20 Syntactic language models have also been explored with tree-based translation models. [sent-67, score-0.298]

21 (2003) use syntactic language models to rescore the output of a tree-based translation system. [sent-69, score-0.451]

22 Post and Gildea (2009) use tree substitution grammar parsing for language modeling, but do not use this language model in a translation system. [sent-71, score-0.473]

23 Our work, in contrast to the above approaches, explores the use of incremental syntactic language models in conjunction with phrase-based translation models. [sent-72, score-0.776]

24 Our syntactic language model fits into the family of linear-time dynamic programming parsers described in (Huang and Sagae, 2010). [sent-73, score-0.266]

25 Like (Galley and Manning, 2009) our work implements an incremental syntactic language model; our approach differs by calculating syntactic LM scores over all available phrase-structure parses at each hypothesis instead of the 1-best dependency parse. [sent-74, score-0.737]

26 The syntactic cohesion features of Cherry (2008) encourage the use of syntactically well-formed translation phrases. [sent-76, score-0.451]

27 These approaches are fully orthogonal to our proposed incremental syntactic language model, and could be applied in concert with our work. [sent-77, score-0.478]

28 Figure 1: Partial decoding lattice for standard phrase-based decoding stack algorithm translating the German sentence Der Präsident trifft am Freitag den Vorstand. [sent-87, score-0.507]

29 Each node h in decoding stack t represents the application of a translation option, and includes the source sentence coverage vector, target language ngram state, and syntactic language model state ˜τ th . [sent-88, score-0.897]

30 We use the English translation The president meets the board on Friday as a running example throughout all Figures. [sent-90, score-0.495]

31 Typically, tree τ̂ is taken to be: τ̂ = argmax_τ P(τ | e) (1) We define a syntactic language model based on the total probability mass over all possible trees for string e. [sent-93, score-0.299]

32 1 Incremental syntactic language model. An incremental parser processes each token of input sequentially from the beginning of a sentence to the end, rather than processing input in a top-down (Earley, 1968) or bottom-up (Cocke and Schwartz, 1970; Kasami, 1965; Younger, 1967) fashion. [sent-96, score-0.603]

33 After processing the t-th token in string e, an incremental parser has some internal representation of possible hypothesized (incomplete) trees, τt. [sent-97, score-0.467]

34 The syntactic language model probability of a partial sentence e1. [sent-98, score-0.256]

35 An incremental syntactic language model can then be defined by a probability mass function (Equation 5) and a transition function δ (Equation 6). [sent-108, score-0.58]
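
Sentence 35 names the two pieces any such model needs: a probability mass function over the parser's internal state (Equation 5) and a transition function δ that folds in one more target word (Equation 6). The sketch below is purely illustrative; the class names are invented here, and a unigram table stands in for a real parser model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ParserState:
    # Stand-in for the parser's internal representation of a sentence prefix.
    words: tuple = ()
    mass: float = 1.0

class ToyIncrementalLM:
    """Toy incremental LM: `prob` plays the role of Eq. 5, `advance` of delta (Eq. 6)."""

    def __init__(self, word_probs):
        # A real model scores partial trees; a unigram table stands in here.
        self.word_probs = word_probs

    def initial_state(self):
        # Parser state before any input words are processed (tau_0).
        return ParserState()

    def prob(self, state):
        # Probability of the partial sentence summarized by this state.
        return state.mass

    def advance(self, state, word):
        # delta(state, word): read one more target word and return the new state.
        p = self.word_probs.get(word, 1e-6)
        return ParserState(state.words + (word,), state.mass * p)

lm = ToyIncrementalLM({"the": 0.1, "president": 0.01, "meets": 0.02})
s = lm.initial_state()
for w in ("the", "president", "meets"):
    s = lm.advance(s, w)
print(lm.prob(s))  # score of the partial translation "the president meets"
```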

36 ê = argmax_e exp(Σ_j λ_j h_j(e, f)) (7) Phrase-based translation constructs a set of translation options (hypothesized translations for contiguous portions of the source sentence) from a trained phrase table, then incrementally constructs a lattice of partial target translations (Koehn, 2010). [sent-122, score-1.027]
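
Equation 7 is the standard log-linear decoding objective: the decoder seeks the translation e that maximizes exp(Σ_j λ_j h_j(e, f)). A minimal worked version follows; the feature names, values, and weights are invented for illustration, and a real decoder searches a lattice rather than an enumerated list of candidates.

```python
import math

def loglinear_score(feature_values, weights):
    # exp(sum_j lambda_j * h_j(e, f)), as in Equation 7.
    return math.exp(sum(weights[j] * h for j, h in feature_values.items()))

weights = {"tm": 1.0, "ngram_lm": 0.5, "syntactic_lm": 0.5}  # lambda_j
candidates = {
    "the president meets the board on friday": {"tm": -2.1, "ngram_lm": -3.0, "syntactic_lm": -2.4},
    "the president meets on friday the board": {"tm": -2.1, "ngram_lm": -3.6, "syntactic_lm": -3.1},
}
best = max(candidates, key=lambda e: loglinear_score(candidates[e], weights))
print(best)
```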

37 To prune the search space, lattice nodes are organized into beam stacks (Jelinek, 1969) according to the number of source words translated. [sent-123, score-0.22]

38 An n-gram language model history is also maintained at each node in the translation lattice. [sent-124, score-0.388]

39 The search space is further trimmed with hypothesis recombination, which collapses lattice nodes that share a common coverage vector and n-gram state. [sent-125, score-0.274]
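
Sentence 39 describes recombination as collapsing lattice nodes that agree on the coverage vector and n-gram state. A small sketch of that grouping follows; the dictionary layout is hypothetical, and once a syntactic LM state is attached to each node one would expect it to become part of the same key.

```python
def recombine(hypotheses):
    # Keep only the best-scoring hypothesis for each (coverage, n-gram state) key.
    best = {}
    for hyp in hypotheses:
        key = (hyp["coverage"], hyp["ngram_state"])
        if key not in best or hyp["score"] > best[key]["score"]:
            best[key] = hyp
    return list(best.values())

hyps = [
    {"coverage": (1, 1, 0, 0, 0, 0), "ngram_state": ("president", "meets"), "score": -4.2},
    {"coverage": (1, 1, 0, 0, 0, 0), "ngram_state": ("president", "meets"), "score": -4.9},
    {"coverage": (1, 1, 1, 0, 0, 0), "ngram_state": ("meets", "the"), "score": -5.1},
]
print(len(recombine(hyps)))  # 2: the first two hypotheses share a key and collapse
```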

40 3 Incorporating a Syntactic Language Model. Phrase-based translation produces target language words in an incremental left-to-right fashion, generating words at the beginning of a translation first and words at the end of a translation last. [sent-127, score-1.278]

41 Similarly, incremental parsers process sentences in an incremental fashion, analyzing words at the beginning of a sentence first and words at the end of a sentence last. [sent-128, score-0.716]

42 As such, an incremental parser with transition function δ can be incorporated into the phrase-based decoding process in a straightforward manner. [sent-129, score-0.579]

43 Each node in the translation lattice is augmented with a syntactic language model state τ˜ t. [sent-130, score-0.788]

44 The hypothesis at the root of the translation lattice is initialized with ˜τ 0, representing the internal state of the incremental parser before any input words are processed. [sent-131, score-1.054]

45 The phrase-based translation decoding process adds nodes to the lattice; each new node contains one or more target language words. [sent-132, score-0.521]

46 Given a new target language word et and ˜τ t−1, the incremental parser’s transition function calculates ˜τ t. [sent-134, score-0.439]

47 Figure 1 illustrates δ ... Figure 2: Sample binarized phrase structure tree for the running example The president meets the board on Friday. [sent-135, score-0.237]

48 Figure 3: Sample binarized phrase structure tree after application of right-corner transform. [sent-136, score-0.252]

49 a sample phrase-based decoding lattice where each translation lattice node is augmented with syntactic language model state ˜τ t. [sent-137, score-1.077]

50 In phrase-based translation, many translation lattice nodes represent multi-word target language phrases. [sent-138, score-0.525]

51 For such translation lattice nodes, δ will be called once for each newly hypothesized target language word in the node. [sent-139, score-0.589]

52 Only the final syntactic language model state in such sequences need be stored in the translation lattice node. [sent-140, score-0.745]
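
Sentences 50–52 spell out how a multi-word translation option is handled: δ is applied once per newly hypothesized target word, and only the final syntactic language model state is stored on the lattice node. A minimal sketch, with a stand-in transition function in place of the HHMM parser defined in the next section:

```python
def extend_hypothesis(advance, prev_state, phrase_words):
    # Apply the transition function delta once per new target word; only the
    # final state needs to be kept on the new lattice node.
    state = prev_state
    for word in phrase_words:
        state = advance(state, word)
    return state

# Stand-in transition: the "parser state" is just the word history.
advance = lambda state, word: state + (word,)

tau_0 = ()                                                   # state at the lattice root
tau_3 = extend_hypothesis(advance, tau_0, ("the", "president", "meets"))
tau_5 = extend_hypothesis(advance, tau_3, ("the", "board"))  # multi-word option
print(tau_5)
```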

53 4 Incremental Bounded-Memory Parsing with a Time Series Model. Having defined the framework by which any incremental parser may be incorporated into phrase-based translation, we now formally define a specific incremental parser for use in our experiments. [sent-141, score-0.481]

54 The parser must process target language words incrementally as the phrase-based decoder adds hypotheses to the translation lattice. [sent-142, score-0.435]

55 To facilitate this incremental processing, ordinary phrase-structure trees can be transformed into right-corner recursive trees. Figure 4: Graphical representation of the dependency structure in a standard Hierarchic Hidden Markov Model with D = 3 hidden levels that can be used to parse syntax. [sent-143, score-0.369]

56 This model of incremental parsing is implemented as a Hierarchical Hidden Markov Model (HHMM) (Murphy and Paskin, 2001), and is equivalent to a probabilistic pushdown automaton with a bounded pushdown store. [sent-153, score-0.703]

57 (Eq. 5) for a partial target language hypothesis, using a bounded store of incomplete constituents c_η/c_ηι. [sent-158, score-0.472]
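
The bounded store referred to in sentences 56–57 holds at most D incomplete constituents, each an active category still awaiting a category to its right (the slash categories such as S/NP in Figure 3). One possible in-memory representation, with names chosen here purely for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class IncompleteConstituent:
    # An active/awaited pair such as S/NP: an S that still awaits an NP.
    active: str
    awaited: str

    def __str__(self):
        return f"{self.active}/{self.awaited}"

D = 3  # fixed store depth, matching the D = 3 hidden levels of Figure 4

EMPTY = IncompleteConstituent("-", "-")
store = (IncompleteConstituent("S", "NP"), EMPTY, EMPTY)  # one store state s_t
assert len(store) == D
print([str(c) for c in store])
```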

58 4.1 Formal Parsing Model: Scoring Partial Translation Hypotheses. This model is essentially an extension of an HHMM, which obtains a most likely sequence of hidden store states, ŝ_{1..T}^{1..D}, [sent-161, score-0.225]

59 using HHMM state transition model θ_A and observation symbol model θ_B (Rabiner, 1990): ŝ_{1..T}^{1..D} = argmax_{s_{1..T}^{1..D}} ∏_{t=1}^{T} P_θA(s_t^{1..D} | s_{t−1}^{1..D}) · P_θB(e_t | s_t^D) (8) [sent-169, score-0.228]

60 The HHMM parser is equivalent to a probabilistic pushdown automaton with a bounded pushdown store. [sent-179, score-0.256]

61 The model generates each successive store (using store model θS) only after considering whether each nested sequence of incomplete constituents has completed and reduced (using reduction model θR): P_θA(s_t^{1..D} | s_{t−1}^{1..D}) = ... [sent-180, score-0.723]
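
Sentence 61 (with Equation 9) describes the HHMM transition as a reduce-then-generate sweep over the store: reduction variables are decided from the deepest level up, since each r_t^d is conditioned on r_t^{d+1} (cf. Equation 13), and the new store elements are then generated from the shallowest level down, since each s_t^d is conditioned on s_t^{d−1}. The control flow below sketches that sweep only; the two callbacks are toy stand-ins, not the trained θR and θS distributions.

```python
D = 3  # store depth

def hhmm_step(prev_store, theta_R, theta_S):
    # Reduce phase: deepest depth first, since r_t^d is conditioned on r_t^{d+1}.
    r = [None] * (D + 2)
    r[D + 1] = 1                      # edge constant below the store
    for d in range(D, 0, -1):
        r[d] = theta_R(d, r[d + 1], prev_store[d - 1])
    # Generate phase: shallowest depth first, since s_t^d is conditioned on s_t^{d-1}.
    s = ["TOP"] + [None] * D          # s_t^0 is an edge constant
    for d in range(1, D + 1):
        s[d] = theta_S(d, r[d], r[d + 1], prev_store[d - 1], s[d - 1])
    # The observation model theta_B would then score the next word given the store.
    return tuple(s[1:])

# Toy stand-ins: nothing reduces, and every store element simply carries over.
theta_R = lambda d, r_below, prev_elem: 0
theta_S = lambda d, r, r_below, prev_elem, elem_above: prev_elem
print(hhmm_step(("S/NP", "-", "-"), theta_R, theta_S))
```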

62 The shaded path through the parse lattice illustrates the recognized right-corner tree structure of Figure 3. [sent-186, score-0.263]

63 P_θR(r_t^d | r_t^{d+1} s_{t−1}^d s_{t−1}^{d−1}) def= ⟦r_t^d = r⊥⟧ if r_t^{d+1} = 0, or P_θR,d(r_t^d | r_t^{d+1} s_{t−1}^d s_{t−1}^{d−1}) if r_t^{d+1} = 1 (13), where r⊥ is a null state resulting from the failure of an incomplete constituent to complete, and constants are defined for the edge conditions of s_t^0 and r_t^{D+1}. Figure 5 illustrates this model in action. [sent-188, score-0.326]

64 These pushdown automaton operations are then refined for right-corner parsing (Schuler, 2009), distinguishing active transitions (model θS-T-A,d, in which an incomplete constituent is completed, but not reduced, and then immediately expanded to a [sent-189, score-0.36]

65 new incomplete constituent in the same store element) from awaited transitions (model θS-T-W,d, which involve no completion): P_θS-T,d(s_t^d | r_t^{d+1} r_t^d s_{t−1}^d s_t^{d−1}) def= ... [sent-191, score-0.429]

66 Figure 6: A hypothesis in the phrase-based decoding lattice from Figure 1 is expanded using translation option the board of source phrase den Vorstand. [sent-221, score-0.769]

67 Syntactic language model state ˜τ 31 contains random variables s13. [sent-222, score-0.233]

68 Figure 1 illustrates an excerpt from a standard phrase-based translation lattice. [sent-264, score-0.298]

69 Within each decoder stack t, each hypothesis h is augmented with a syntactic language model state ˜τ th . [sent-265, score-0.482]

70 Each syntactic language model state is a random variable store, containing a slice of random variables from the HHMM. [sent-266, score-0.433]

71 By maintaining these syntactic random variable stores, each hypothesis has access to the current language model probability for the partial translation ending at that hypothesis, as calculated by an incremental syntactic language model defined by the HHMM. [sent-270, score-1.232]

72 Specifically, the random variable store at hypothesis h provides P(τ̃_t^h) = P(e_1^h..e_t^h), where e_1^h..e_t^h [sent-271, score-0.331]

73 is the sequence of words in a partial hypothesis ending at h which contains t target words, and where there are D syntactic random variables in each random variable store (Eq. [sent-279, score-0.706]

74 In the simplest case, a new hypothesis extends an existing hypothesis by exactly one target word. [sent-283, score-0.271]

75 As the new hypothesis is constructed by extending an existing stack element, the store and reduction state random variables are processed, along with the newly hypothesized word. [sent-284, score-0.679]

76 This results in a new store of syntactic random variables (Eq. [sent-285, score-0.438]

77 When a new hypothesis extends an existing hypothesis by more than one word, this process is first carried out for the first new word in the hypothe- sis. [sent-287, score-0.212]

78 Once the final word in the hypothesis has been processed, the resulting random variable store is associated with that hypothesis. [sent-289, score-0.331]

79 Figure 6 illustrates this process, showing how a syntactic language model state ˜τ 51 in a phrase-based decoding lattice is obtained from a previous syntactic language model state ˜τ 31 (from Figure 1) by parsing the target language words from a phrase-based translation option. [sent-291, score-1.317]

80 The HHMM outperforms the n-gram model in terms of out-of-domain test set perplexity when trained on the same WSJ data; the best perplexity results for in-domain and out-of-domain test sets are found by interpolating the HHMM and n-gram models. (Footnote 4: In-domain is WSJ Section 23.) [sent-308, score-0.201]

81 Figure 8: Mean per-sentence decoding time (in seconds) for dev set using Moses with and without syntactic language model. [sent-312, score-0.274]

82 HHMM parser beam sizes are indicated for the syntactic LM. [sent-313, score-0.283]

83 We trained a phrase-based translation model using the full NIST Open MT08 Urdu-English translation model training data. [sent-317, score-0.69]

84 During tuning, Moses was first configured to use just the n-gram LM, then configured to use both the n-gram LM and the syntactic HHMM LM. [sent-319, score-0.227]

85 In our integration with Moses, incorporating a syntactic language model dramatically slows the decoding process. [sent-321, score-0.321]

86 7 Discussion. This paper argues that incremental syntactic language models are a straightforward and appropriate [Table: BLEU for Moses with the n-gram LM only vs. with HHMM + n-gram LMs] [sent-326, score-0.478]

87 algorithmic fit for incorporating syntax into phrase-based statistical machine translation, since both process sentences in an incremental left-to-right fashion. [sent-329, score-0.413]

88 This means incremental syntactic LM scores can be calculated during the decoding process, rather than waiting until a complete sentence is posited, which is typically necessary in top-down or bottom-up parsing. [sent-330, score-0.599]

89 We provided a rigorous formal definition of incremental syntactic language models, and detailed what steps are necessary to incorporate such LMs into phrase-based decoding. [sent-331, score-0.515]

90 We integrated an incremental syntactic language model into Moses. [sent-332, score-0.525]

91 The translation quality significantly improved on a constrained task, and the perplexity improvements suggest that interpolating between n-gram and syntactic LMs may hold promise on larger data sets. [sent-333, score-0.567]
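
Sentence 91 (and sentence 80) point to interpolating the n-gram and syntactic LMs. A tiny illustration of linear interpolation and the resulting perplexity is given below; the probabilities and the interpolation weight are made-up numbers, and the weight would normally be tuned on held-out data.

```python
import math

def interpolate(p_ngram, p_syntactic, lam=0.5):
    # Linear interpolation of the two LM probabilities for one word.
    return lam * p_ngram + (1.0 - lam) * p_syntactic

word_probs = [interpolate(0.020, 0.050),
              interpolate(0.100, 0.040),
              interpolate(0.010, 0.030)]
ppl = math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))
print(round(ppl, 1))  # perplexity of this three-word "sentence" under the mixture
```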

92 The use of very large n-gram language models is typically a key ingredient in the best-performing machine translation systems (Brants et al. [sent-334, score-0.345]

93 Our future work seeks to incorporate largescale n-gram language models in conjunction with incremental syntactic language models. [sent-337, score-0.478]

94 The added decoding time cost of our syntactic language model is very high. [sent-338, score-0.321]

95 By increasing the beam size and distortion limit of the baseline system, future work may examine whether a baseline system with comparable runtimes can achieve comparable translation quality. [sent-339, score-0.35]

96 A more efficient implementation of the HHMM parser would speed decoding and make more extensive and conclusive translation experiments possible. [sent-340, score-0.497]

97 Scalable inference and training of context-rich syntactic translation models. [sent-444, score-0.451]

98 Example-based machine translation based on syntactic transfer with statistical models. [sent-480, score-0.539]

99 Positive results for parsing with a bounded stack using a model-based right-corner transform. [sent-582, score-0.221]

100 A new string-to-dependency machine translation algorithm with a target dependency language model. [sent-586, score-0.404]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hhmm', 0.385), ('incremental', 0.325), ('translation', 0.298), ('store', 0.178), ('lattice', 0.168), ('ef', 0.157), ('syntactic', 0.153), ('rtd', 0.136), ('decoding', 0.121), ('schuler', 0.115), ('rdt', 0.113), ('hypothesis', 0.106), ('stack', 0.097), ('lms', 0.093), ('awaited', 0.091), ('lm', 0.081), ('moses', 0.081), ('incomplete', 0.081), ('pushdown', 0.08), ('std', 0.08), ('constituent', 0.079), ('state', 0.079), ('parser', 0.078), ('perplexity', 0.077), ('dt', 0.076), ('meets', 0.076), ('board', 0.076), ('parsing', 0.073), ('frtd', 0.068), ('huang', 0.068), ('parsers', 0.066), ('hypothesized', 0.064), ('deneefe', 0.064), ('hc', 0.06), ('variables', 0.06), ('ppl', 0.06), ('friday', 0.06), ('jc', 0.06), ('knight', 0.06), ('target', 0.059), ('chelba', 0.058), ('kevin', 0.057), ('galley', 0.057), ('partial', 0.056), ('meeting', 0.056), ('murphy', 0.055), ('cocke', 0.055), ('adjoining', 0.055), ('tree', 0.055), ('transition', 0.055), ('wsj', 0.054), ('jelinek', 0.053), ('beam', 0.052), ('frederick', 0.052), ('kenji', 0.052), ('bounded', 0.051), ('completed', 0.05), ('reduction', 0.048), ('shieber', 0.048), ('automaton', 0.047), ('random', 0.047), ('constituents', 0.047), ('machine', 0.047), ('model', 0.047), ('association', 0.047), ('annual', 0.046), ('hierarchic', 0.045), ('paskin', 0.045), ('slowdown', 0.045), ('tdtd', 0.045), ('lane', 0.045), ('mi', 0.045), ('president', 0.045), ('trees', 0.044), ('synchronous', 0.044), ('node', 0.043), ('grammars', 0.042), ('translations', 0.042), ('qun', 0.041), ('liu', 0.041), ('statistical', 0.041), ('devtest', 0.04), ('jr', 0.04), ('ades', 0.04), ('illustrates', 0.04), ('hierarchical', 0.039), ('haitao', 0.039), ('post', 0.039), ('constrained', 0.039), ('och', 0.038), ('baker', 0.037), ('daniel', 0.037), ('configured', 0.037), ('nesson', 0.037), ('formal', 0.037), ('air', 0.036), ('gildea', 0.036), ('cc', 0.036), ('koehn', 0.036), ('liang', 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000007 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

2 0.23530769 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

Author: Markos Mylonakis ; Khalil Sima'an

Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.

3 0.19120491 30 acl-2011-Adjoining Tree-to-String Translation

Author: Yang Liu ; Qun Liu ; Yajuan Lu

Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.

4 0.18965062 61 acl-2011-Binarized Forest to String Translation

Author: Hao Zhang ; Licheng Fang ; Peng Xu ; Xiaoyun Wu

Abstract: Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forestto-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first best parse through binarization. Provably, our binarization forest can cover any non-consitituent phrases in a sentence but maintains the desirable property that for each span there is at most one nonterminal so that the grammar constant for decoding is relatively small. For the purpose of reducing search errors, we apply the synchronous binarization technique to forest-tostring decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.

5 0.18920942 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation

Author: Ashish Vaswani ; Haitao Mi ; Liang Huang ; David Chiang

Abstract: Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use minimal rules (those that cannot be formed out of other rules), and instead rely on a rule Markov model of the derivation history to capture dependencies between minimal rules. Large-scale experiments on a state-of-the-art tree-to-string translation system show that our approach leads to a slimmer model, a faster decoder, yet the same translation quality (measured using B ) as composed rules.

6 0.17253357 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

7 0.17210566 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

8 0.16713713 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

9 0.16060266 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

10 0.16057667 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

11 0.15214317 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

12 0.15034592 184 acl-2011-Joint Hebrew Segmentation and Parsing using a PCFGLA Lattice Parser

13 0.14762053 313 acl-2011-Two Easy Improvements to Lexical Weighting

14 0.14656278 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation

15 0.13657475 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

16 0.13615757 217 acl-2011-Machine Translation System Combination by Confusion Forest

17 0.13560978 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation

18 0.1315054 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

19 0.13036165 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

20 0.12684309 282 acl-2011-Shift-Reduce CCG Parsing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.317), (1, -0.265), (2, 0.111), (3, -0.064), (4, 0.045), (5, 0.011), (6, -0.113), (7, -0.001), (8, 0.006), (9, 0.016), (10, 0.015), (11, -0.037), (12, 0.011), (13, -0.127), (14, 0.041), (15, -0.001), (16, -0.052), (17, -0.004), (18, 0.012), (19, 0.005), (20, 0.059), (21, 0.04), (22, 0.072), (23, 0.011), (24, 0.023), (25, -0.056), (26, 0.05), (27, -0.004), (28, -0.038), (29, 0.024), (30, -0.022), (31, -0.074), (32, -0.001), (33, -0.007), (34, -0.009), (35, -0.033), (36, 0.005), (37, -0.063), (38, 0.028), (39, -0.043), (40, -0.012), (41, -0.042), (42, 0.064), (43, -0.047), (44, -0.073), (45, 0.012), (46, -0.081), (47, -0.09), (48, -0.027), (49, 0.024)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96881676 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

2 0.86998254 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

Author: Daniel Emilio Beck

Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. Ipropose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that makes them well-suited for syntax modeling. Ialso present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.

3 0.79329556 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

Author: Markos Mylonakis ; Khalil Sima'an

Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.

4 0.79202598 217 acl-2011-Machine Translation System Combination by Confusion Forest

Author: Taro Watanabe ; Eiichiro Sumita

Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.

5 0.77595711 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations

Author: Bing Zhao ; Young-Suk Lee ; Xiaoqiang Luo ; Liu Li

Abstract: We propose a novel technique of learning how to transform the source parse trees to improve the translation qualities of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word reordering. In particular, we integrate synchronous binarizations, verb regrouping, removal of redundant parse nodes, and incorporate a few important features such as translation boundaries. We learn the structural preferences from the data in a generative framework. The syntax-based translation system integrating the proposed techniques outperforms the best Arabic-English unconstrained system in NIST08 evaluations by 1.3 absolute BLEU, which is statistically significant.

6 0.75883442 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

7 0.75709182 61 acl-2011-Binarized Forest to String Translation

8 0.74053264 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation

9 0.73691255 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation

10 0.73610628 17 acl-2011-A Large Scale Distributed Syntactic, Semantic and Lexical Language Model for Machine Translation

11 0.72730935 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

12 0.7212562 116 acl-2011-Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers

13 0.71813041 313 acl-2011-Two Easy Improvements to Lexical Weighting

14 0.71639615 69 acl-2011-Clause Restructuring For SMT Not Absolutely Helpful

15 0.68469143 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering

16 0.67445469 220 acl-2011-Minimum Bayes-risk System Combination

17 0.6740613 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

18 0.67320645 30 acl-2011-Adjoining Tree-to-String Translation

19 0.65965074 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

20 0.65670723 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(5, 0.027), (17, 0.103), (26, 0.016), (37, 0.087), (39, 0.074), (41, 0.061), (43, 0.146), (53, 0.014), (55, 0.042), (59, 0.056), (72, 0.042), (91, 0.054), (96, 0.19), (98, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87070072 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu

Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.

2 0.85077083 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment

Author: Yashar Mehdad ; Matteo Negri ; Marcello Federico

Abstract: This paper explores the use of bilingual parallel corpora as a source of lexical knowledge for cross-lingual textual entailment. We claim that, in spite of the inherent difficulties of the task, phrase tables extracted from parallel data allow to capture both lexical relations between single words, and contextual information useful for inference. We experiment with a phrasal matching method in order to: i) build a system portable across languages, and ii) evaluate the contribution of lexical knowledge in isolation, without interaction with other inference mechanisms. Results achieved on an English-Spanish corpus obtained from the RTE3 dataset support our claim, with an overall accuracy above average scores reported by RTE participants on monolingual data. Finally, we show that using parallel corpora to extract paraphrase tables reveals their potential also in the monolingual setting, improving the results achieved with other sources of lexical knowledge.

3 0.84792686 190 acl-2011-Knowledge-Based Weak Supervision for Information Extraction of Overlapping Relations

Author: Raphael Hoffmann ; Congle Zhang ; Xiao Ling ; Luke Zettlemoyer ; Daniel S. Weld

Abstract: Information extraction (IE) holds the promise of generating a large-scale knowledge base from the Web’s natural language text. Knowledge-based weak supervision, using structured data to heuristically label a training corpus, works towards this goal by enabling the automated learning of a potentially unbounded number of relation extractors. Recently, researchers have developed multiinstance learning algorithms to combat the noisy training data that can come from heuristic labeling, but their models assume relations are disjoint — for example they cannot extract the pair Founded ( Jobs Apple ) and CEO-o f ( Jobs Apple ) . , , This paper presents a novel approach for multi-instance learning with overlapping relations that combines a sentence-level extrac- , tion model with a simple, corpus-level component for aggregating the individual facts. We apply our model to learn extractors for NY Times text using weak supervision from Freebase. Experiments show that the approach runs quickly and yields surprising gains in accuracy, at both the aggregate and sentence level.

4 0.8471843 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation

Author: Zhongguo Li

Abstract: Lots of Chinese characters are very productive in that they can form many structured words either as prefixes or as suffixes. Previous research in Chinese word segmentation mainly focused on identifying only the word boundaries without considering the rich internal structures of many words. In this paper we argue that this is unsatisfying in many ways, both practically and theoretically. Instead, we propose that word structures should be recovered in morphological analysis. An elegant approach for doing this is given and the result is shown to be promising enough for encouraging further effort in this direction. Our probability model is trained with the Penn Chinese Treebank and actually is able to parse both word and phrase structures in a unified way. 1 Why Parse Word Structures? Research in Chinese word segmentation has progressed tremendously in recent years, with state of the art performing at around 97% in precision and recall (Xue, 2003; Gao et al., 2005; Zhang and Clark, 2007; Li and Sun, 2009). However, virtually all these systems focus exclusively on recognizing the word boundaries, giving no consideration to the internal structures of many words. Though it has been the standard practice for many years, we argue that this paradigm is inadequate both in theory and in practice, for at least the following four reasons. The first reason is that if we confine our definition of word segmentation to the identification of word boundaries, then people tend to have divergent 1405 opinions as to whether a linguistic unit is a word or not (Sproat et al., 1996). This has led to many different annotation standards for Chinese word segmentation. Even worse, this could cause inconsistency in the same corpus. For instance, 䉂 擌 奒 ‘vice president’ is considered to be one word in the Penn Chinese Treebank (Xue et al., 2005), but is split into two words by the Peking University corpus in the SIGHAN Bakeoffs (Sproat and Emerson, 2003). Meanwhile, 䉂 䀓 惼 ‘vice director’ and 䉂 䚲䡮 ‘deputy are both segmented into two words in the same Penn Chinese Treebank. In fact, all these words are composed of the prefix 䉂 ‘vice’ and a root word. Thus the structure of 䉂擌奒 ‘vice president’ can be represented with the tree in Figure 1. Without a doubt, there is complete agree- manager’ NN ,,ll JJf NNf 䉂 擌奒 Figure 1: Example of a word with internal structure. ment on the correctness of this structure among native Chinese speakers. So if instead of annotating only word boundaries, we annotate the structures of every word, then the annotation tends to be more 1 1Here it is necessary to add a note on terminology used in this paper. Since there is no universally accepted definition of the “word” concept in linguistics and especially in Chinese, whenever we use the term “word” we might mean a linguistic unit such as 䉂 擌奒 ‘vice president’ whose structure is shown as the tree in Figure 1, or we might mean a smaller unit such as 擌奒 ‘president’ which is a substructure of that tree. Hopefully, ProceedingPso orftla thned 4,9 Otrhe Agonnn,u Jauln Mee 1e9t-i2ng4, o 2f0 t1h1e. A ?c s 2o0ci1a1ti Aonss foocria Ctioomnp fourta Ctioomnaplu Ltaintigouniaslti Lcisn,g puaigsetsic 1s405–1414, consistent and there could be less duplication of efforts in developing the expensive annotated corpus. The second reason is applications have different requirements for granularity of words. Take the personal name 撱 嗤吼 ‘Zhou Shuren’ as an example. 
It’s considered to be one word in the Penn Chinese Treebank, but is segmented into a surname and a given name in the Peking University corpus. For some applications such as information extraction, the former segmentation is adequate, while for others like machine translation, the later finer-grained output is more preferable. If the analyzer can produce a structure as shown in Figure 4(a), then every application can extract what it needs from this tree. A solution with tree output like this is more elegant than approaches which try to meet the needs of different applications in post-processing (Gao et al., 2004). The third reason is that traditional word segmentation has problems in handling many phenomena in Chinese. For example, the telescopic compound 㦌 撥 怂惆 ‘universities, middle schools and primary schools’ is in fact composed ofthree coordinating elements 㦌惆 ‘university’, 撥 惆 ‘middle school’ and 怂惆 ‘primary school’ . Regarding it as one flat word loses this important information. Another example is separable words like 扩 扙 ‘swim’ . With a linear segmentation, the meaning of ‘swimming’ as in 扩 堑 扙 ‘after swimming’ cannot be properly represented, since 扩扙 ‘swim’ will be segmented into discontinuous units. These language usages lie at the boundary between syntax and morphology, and are not uncommon in Chinese. They can be adequately represented with trees (Figure 2). (a) NN (b) ???HHH JJ NNf ???HHH JJf JJf JJf 㦌 撥 怂 惆 VV ???HHH VV NNf ZZ VVf VVf 扩 扙 堑 Figure 2: Example of telescopic compound (a) and separable word (b). The last reason why we should care about word the context will always make it clear what is being referred to with the term “word”. 1406 structures is related to head driven statistical parsers (Collins, 2003). To illustrate this, note that in the Penn Chinese Treebank, the word 戽 䊂䠽 吼 ‘English People’ does not occur at all. Hence constituents headed by such words could cause some difficulty for head driven models in which out-ofvocabulary words need to be treated specially both when they are generated and when they are conditioned upon. But this word is in turn headed by its suffix 吼 ‘people’, and there are 2,233 such words in Penn Chinese Treebank. If we annotate the structure of every compound containing this suffix (e.g. Figure 3), such data sparsity simply goes away.

5 0.84463418 324 acl-2011-Unsupervised Semantic Role Induction via Split-Merge Clustering

Author: Joel Lang ; Mirella Lapata

Abstract: In this paper we describe an unsupervised method for semantic role induction which holds promise for relieving the data acquisition bottleneck associated with supervised role labelers. We present an algorithm that iteratively splits and merges clusters representing semantic roles, thereby leading from an initial clustering to a final clustering of better quality. The method is simple, surprisingly effective, and allows to integrate linguistic knowledge transparently. By combining role induction with a rule-based component for argument identification we obtain an unsupervised end-to-end semantic role labeling system. Evaluation on the CoNLL 2008 benchmark dataset demonstrates that our method outperforms competitive unsupervised approaches by a wide margin.

6 0.843494 137 acl-2011-Fine-Grained Class Label Markup of Search Queries

7 0.84344304 87 acl-2011-Corpus Expansion for Statistical Machine Translation with Semantic Role Label Substitution Rules

8 0.84016645 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction

9 0.83962476 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks

10 0.83879894 3 acl-2011-A Bayesian Model for Unsupervised Semantic Parsing

11 0.83879685 30 acl-2011-Adjoining Tree-to-String Translation

12 0.83867478 318 acl-2011-Unsupervised Bilingual Morpheme Segmentation and Alignment with Context-rich Hidden Semi-Markov Models

13 0.83853173 170 acl-2011-In-domain Relation Discovery with Meta-constraints via Posterior Regularization

14 0.83828163 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification

15 0.83809483 126 acl-2011-Exploiting Syntactico-Semantic Structures for Relation Extraction

16 0.8379041 28 acl-2011-A Statistical Tree Annotator and Its Applications

17 0.83773577 61 acl-2011-Binarized Forest to String Translation

18 0.83735275 187 acl-2011-Jointly Learning to Extract and Compress

19 0.83646643 178 acl-2011-Interactive Topic Modeling

20 0.8344 141 acl-2011-Gappy Phrasal Alignment By Agreement