acl acl2011 acl2011-290 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Daniel Emilio Beck
Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.
Reference: text
sentIndex sentText sentNum sentScore
1 Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers Daniel Emilio Beck Computer Science Department Federal University of São Carlos daniel beck@dc. [sent-1, score-0.053]
2 br Abstract In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. [sent-3, score-0.149]
3 I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. [sent-4, score-0.038]
4 Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. [sent-5, score-0.074]
5 These formalisms have important representational properties that make them well-suited for syntax modeling. [sent-6, score-0.114]
6 I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese. [sent-7, score-0.08]
7 However, since the advent of PB-SMT by Koehn et al. [sent-10, score-0.037]
8 (2003) and Och and Ney (2004), purely statistical MT systems have not achieved considerable improvements. [sent-11, score-0.04]
9 This Master’s thesis proposal aims to improve SMT systems by including syntactic information in the first and second steps. [sent-14, score-0.223]
10 (Footnote 1: For the remainder of this proposal, I will refer to this step simply as the translation model.) [sent-15, score-0.143]
11 Therefore, I plan to investigate two approaches: the Tree-to-String (TTS) and the Tree-to-Tree (TTT) models. [sent-16, score-0.042]
12 In the former, syntactic information is provided only for the source language while in the latter, it is provided for both source and target languages. [sent-17, score-0.074]
13 There are many formal theories to represent syntax in a language, such as Context-free Grammars (CFGs), Tree Substitution Grammars (TSGs), Tree Adjoining Grammars (TAGs), and all their synchronous counterparts. [sent-18, score-0.118]
14 In this work, I represent each sentence as a constituent tree and use Tree Automata (TAs) and Tree Transducers (TTs) in the language and translation models. [sent-19, score-0.31]
15 Although this work is mainly language independent, proof-of-concept experiments will be executed on the English and Brazilian Portuguese (en-ptBR) language pair. [sent-20, score-0.036]
16 Previous research on factored translation for this pair (using morphological information) showed that it improved the results in terms of BLEU (Papineni et al. [sent-21, score-0.196]
17 However, even factored translation models have limitations: many languages (and Brazilian Portuguese is not an exception) have relatively loose word order constraints and present long-distance agreements that cannot be efficiently represented by those models. [sent-23, score-0.234]
18 Such phenomena motivate the use of more powerful models that take syntactic information into account. [sent-24, score-0.112]
19 (2008) uses synchronous CFG rules and Liu et al. [sent-29, score-0.106]
20 (2006) also use transducer rules but extract them from parse trees in the target language instead (the string-to-tree approach, STT). [sent-32, score-0.2]
21 All those works also include methods and algorithms for efficient rule extraction, since it is infeasible to extract all possible rules from a parsed corpus due to the exponential cost. [sent-35, score-0.215]
22 These works mainly try to incorporate non-syntactic phrases into a syntax-based model: while Liu et al. [sent-37, score-0.074]
23 (2008) use an algorithm to convert leaves in a parse tree into phrases before rule extraction. [sent-39, score-0.304]
24 Language models that take into account syntactic aspects have also been an active research subject. [sent-40, score-0.112]
25 While works like Post and Gildea (2009) and Vandeghinste (2009) focus solely on language modeling itself, Graham and van Genabith (2010) show an experiment that incorporates a syntax-based model into a PB-SMT system. [sent-41, score-0.079]
26 3 Tree automata and tree transducers Tree Automata are similar to Finite-state Automata (FSA), except they recognize trees instead of strings (or sequences of words). [sent-42, score-0.632]
27 Formally, FSA can only represent Regular Languages and thus cannot efficiently model several syntactic features, including long-distance agreement. [sent-43, score-0.074]
28 TAs recognize the so-called Regular Tree Languages (RTLs), which can represent Context-free Languages (CFLs), since the set of all syntactic trees of a CFL is an RTL (Comon et al. [sent-44, score-0.134]
29 Figure 1 shows such an RTL, composed of two trees. [sent-47, score-0.04]
30 If we extract a CFG from this RTL, it would have the recursive rule S → SS, which would generate an infinite set of syntactic trees. [sent-48, score-0.172]
31 In other words, there is no CFG capable of generating only the syntactic trees contained in the RTL shown in Figure 1. [sent-49, score-0.134]
32 This feature implies that RTLs have more representational power than CFLs. [sent-50, score-0.06]
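The recognition procedure sketched above can be made concrete. The following is a minimal illustrative sketch (not the thesis implementation) of a deterministic bottom-up tree automaton over constituent trees: states are assigned to leaves by symbol and propagated upward by transition rules, and a tree belongs to the recognized RTL when its root reaches a final state. The toy state names and rules are assumptions made for the example.

```python
def run(tree, leaf_rules, rules):
    """Return the state reached at the root of `tree`, or None if stuck."""
    label, children = tree          # tree = (label, list_of_subtrees)
    if not children:                # leaf transition: symbol -> state
        return leaf_rules.get(label)
    child_states = tuple(run(c, leaf_rules, rules) for c in children)
    if None in child_states:
        return None
    return rules.get((label, child_states))

def accepts(tree, leaf_rules, rules, final_states):
    return run(tree, leaf_rules, rules) in final_states

# A finite RTL with exactly two trees: S(NP, VP) and S(NP, VP(V, NP)).
leaf_rules = {"NP": "q_np", "VP": "q_vp", "V": "q_v"}
rules = {
    ("VP", ("q_v", "q_np")): "q_vp",
    ("S", ("q_np", "q_vp")): "q_s",
}
final_states = {"q_s"}

print(accepts(("S", [("NP", []), ("VP", [])]), leaf_rules, rules, final_states))
```

Note how the automaton constrains which subtrees may combine: swapping the NP and VP children is rejected, which a string-level FSA over the leaf sequence could not enforce.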
33 An FST is composed of an input RTL, an output RTL, and a set of transformation rules. [sent-52, score-0.04]
34 Top-down (T) transducers process input trees starting from the root and descending through the nodes until reaching the leaves, in contrast to bottom-up transducers, which do the opposite. [sent-56, score-0.274]
35 Figure 2 shows a T rule, where uppercase letters (NP) represent symbols, lowercase letters (q, r, s) represent states and x1 and x2 are variables (formal definitions can be found in Comon et al. [sent-57, score-0.074]
36 Default top-down transducers must have only one symbol on the left-hand side and thus cannot model some syntactic transformations (such as local reordering) without relying on copy and delete operations (Maletti et al. [sent-59, score-0.346]
37 Extended top-down transducers allow multiple symbols on left-hand sides, making them better suited for syntax modeling. [sent-61, score-0.331]
38 Tree-to-string transducers simply drop the tree structure on right-hand sides, which makes them adequate for translation models without syntactic information in one of the languages. [sent-64, score-0.673]
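A tree-to-string run of this kind can be sketched in a few lines. The sketch below assumes rules keyed by (state, symbol) whose right-hand side is a list of items, each either a literal target word or a (state, child_index) pair that recursively processes one subtree; the toy en-ptBR lexical rules are illustrative, not from the thesis.

```python
def apply_tts(state, tree, rules):
    """Apply a toy tree-to-string transducer: return the target word list."""
    label, children = tree
    rhs = rules[(state, label)]
    out = []
    for item in rhs:
        if isinstance(item, str):
            out.append(item)                      # literal target word
        else:
            next_state, i = item                  # recurse into child i
            out.extend(apply_tts(next_state, children[i], rules))
    return out

# The NP rule swaps its two subtrees, mirroring the kind of reordering
# T rule shown in Figure 2 (state names and lexical entries are assumed).
rules = {
    ("q", "NP"): [("q", 1), ("q", 0)],
    ("q", "JJ"): ["branco"],
    ("q", "NN"): ["gato"],
}
tree = ("NP", [("JJ", []), ("NN", [])])           # "white cat"
print(" ".join(apply_tts("q", tree, rules)))      # -> "gato branco"
```

Because right-hand sides are flat word lists rather than trees, this is the tree-to-string case; producing trees on the right-hand side instead would give the tree-to-tree case.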
39 (Figure 2: Example of a T rule.) 4 SMT Model The systems will be implemented using a discriminative, log-linear model (Och and Ney, 2002), using the language and translation models as feature functions. [sent-66, score-0.279]
40 Settings that use more features besides those two models will also be built. [sent-67, score-0.038]
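The log-linear combination itself is simple to state: the score of a candidate translation is the weighted sum of its feature-function values, and decoding picks the highest-scoring candidate. A minimal sketch, with illustrative feature names, weights, and (fictitious) feature values:

```python
def loglinear_score(features, weights):
    """Weighted sum of feature values: sum_i lambda_i * h_i(e, f)."""
    return sum(weights[name] * value for name, value in features.items())

# Assumed weights and log-domain feature values for two toy candidates.
weights = {"lm": 0.5, "tm": 0.4, "length": 0.1}
candidates = {
    "o gato branco": {"lm": -2.1, "tm": -1.3, "length": 3},
    "o branco gato": {"lm": -4.0, "tm": -1.3, "length": 3},
}
best = max(candidates, key=lambda e: loglinear_score(candidates[e], weights))
print(best)  # -> "o gato branco"
```

In the proposed systems the language and translation models would play the role of the `lm` and `tm` features; extra settings simply add more entries to the feature dictionary.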
41 (2008) The translation models will be weighted TTs (Graehl et al. [sent-70, score-0.181]
42 (2004) but I also plan to investigate the approaches used by Liu et al. [sent-75, score-0.042]
43 For TTT rule extraction, I will use a method similar to the one described in Zhang et al. [sent-78, score-0.098]
44 I also plan to use language models that take into account syntactic properties. [sent-80, score-0.154]
45 Although most work on syntactic language models uses tree grammars like TSGs and TAGs, these can be simulated by TAs and TTs (Shieber, 2004; Maletti, 2010). [sent-81, score-0.381]
46 This property can help the systems’ implementation, because it is possible to unify language and translation modeling in one TT toolkit. [sent-82, score-0.143]
47 5 Methods In this section, I present the experiments proposed in my thesis and the materials required, along with the metrics used for evaluation. [sent-83, score-0.14]
48 (Figure 3: Example of an xT rule and its corresponding T rules.) 5. [sent-85, score-0.14]
49 1 Materials To implement and evaluate the techniques described, a parallel corpus with syntactic annotation is required. [sent-86, score-0.074]
50 As the focus of this thesis is the English and Brazilian Portuguese language pair, I will use the PesquisaFAPESP corpus2 in my experiments. [sent-87, score-0.088]
51 This corpus is composed of 646 scientific papers, originally written in Brazilian Portuguese and manually translated into English, resulting in about 17,000 parallel sentences. [sent-88, score-0.04]
52 As for syntactic annotation, I will use the Berkeley parser (Petrov and Klein, 2007) for 2http://revistapesquisa. [sent-89, score-0.148]
53 br (Figure 4: Example of an xTS rule, q S(x1 VP(V(was) x2)) → x1 foi x2, for the en-ptBR language pair.) English and the PALAVRAS parser (Bick, 2000) for Brazilian Portuguese. [sent-91, score-0.098]
54 In addition to the corpora and parsers, the following tools will be used: GIZA++3 (Och and Ney, 2000) for lexical alignment; Tiburon4 (May and Knight, 2006) for transducer training in both TTS and TTT systems; Moses5 (Koehn et al. [sent-92, score-0.043]
55 For the TTS systems (one for each translation direction), the training set will be lexically aligned using GIZA++; for the TTT system, the syntactic trees will be aligned using techniques similar to the ones proposed by Gildea (2003) and by Zhang et al. [sent-96, score-0.277]
56 The baseline will be the score for factored translation, shown in Table 1. [sent-100, score-0.037]
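The evaluation against this baseline relies on automatic metrics such as BLEU. As a rough sketch of the idea at BLEU's core (Papineni et al., 2002), the modified n-gram precision below counts how many hypothesis n-grams occur in the reference, clipped by the reference counts; the full metric combines n = 1..4 with a brevity penalty, and the toy sentences are illustrative.

```python
from collections import Counter

def modified_precision(hyp, ref, n):
    """Clipped n-gram precision of token list `hyp` against `ref`."""
    hyp_ngrams = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    clipped = sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
    total = sum(hyp_ngrams.values())
    return clipped / total if total else 0.0

hyp = "o gato branco dorme".split()
ref = "o gato branco dorme muito".split()
print(modified_precision(hyp, ref, 2))  # -> 1.0
```

Clipping matters: a degenerate hypothesis that repeats one reference word gets credit for it only as often as it appears in the reference.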
57 6 Contributions After its conclusion, this thesis will have brought the following contributions: 3http://www .
org/moses Language-independent SMT models which incorporate syntactic information in both language and translation models. [sent-108, score-0.292]
59 Implementations of these models, using the tools described in Section 5. [sent-109, score-0.04]
60 Technical reports will be written during the progress of this thesis and made publicly available. [sent-111, score-0.088]
61 Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. [sent-132, score-0.143]
62 Scalable inference and training of context-rich syntactic translation models. [sent-140, score-0.217]
63 Discriminative training and maximum entropy models for statistical machine translation. [sent-192, score-0.078]
wordName wordTfidf (topN-words)
[('tts', 0.317), ('rtl', 0.291), ('ttt', 0.291), ('brazilian', 0.22), ('transducers', 0.214), ('automata', 0.191), ('tree', 0.167), ('maletti', 0.146), ('translation', 0.143), ('smt', 0.134), ('portuguese', 0.134), ('graehl', 0.128), ('iwill', 0.127), ('cfl', 0.125), ('comon', 0.125), ('rtls', 0.125), ('xts', 0.125), ('ialso', 0.11), ('sinv', 0.11), ('rule', 0.098), ('transducer', 0.098), ('fsa', 0.095), ('thesis', 0.088), ('nguyen', 0.087), ('caseli', 0.083), ('palavras', 0.083), ('tiburon', 0.083), ('och', 0.081), ('syntactic', 0.074), ('tas', 0.073), ('tsgs', 0.073), ('beck', 0.073), ('jonathan', 0.073), ('kevin', 0.07), ('galley', 0.069), ('knight', 0.068), ('josef', 0.067), ('synchronous', 0.064), ('grammars', 0.064), ('topdown', 0.063), ('proposal', 0.061), ('trees', 0.06), ('graham', 0.06), ('fst', 0.06), ('representational', 0.06), ('sides', 0.058), ('liu', 0.057), ('zhang', 0.056), ('cfgs', 0.055), ('syntax', 0.054), ('master', 0.053), ('factored', 0.053), ('daniel', 0.053), ('bleu', 0.052), ('materials', 0.052), ('gildea', 0.05), ('xt', 0.049), ('vp', 0.047), ('franz', 0.046), ('nist', 0.046), ('koehn', 0.044), ('alignment', 0.043), ('yamada', 0.043), ('plan', 0.042), ('cfg', 0.042), ('rules', 0.042), ('van', 0.041), ('isn', 0.04), ('composed', 0.04), ('hermann', 0.04), ('statistical', 0.04), ('leaves', 0.039), ('works', 0.038), ('tt', 0.038), ('models', 0.038), ('ney', 0.037), ('letters', 0.037), ('ta', 0.037), ('akira', 0.037), ('tored', 0.037), ('vandeghinste', 0.037), ('dauchet', 0.037), ('stt', 0.037), ('righthand', 0.037), ('atica', 0.037), ('planned', 0.037), ('helena', 0.037), ('ape', 0.037), ('unfeasible', 0.037), ('ipropose', 0.037), ('advent', 0.037), ('eilns', 0.037), ('revi', 0.037), ('mainly', 0.036), ('extended', 0.036), ('post', 0.036), ('hopkins', 0.035), ('petrov', 0.035), ('np', 0.034), ('michel', 0.034)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999976 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
Author: Daniel Emilio Beck
Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.
2 0.17253357 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu
Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.
3 0.17018911 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
4 0.16454519 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
5 0.15803131 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
6 0.15757641 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
7 0.15528002 61 acl-2011-Binarized Forest to String Translation
8 0.15099625 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
9 0.13409676 57 acl-2011-Bayesian Word Alignment for Statistical Machine Translation
10 0.13207579 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
11 0.12859571 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
12 0.11831298 313 acl-2011-Two Easy Improvements to Lexical Weighting
13 0.10812224 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
14 0.10581459 188 acl-2011-Judging Grammaticality with Tree Substitution Grammar Derivations
15 0.10025903 154 acl-2011-How to train your multi bottom-up tree transducer
16 0.098493159 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
17 0.098351754 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation
18 0.094136052 266 acl-2011-Reordering with Source Language Collocations
19 0.09375751 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
20 0.089941062 43 acl-2011-An Unsupervised Model for Joint Phrase Alignment and Extraction
topicId topicWeight
[(0, 0.216), (1, -0.207), (2, 0.119), (3, 0.019), (4, 0.053), (5, 0.029), (6, -0.11), (7, -0.041), (8, -0.017), (9, -0.009), (10, -0.027), (11, -0.038), (12, -0.012), (13, -0.033), (14, 0.022), (15, -0.02), (16, 0.005), (17, 0.001), (18, -0.012), (19, 0.025), (20, -0.02), (21, 0.013), (22, -0.006), (23, 0.012), (24, 0.069), (25, -0.016), (26, 0.025), (27, -0.016), (28, -0.032), (29, -0.011), (30, 0.008), (31, -0.039), (32, -0.012), (33, -0.025), (34, -0.015), (35, 0.023), (36, -0.012), (37, -0.038), (38, -0.014), (39, 0.016), (40, -0.07), (41, 0.009), (42, -0.0), (43, -0.012), (44, -0.099), (45, 0.066), (46, -0.001), (47, -0.07), (48, 0.067), (49, -0.025)]
simIndex simValue paperId paperTitle
same-paper 1 0.95604497 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
Author: Daniel Emilio Beck
Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.
2 0.86362547 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
3 0.8200143 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
4 0.81888115 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
Author: Ashish Vaswani ; Haitao Mi ; Liang Huang ; David Chiang
Abstract: Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use minimal rules (those that cannot be formed out of other rules), and instead rely on a rule Markov model of the derivation history to capture dependencies between minimal rules. Large-scale experiments on a state-of-the-art tree-to-string translation system show that our approach leads to a slimmer model, a faster decoder, yet the same translation quality (measured using BLEU) as composed rules.
5 0.81497592 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
Author: Lane Schwartz ; Chris Callison-Burch ; William Schuler ; Stephen Wu
Abstract: This paper describes a novel technique for incorporating syntactic knowledge into phrasebased machine translation through incremental syntactic parsing. Bottom-up and topdown parsers typically require a completed string as input. This requirement makes it difficult to incorporate them into phrase-based translation, which generates partial hypothesized translations from left-to-right. Incremental syntactic language models score sentences in a similar left-to-right fashion, and are therefore a good mechanism for incorporat- ing syntax into phrase-based translation. We give a formal definition of one such lineartime syntactic language model, detail its relation to phrase-based decoding, and integrate the model with the Moses phrase-based translation system. We present empirical results on a constrained Urdu-English translation task that demonstrate a significant BLEU score improvement and a large decrease in perplexity.
6 0.79418766 61 acl-2011-Binarized Forest to String Translation
7 0.7937603 217 acl-2011-Machine Translation System Combination by Confusion Forest
8 0.78548902 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
9 0.78476334 154 acl-2011-How to train your multi bottom-up tree transducer
10 0.77704602 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
11 0.6948486 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages
12 0.69460917 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach
13 0.68973237 313 acl-2011-Two Easy Improvements to Lexical Weighting
14 0.68732285 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
15 0.65658921 44 acl-2011-An exponential translation model for target language morphology
16 0.65577573 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
17 0.63203502 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
18 0.63178968 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach
19 0.63057899 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
20 0.62329739 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
topicId topicWeight
[(5, 0.015), (17, 0.041), (26, 0.011), (37, 0.045), (39, 0.03), (41, 0.033), (55, 0.013), (59, 0.024), (72, 0.019), (91, 0.026), (96, 0.672)]
simIndex simValue paperId paperTitle
1 0.99969441 25 acl-2011-A Simple Measure to Assess Non-response
Author: Anselmo Penas ; Alvaro Rodrigo
Abstract: There are several tasks where is preferable not responding than responding incorrectly. This idea is not new, but despite several previous attempts there isn’t a commonly accepted measure to assess non-response. We study here an extension of accuracy measure with this feature and a very easy to understand interpretation. The measure proposed (c@1) has a good balance of discrimination power, stability and sensitivity properties. We show also how this measure is able to reward systems that maintain the same number of correct answers and at the same time decrease the number of incorrect ones, by leaving some questions unanswered. This measure is well suited for tasks such as Reading Comprehension tests, where multiple choices per question are given, but only one is correct.
2 0.99850267 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?
Author: Maoxi Li ; Chengqing Zong ; Hwee Tou Ng
Abstract: Word is usually adopted as the smallest unit in most tasks of Chinese language processing. However, for automatic evaluation of the quality of Chinese translation output when translating from other languages, either a word-level approach or a character-level approach is possible. So far, there has been no detailed study to compare the correlations of these two approaches with human assessment. In this paper, we compare word-level metrics with characterlevel metrics on the submitted output of English-to-Chinese translation systems in the IWSLT’08 CT-EC and NIST’08 EC tasks. Our experimental results reveal that character-level metrics correlate with human assessment better than word-level metrics. Our analysis suggests several key reasons behind this finding. 1
same-paper 3 0.9982022 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
Author: Daniel Emilio Beck
Abstract: In this paper I present a Master’s thesis proposal in syntax-based Statistical Machine Translation. I propose to build discriminative SMT models using both tree-to-string and tree-to-tree approaches. Translation and language models will be represented mainly through the use of Tree Automata and Tree Transducers. These formalisms have important representational properties that make them well-suited for syntax modeling. I also present an experiment plan to evaluate these models through the use of a parallel corpus written in English and Brazilian Portuguese.
4 0.99716282 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles
Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose
Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology, however the gen- eralizable nature of the summarization techniques and the extensible architecture makes it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD).
Author: Vicent Alabau ; Alberto Sanchis ; Francisco Casacuberta
Abstract: In interactive machine translation (IMT), a human expert is integrated into the core of a machine translation (MT) system. The human expert interacts with the IMT system by partially correcting the errors of the system’s output. Then, the system proposes a new solution. This process is repeated until the output meets the desired quality. In this scenario, the interaction is typically performed using the keyboard and the mouse. In this work, we present an alternative modality to interact within IMT systems by writing on a tactile display or using an electronic pen. An on-line handwritten text recognition (HTR) system has been specifically designed to operate with IMT systems. Our HTR system improves previous approaches in two main aspects. First, HTR decoding is tightly coupled with the IMT system. Second, the language models proposed are context aware, in the sense that they take into account the partial corrections and the source sentence by using a combination of ngrams and word-based IBM models. The proposed system achieves an important boost in performance with respect to previous work.
6 0.99587762 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names
7 0.99510443 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System
8 0.99045277 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity
9 0.98635811 82 acl-2011-Content Models with Attitude
10 0.97579706 41 acl-2011-An Interactive Machine Translation System with Online Learning
11 0.97155863 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge
12 0.96286649 266 acl-2011-Reordering with Source Language Collocations
13 0.95425206 264 acl-2011-Reordering Metrics for MT
14 0.95140833 169 acl-2011-Improving Question Recommendation by Exploiting Information Need
15 0.94779938 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization
16 0.94631416 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment
17 0.94526792 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations
18 0.9432745 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search
19 0.94094735 326 acl-2011-Using Bilingual Information for Cross-Language Document Summarization
20 0.93779176 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages