acl acl2012 acl2012-13 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Meritxell Gonzalez ; Jesus Gimenez ; Lluis Marquez
Abstract: Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. [sent-3, score-0.293]
2 This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. [sent-4, score-0.387]
3 To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. [sent-5, score-0.634]
4 The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. [sent-6, score-0.553]
5 The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis. [sent-7, score-0.335]
6 1 Introduction Evaluation methods are a key ingredient in the development cycle of machine translation (MT) systems. [sent-8, score-0.237]
7 We focus here on the processes involved in the error analysis stage in which MT developers need to understand the output of their systems and to assess the improvements introduced. [sent-10, score-0.284]
8 Automatic detection and classification of the errors produced by MT systems is a challenging problem. [sent-11, score-0.161]
9 The cause of such errors may depend not only on the translation paradigm adopted, but also on the language pairs, the availability of enough linguistic resources and the performance of the linguistic processors, among others. [sent-12, score-0.319]
10–11 Several past research works studied and defined fine-grained typologies of translation errors according to various criteria (Vilar et al., 2007), which helped manual annotation and human analysis of the systems during the MT development cycle. [sent-13, score-0.262] [sent-16, score-0.092]
12 Recently, the task has received increasing attention towards the automatic detection, classification and analysis of these errors, and new tools have been made available to the community. [sent-17, score-0.206]
13 Examples of such tools are AMEANA (Kholy and Habash, 2011), which focuses on morphologically rich languages, and Hjerson (Popović, 2011), which addresses automatic error classification at lexical level. [sent-18, score-0.338]
14 In this work we present an online graphical interface to access ASIYA, an existing software designed to evaluate automatic translations using a heterogeneous set of metrics and meta-metrics. [sent-19, score-0.764]
15 The primary goal of the online interface is to allow MT developers to upload their test beds, obtain a large set of metric scores and then detect and analyze the errors of their systems using just their Internet browsers. [sent-20, score-0.644]
16 Additionally, the graphical interface of the toolkit may help developers to better understand the strengths and weaknesses of the existing evaluation measures and to support the development of further improvements or even totally new evaluation metrics. [sent-21, score-0.814]
17 [Figure 1: MT systems development cycle] …experience of ASIYA's developers and also from the statistics given through the interface to ASIYA's users. [sent-24, score-0.481]
18 Section 3 describes the variety of information gathered during the evaluation process, and Section 4 provides details on the graphical interface developed to display this information. [sent-26, score-0.511]
19 Finally, Section 5 overviews recent work related to MT error analysis, and Section 6 concludes and reports some ongoing and future work. [sent-27, score-0.114]
20 2 The ASIYA Toolkit ASIYA is an open toolkit designed to assist developers of both MT systems and evaluation measures by offering a rich set of metrics and meta-metrics for assessing MT quality (Giménez and Màrquez, 2010a). [sent-28, score-0.433]
21 Although automatic MT evaluation is still far from manual evaluation, it is indeed necessary to avoid the bottleneck introduced by a fully manual evaluation in the system development cycle. [sent-29, score-0.155]
22 Recently, there has been empirical and theoretical justification that a combination of several metrics scoring different aspects of translation quality should correlate better with humans than just a single automatic metric (Amigó et al., …). [sent-30, score-0.442]
23 These metrics rely on different similarity principles (such as precision, recall and overlap) and operate at different linguistic layers (from lexical to syntactic and semantic). [sent-33, score-0.3]
24 Syntactic similarity: based on part-of-speech tags, base phrase chunks, and dependency and constituency trees (e.g., …). [sent-38, score-0.172]
25 Semantic similarity: based on named entities, semantic roles and discourse representation (e.g., …). [sent-41, score-0.096]
26 Such a heterogeneous set of metrics allows the user to analyze diverse aspects of translation quality at the system, document and sentence levels. [sent-44, score-0.492]
27 As discussed in (Giménez and Màrquez, 2008), the widely used lexical-based measures should be considered carefully at the sentence level, as they tend to penalize translations that use a different lexical selection. [sent-45, score-0.173]
28 The combination with complex metrics, more focused on adequacy aspects of the translation (e.g., …). [sent-46, score-0.179]
29–30 3 The Metric-dependent Information ASIYA operates over a fixed set of translation test cases, i.e., a source text, a set of candidate translations and a set of manually produced reference translations. [sent-49, score-0.131] [sent-51, score-0.289]
31 To run ASIYA, the user must provide a test case and select the preferred set of metrics (which may depend on the evaluation purpose). [sent-52, score-0.215]
32 This kind of result is valuable for the rapid evaluation and ranking of translations and systems. [sent-54, score-0.113]
33 However, it is unfriendly for MT developers who need to manually analyze and compare specific aspects of their systems. [sent-55, score-0.223]
34 During the evaluation process, ASIYA generates a number of intermediate analyses containing the partial computations of the evaluation measures. [sent-56, score-0.148]
35 These data constitute a priceless source for analysis purposes, since a close examination of their content allows for analyzing the particular characteristics that differentiate the score values obtained by each candidate translation. [Footnote 1: A more detailed description of the metric set and its implementation can be found in (Giménez and Màrquez, 2010b).] [sent-57, score-0.166]
36 [Table 1: The reference sentence, two candidate translation examples and the Ol scores calculation] [sent-63, score-0.39]
37 Next, we review the type of information used by each family of measures according to their classification, and how this information can be used for MT error analysis purposes. [sent-64, score-0.282]
38 First, the sets of all lexical items that are found in the reference and the candidate sentences are considered. [sent-68, score-0.245]
39 Then, Ol is computed as the cardinality of their intersection divided by the cardinality of their union. [sent-69, score-0.086]
40 The example in Table 1 shows the counts used to calculate Ol between the reference and the two candidate translations (boldface and underline indicate non-matched items in candidates 1 and 2, respectively). [sent-70, score-0.341]
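To make the Ol computation above concrete, here is a minimal Python sketch (my own illustration, not ASIYA's code; the function name and example sentences are invented, not the sentences of Table 1):

```python
def lexical_overlap(candidate, reference):
    """Ol: cardinality of the intersection of the two lexical item sets
    divided by the cardinality of their union."""
    cand_items = set(candidate.lower().split())
    ref_items = set(reference.lower().split())
    if not cand_items and not ref_items:
        return 1.0  # both empty: treat as perfect overlap
    return len(cand_items & ref_items) / len(cand_items | ref_items)

# Hypothetical example with one reference and two candidates:
ref = "the red house is small"
print(lexical_overlap("the red house is little", ref))  # 4/6 = 0.667
print(lexical_overlap("a small red home", ref))         # 2/7 = 0.286
```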
41–42 Similarly, metrics in another category measure the edit distance of a translation, i.e., the number of word insertions, deletions and substitutions that are needed to convert a candidate translation into a reference. [sent-71, score-0.133] [sent-73, score-0.222]
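This word-level edit distance can be computed with the standard dynamic-programming recurrence, sketched below (illustrative only; ASIYA's edit-distance metrics such as WER may differ in details like normalization):

```python
def word_edit_distance(candidate, reference):
    """Minimum number of word insertions, deletions and substitutions
    needed to convert the candidate into the reference."""
    c, r = candidate.split(), reference.split()
    # dp[i][j] = edit distance between the first i candidate words
    # and the first j reference words.
    dp = [[0] * (len(r) + 1) for _ in range(len(c) + 1)]
    for i in range(len(c) + 1):
        dp[i][0] = i  # delete all remaining candidate words
    for j in range(len(r) + 1):
        dp[0][j] = j  # insert all remaining reference words
    for i in range(1, len(c) + 1):
        for j in range(1, len(r) + 1):
            sub = 0 if c[i - 1] == r[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(c)][len(r)]

# WER-style metrics normalize this distance by the reference length:
# wer = word_edit_distance(cand, ref) / len(ref.split())
```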
43 On another front, metrics such as BLEU or NIST compute a weighted average of matching n-grams. [sent-75, score-0.133]
44 An interesting piece of information that can be obtained from these metrics is the weight assigned to each individual matching n-gram. [sent-76, score-0.133]
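The following sketch shows how such per-n-gram match information could be collected (an illustrative reimplementation of the clipped n-gram matching underlying BLEU-style metrics, not ASIYA's internals):

```python
from collections import Counter

def matched_ngrams(candidate, reference, n):
    """Return candidate n-grams that also occur in the reference,
    with their counts clipped by the reference counts."""
    def counts(tokens):
        return Counter(tuple(tokens[i:i + n])
                       for i in range(len(tokens) - n + 1))
    cand = counts(candidate.split())
    ref = counts(reference.split())
    return {g: min(c, ref[g]) for g, c in cand.items() if g in ref}

# matched_ngrams("the red house is little", "the red house is small", 2)
# -> {('the', 'red'): 1, ('red', 'house'): 1, ('house', 'is'): 1}
```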
45 This information can be obtained from the implementation of the metrics and presented to the user through the graphical interface. [sent-78, score-0.329]
46 ASIYA considers three levels of syntactic information: shallow, constituent and dependency parsing. [sent-80, score-0.179]
47 The shallow parsing annotations, which are obtained from the linguistic processors, consist of word-level part-of-speech tags, lemmas and chunk Begin-Inside-Outside labels. [sent-81, score-0.131]
48 Useful figures such as the matching rate of a given (sub)category of items are the basis of a group of metrics (i.e., …). [sent-82, score-0.16]
49 In addition, dependency and constituency parse trees allow for capturing other aspects of the translations. [sent-85, score-0.22]
50 For instance, DP-HCWM is a specific subset of the dependency measures that consists of retrieving and matching all the head-word chains (or the ones of a given length) from the dependency trees. [sent-86, score-0.142]
51 Similarly, CP-STM, a subset of the constituency parsing family of measures, consists of computing the lexical overlap according to the phrase constituent of a given type. [sent-87, score-0.287]
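As a rough illustration of the head-word chain idea behind DP-HCWM (a simplified reading of the matching principle, not the toolkit's implementation; the `heads` array encoding and helper names are my own):

```python
def head_word_chains(words, heads, max_len=4):
    """Collect word chains that follow head links upward from each token.

    heads[i] is the index of token i's head, or -1 for the root;
    every prefix of such a path, up to max_len words, is one chain.
    """
    chains = set()
    for i in range(len(words)):
        chain, j = [], i
        while j != -1 and len(chain) < max_len:
            chain.append(words[j])
            chains.add(tuple(chain))
            j = heads[j]
    return chains

def chain_recall(cand_chains, ref_chains):
    """Proportion of reference chains also found in the candidate."""
    return len(cand_chains & ref_chains) / len(ref_chains) if ref_chains else 0.0
```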
52 Then, for error analysis purposes, parse trees combine the grammatical relations and the grammatical categories of the words in the sentence and display the information they contain. [sent-88, score-0.264]
53 Figures 2 and 3 show, respectively, several annotation levels of the sentences in the example and the constituency trees. [sent-89, score-0.138]
54 ASIYA also distinguishes three levels of semantic information: named entities, semantic roles and discourse representations. [sent-91, score-0.19]
55 The former are post-processed similarly to the lexical annotations discussed above; and the semantic predicate-argument trees are post-processed and displayed in a similar manner to the syntactic trees. [sent-92, score-0.234]
56 Instead, the purpose of the discourse representation analysis is to evaluate candidate translations at document level. [sent-93, score-0.299]
57 In the nested discourse structures we could identify the lexical choices for each discourse sub-type. [sent-94, score-0.166]
58 4 The Graphical Interface This section presents the web application that provides graphical visualization of and interactive access to ASIYA. [sent-96, score-0.327]
59 First, it has been designed to facilitate the use of the ASIYA toolkit for rapid evaluation of test beds. [sent-98, score-0.117]
60 The online application obtains scores for all the metrics and levels and generates an interactive table displaying the values for all the measures. [sent-103, score-0.441]
61 [Figure: scores for several systems] Table organization can swap among the three levels of granularity, and it can also be transposed with respect to system and metric information (transposing rows and columns). [sent-104, score-0.155]
62 When the metric-based table is shown, the user can select one or more metric columns in order to re-rank the rows accordingly. [sent-105, score-0.222]
63 Moreover, the source, reference and candidate translation are displayed along with metric scores. [sent-106, score-0.371]
64 We have also integrated a graphical library to generate real-time interactive plots that show the metric scores graphically. [sent-108, score-0.293]
65 The current version of the interface shows interactive bar charts, where different metrics and systems can be combined in the same plot. [sent-109, score-0.471]
66 2 Graphically-aided Error Analysis and Diagnosis Human analysis is crucial in the development cycle because humans have the capability to spot errors and analyze them subjectively, in relation to the underlying system that is being examined and the scores obtained. [sent-112, score-0.303]
67 Our purpose, as mentioned previously, is to generate a graphical representation of the information related to the source and the translations, enabling a visual analysis of the errors. [sent-113, score-0.239]
68 We have focused on the linguistic measures at the syntactic and semantic level, since they are more robust than lexical metrics when comparing systems based on different paradigms. [sent-114, score-0.377]
69 On the one hand, one of the views of the interface allows a user to navigate and inspect the segments of the test set. [sent-115, score-0.313]
70 …given criteria based on the various linguistic annotations aforementioned (e.g., …). [sent-118, score-0.115]
71 The interface also integrates mechanisms to upload word-by-word alignments between the source and any of the candidates. [sent-121, score-0.478]
72 The alignments are also visualized along with the rest of the annotations, and they can also be used to calculate artificial annotations projected from the source for test beds for which no linguistic processors are available. [sent-122, score-0.306]
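A minimal sketch of that projection step (illustrative only; the interface's actual mechanism is not specified at this level of detail) simply copies each source token's label to the target tokens aligned to it:

```python
def project_annotations(src_labels, alignment, tgt_len, default="O"):
    """Project source-side labels (e.g., POS or NE tags) onto the target
    sentence through word-by-word alignment links (src_idx, tgt_idx)."""
    tgt_labels = [default] * tgt_len
    for src_idx, tgt_idx in alignment:
        tgt_labels[tgt_idx] = src_labels[src_idx]
    return tgt_labels

# Hypothetical example: links (0,1), (1,0), (2,2) reorder the labels.
print(project_annotations(["DET", "NOUN", "VERB"],
                          [(0, 1), (1, 0), (2, 2)], 3))
# -> ['NOUN', 'DET', 'VERB']
```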
73 On the other hand, the web application includes a library for SVG graph generation in order to create the dependency and the constituent trees dynamically (as shown in Figure 3). [sent-123, score-0.209]
74 3 Accessing the Demo The online interface is fully functional and accessible at http://nlp.… [sent-125, score-0.306]
75–76 Although the ASIYA toolkit is not difficult to install, some specific technical skills are still needed in order to set up all its capabilities (i.e., external components and resources such as linguistic processors and dictionaries). [sent-129, score-0.091] [sent-131, score-0.126]
77 The website includes a tarball with sample input data and a video recording, which demonstrates the main functionalities of the interface and how to use it. [sent-133, score-0.326]
78 The current web-based interface allows the user to upload up to five candidate translation files, five reference files and one source file (maximum size of 200K each, which is enough for a test bed of about 1K sentences). [sent-134, score-0.812]
79–80 5 Related Work In the literature, we can find detailed typologies of the errors produced by MT systems (Vilar et al., 2007) and graphical interfaces for human classification and annotation of these errors, such as BLAST (Stymne, 2011). [sent-136, score-0.16] [sent-139, score-0.199]
81 Farrús et al. (2011) classify the errors that arise during Spanish-Catalan translation at several levels: orthographic, morphological, lexical, semantic and syntactic. [sent-145, score-0.28]
82 Work towards the automatic identification and classification of errors has been conducted very recently. [sent-146, score-0.227]
83 The AMEANA tool (Kholy and Habash, 2011) uses alignments to produce detailed morphological error diagnosis and generates statistics at different linguistic levels. [sent-149, score-0.317]
84 To the best of our knowledge, the existing approaches to automatic error classification are centered on the lexical, morphological and shallow syntactic aspects of the translation, i.e., … [sent-150, score-0.385]
85 In contrast, we introduce additional linguistic information, such as dependency and constituent parsing trees, discourse structures and semantic roles. [sent-153, score-0.269]
86 Also, there exist very few tools devoted to visualizing the errors produced by MT systems. [sent-154, score-0.107]
87 Here, instead of dealing with the automatic classification of errors, we deal with the automatic selection and visualization of the information used by the evaluation measures. [sent-155, score-0.272]
88 6 Conclusions and Future Work The main goal of the ASIYA toolkit is to cover the evaluation needs of researchers during the development cycle of their systems. [sent-156, score-0.196]
89 ASIYA generates a number of linguistic analyses over both the candidate and the reference translations. [sent-157, score-0.252]
90 However, the current command-line interface returns the results only in text mode and does not allow for fully exploiting this linguistic information. [sent-158, score-0.317]
91 We present a graphical interface showing a visual representation of such data for monitoring the MT development cycle. [sent-159, score-0.442]
92 We believe that it would be very helpful for carrying out tasks such as error analysis, system comparison and graphical representations. [sent-160, score-0.259]
93 The application described here is the first release of a web interface to access ASIYA online. [sent-161, score-0.297]
94 So far, it includes the mechanisms to analyze 4 out of 10 categories of metrics: shallow parsing, dependency parsing, constituent parsing and named entities. [sent-162, score-0.286]
95 Regarding the analysis of the sentences, we have conducted a small experiment to show the ability of the interface to use word-level alignments between the source and the target sentences. [sent-164, score-0.392]
96 In the near future, we will also include mechanisms to upload phrase-level alignments. [sent-165, score-0.143]
97 This functionality will also give the chance to develop a new family of evaluation metrics based on these alignments. [sent-166, score-0.207]
98 Regarding the interactive aspects of the interface, the grammatical graphs are dynamically generated in SVG format, which proffers a wide range of interactive functionalities. [sent-167, score-0.231]
99 Finally, in order to improve error analysis capabilities, we will endow the application with a search engine able to filter the results according to varied user-defined criteria. [sent-173, score-0.257]
100 The main goal is to provide the mechanisms to select a case set where, for instance, all the sentences are scored above (or below) a threshold for a given metric (or a subset of them). [sent-174, score-0.13]
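A sketch of such threshold-based selection (purely illustrative, since this search engine is described as future work; the data layout is my own assumption):

```python
def filter_by_threshold(scores, metric, threshold, above=True):
    """Select sentence ids whose score for `metric` is above (or below)
    a given threshold; `scores` maps sentence id -> {metric: value}."""
    keep = []
    for sent_id, values in scores.items():
        value = values.get(metric)
        if value is None:
            continue  # sentence not scored with this metric
        if (value >= threshold) if above else (value <= threshold):
            keep.append(sent_id)
    return keep

# Hypothetical scores: pick likely problem cases scored below 0.3 BLEU.
scores = {1: {"BLEU": 0.42, "Ol": 0.61}, 2: {"BLEU": 0.18, "Ol": 0.35}}
print(filter_by_threshold(scores, "BLEU", 0.3, above=False))  # [2]
```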
wordName wordTfidf (topN-words)
[('asiya', 0.615), ('interface', 0.262), ('mt', 0.174), ('gim', 0.169), ('nez', 0.169), ('popovi', 0.16), ('graphical', 0.145), ('metrics', 0.133), ('translation', 0.131), ('error', 0.114), ('developers', 0.113), ('candidate', 0.091), ('jes', 0.085), ('upload', 0.085), ('constituency', 0.082), ('maja', 0.08), ('svg', 0.08), ('errors', 0.078), ('reference', 0.077), ('arquez', 0.077), ('interactive', 0.076), ('metric', 0.072), ('visualization', 0.071), ('cycle', 0.071), ('processors', 0.071), ('kholy', 0.07), ('farr', 0.07), ('measures', 0.068), ('heterogeneous', 0.067), ('functionalities', 0.064), ('analyze', 0.062), ('llu', 0.061), ('ol', 0.061), ('annotations', 0.06), ('jos', 0.06), ('toolkit', 0.059), ('automatic', 0.058), ('discourse', 0.058), ('mechanisms', 0.058), ('analysis', 0.057), ('kirchhoff', 0.056), ('vilar', 0.056), ('levels', 0.056), ('translations', 0.055), ('linguistic', 0.055), ('classification', 0.054), ('ameana', 0.053), ('blast', 0.053), ('deia', 0.053), ('diagnosis', 0.053), ('fishel', 0.053), ('hjerson', 0.053), ('typologies', 0.053), ('trees', 0.053), ('constituent', 0.053), ('user', 0.051), ('lexical', 0.05), ('habash', 0.049), ('aspects', 0.048), ('shallow', 0.048), ('amig', 0.047), ('beds', 0.047), ('online', 0.044), ('family', 0.043), ('cardinality', 0.043), ('files', 0.042), ('display', 0.04), ('semantic', 0.038), ('purpose', 0.038), ('mari', 0.038), ('weaknesses', 0.038), ('source', 0.037), ('dependency', 0.037), ('towards', 0.037), ('alignments', 0.036), ('bed', 0.036), ('wer', 0.036), ('application', 0.035), ('development', 0.035), ('nizar', 0.034), ('syntactic', 0.033), ('morphologically', 0.033), ('gathered', 0.033), ('capabilities', 0.032), ('strengths', 0.032), ('overlap', 0.031), ('evaluation', 0.031), ('dynamically', 0.031), ('morphological', 0.03), ('generates', 0.029), ('rich', 0.029), ('similarity', 0.029), ('produced', 0.029), ('hermann', 0.028), ('bulletin', 0.028), ('parsing', 0.028), ('items', 0.027), ('rapid', 0.027), ('rows', 0.027), ('summit', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000007 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
Author: Meritxell Gonzalez ; Jesus Gimenez ; Lluis Marquez
Abstract: Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.
2 0.13232426 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
Author: Jinho D. Choi ; Martha Palmer
Abstract: This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models are trained (generalized and domain-specific) from the same data set by controlling lexical items with different document frequencies. During decoding, one of the models is selected dynamically given the cosine similarity between each sentence and the training data. This dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres. Even with this simple tagging algorithm, our system shows comparable results against other state-of-the-art systems, and gives higher accuracies when evaluated on a mixture of the data. Furthermore, our system is able to tag about 32K tokens per second. We believe that this model selection approach can be applied to more sophisticated tagging algorithms and improve their robustness even further.
3 0.11570487 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation
Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li
Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.
4 0.11177213 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
Author: Spence Green ; John DeNero
Abstract: When automatically translating from a weakly inflected source language like English to a target language with richer grammatical features such as gender and dual number, the output commonly contains morpho-syntactic agreement errors. To address this issue, we present a target-side, class-based agreement model. Agreement is promoted by scoring a sequence of fine-grained morpho-syntactic classes that are predicted during decoding for each translation hypothesis. For English-to-Arabic translation, our model yields a +1.04 BLEU average improvement over a state-of-the-art baseline. The model does not require bitext or phrase table annotations and can be easily implemented as a feature in many phrase-based decoders. 1
5 0.093372285 140 acl-2012-Machine Translation without Words through Substring Alignment
Author: Graham Neubig ; Taro Watanabe ; Shinsuke Mori ; Tatsuya Kawahara
Abstract: In this paper, we demonstrate that accurate machine translation is possible without the concept of “words,” treating MT as a problem of transformation between character strings. We achieve this result by applying phrasal inversion transduction grammar alignment techniques to character strings to train a character-based translation model, and using this in the phrase-based MT framework. We also propose a look-ahead parsing algorithm and substring-informed prior probabilities to achieve more effective and efficient alignment. In an evaluation, we demonstrate that character-based translation can achieve results that compare to word-based systems while effectively translating unknown and uncommon words over several language pairs.
6 0.090513125 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
7 0.089278266 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
8 0.089027494 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?
9 0.085563928 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models
10 0.082256369 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
11 0.077729754 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
12 0.077433214 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
13 0.075247929 179 acl-2012-Smaller Alignment Models for Better Translations: Unsupervised Word Alignment with the l0-norm
14 0.074987978 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
15 0.074787624 134 acl-2012-Learning to Find Translations and Transliterations on the Web
16 0.073147886 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
17 0.072785445 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment
18 0.072105072 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT
19 0.070630312 203 acl-2012-Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information
20 0.069919638 136 acl-2012-Learning to Translate with Multiple Objectives
topicId topicWeight
[(0, -0.23), (1, -0.074), (2, -0.003), (3, 0.023), (4, 0.084), (5, 0.012), (6, 0.002), (7, -0.042), (8, -0.055), (9, 0.075), (10, -0.01), (11, 0.027), (12, 0.027), (13, 0.049), (14, 0.028), (15, 0.035), (16, -0.011), (17, -0.014), (18, -0.003), (19, -0.108), (20, -0.023), (21, -0.032), (22, 0.057), (23, 0.042), (24, 0.128), (25, 0.05), (26, 0.014), (27, 0.066), (28, 0.051), (29, 0.036), (30, 0.115), (31, -0.011), (32, 0.034), (33, 0.009), (34, -0.004), (35, 0.039), (36, -0.191), (37, 0.002), (38, 0.129), (39, -0.081), (40, 0.042), (41, 0.026), (42, 0.216), (43, -0.025), (44, -0.088), (45, -0.113), (46, -0.151), (47, 0.033), (48, -0.144), (49, 0.048)]
simIndex simValue paperId paperTitle
same-paper 1 0.92934752 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
Author: Meritxell Gonzalez ; Jesus Gimenez ; Lluis Marquez
Abstract: Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.
2 0.68855584 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
Author: Andrejs Vasiljevs ; Raivis Skadins ; Jorg Tiedemann
Abstract: To facilitate the creation and usage of custom SMT systems we have created a cloud-based platform for do-it-yourself MT. The platform is developed in the EU collaboration project LetsMT!. This system demonstration paper presents the motivation in developing the LetsMT! platform, its main features, architecture, and an evaluation in a practical use case.
3 0.5990563 136 acl-2012-Learning to Translate with Multiple Objectives
Author: Kevin Duh ; Katsuhito Sudoh ; Xianchao Wu ; Hajime Tsukada ; Masaaki Nagata
Abstract: We introduce an approach to optimize a machine translation (MT) system on multiple metrics simultaneously. Different metrics (e.g. BLEU, TER) focus on different aspects of translation quality; our multi-objective approach leverages these diverse aspects to improve overall quality. Our approach is based on the theory of Pareto Optimality. It is simple to implement on top of existing single-objective optimization methods (e.g. MERT, PRO) and outperforms ad hoc alternatives based on linear-combination of metrics. We also discuss the issue of metric tunability and show that our Pareto approach is more effective in incorporating new metrics from MT evaluation for MT optimization.
4 0.56632423 1 acl-2012-ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora
Author: Marcis Pinnis ; Radu Ion ; Dan Stefanescu ; Fangzhong Su ; Inguna Skadina ; Andrejs Vasiljevs ; Bogdan Babych
Abstract: The lack of parallel corpora and linguistic resources for many languages and domains is one of the major obstacles for the further advancement of automated translation. A possible solution is to exploit comparable corpora (non-parallel bi- or multi-lingual text resources) which are much more widely available than parallel translation data. Our presented toolkit deals with parallel content extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction of parallel sentences and (2) extraction and bilingual mapping of terms and named entities. The toolkit pairs similar bilingual comparable documents and extracts parallel sentences and bilingual terminological and named entity dictionaries from comparable corpora. This demonstration focuses on the English, Latvian, Lithuanian, and Romanian languages.
5 0.50836295 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
Author: Chang Liu ; Hwee Tou Ng
Abstract: In this work, we introduce the TESLA-CELAB metric (Translation Evaluation of Sentences with Linear-programming-based Analysis; Character-level Evaluation for Languages with Ambiguous word Boundaries) for automatic machine translation evaluation. For languages such as Chinese where words usually have meaningful internal structure and word boundaries are often fuzzy, TESLA-CELAB acknowledges the advantage of character-level evaluation over word-level evaluation. By reformulating the problem in the linear programming framework, TESLA-CELAB addresses several drawbacks of the character-level metrics, in particular the modeling of synonyms spanning multiple characters. We show empirically that TESLA-CELAB significantly outperforms character-level BLEU in the English-Chinese translation evaluation tasks.
6 0.49215448 43 acl-2012-Building Trainable Taggers in a Web-based, UIMA-Supported NLP Workbench
7 0.47200838 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
8 0.44491753 178 acl-2012-Sentence Simplification by Monolingual Machine Translation
9 0.43862116 163 acl-2012-Prediction of Learning Curves in Machine Translation
10 0.42917812 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
11 0.42601883 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations
12 0.4220708 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation
13 0.41225335 164 acl-2012-Private Access to Phrase Tables for Statistical Machine Translation
14 0.40824738 9 acl-2012-A Cost Sensitive Part-of-Speech Tagging: Differentiating Serious Errors from Minor Errors
15 0.39333266 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
16 0.38971442 49 acl-2012-Coarse Lexical Semantic Annotation with Supersenses: An Arabic Case Study
17 0.38679507 34 acl-2012-Automatically Learning Measures of Child Language Development
18 0.3794778 131 acl-2012-Learning Translation Consensus with Structured Label Propagation
19 0.37879357 215 acl-2012-WizIE: A Best Practices Guided Development Environment for Information Extraction
20 0.37739864 168 acl-2012-Reducing Approximation and Estimation Errors for Chinese Lexical Processing with Heterogeneous Annotations
topicId topicWeight
[(25, 0.016), (26, 0.029), (28, 0.041), (30, 0.018), (37, 0.017), (39, 0.048), (57, 0.019), (74, 0.052), (82, 0.02), (84, 0.019), (85, 0.441), (90, 0.112), (92, 0.035), (94, 0.015), (99, 0.049)]
simIndex simValue paperId paperTitle
1 0.91095459 82 acl-2012-Entailment-based Text Exploration with Application to the Health-care Domain
Author: Meni Adler ; Jonathan Berant ; Ido Dagan
Abstract: We present a novel text exploration model, which extends the scope of state-of-the-art technologies by moving from standard concept-based exploration to statement-based exploration. The proposed scheme utilizes the textual entailment relation between statements as the basis of the exploration process. A user of our system can explore the result space of a query by drilling down/up from one statement to another, according to entailment relations specified by an entailment graph and an optional concept taxonomy. As a prominent use case, we apply our exploration system and illustrate its benefit on the health-care domain. To the best of our knowledge this is the first implementation of an exploration system at the statement level that is based on the textual entailment relation. 1
same-paper 2 0.89514154 13 acl-2012-A Graphical Interface for MT Evaluation and Error Analysis
Author: Meritxell Gonzalez ; Jesus Gimenez ; Lluis Marquez
Abstract: Error analysis in machine translation is a necessary step in order to investigate the strengths and weaknesses of the MT systems under development and allow fair comparisons among them. This work presents an application that shows how a set of heterogeneous automatic metrics can be used to evaluate a test bed of automatic translations. To do so, we have set up an online graphical interface for the ASIYA toolkit, a rich repository of evaluation measures working at different linguistic levels. The current implementation of the interface shows constituency and dependency trees as well as shallow syntactic and semantic annotations, and word alignments. The intelligent visualization of the linguistic structures used by the metrics, as well as a set of navigational functionalities, may lead towards advanced methods for automatic error analysis.
3 0.86094934 165 acl-2012-Probabilistic Integration of Partial Lexical Information for Noise Robust Haptic Voice Recognition
Author: Khe Chai Sim
Abstract: This paper presents a probabilistic framework that combines multiple knowledge sources for Haptic Voice Recognition (HVR), a multimodal input method designed to provide efficient text entry on modern mobile devices. HVR extends the conventional voice input by allowing users to provide complementary partial lexical information via touch input to improve the efficiency and accuracy of voice recognition. This paper investigates the use of the initial letter of the words in the utterance as the partial lexical information. In addition to the acoustic and language models used in automatic speech recognition systems, HVR uses the haptic and partial lexical models as additional knowledge sources to reduce the recognition search space and suppress confusions. Experimental results show that both the word error rate and runtime factor can be reduced by a factor of two using HVR.
4 0.85286623 152 acl-2012-Multilingual WSD with Just a Few Lines of Code: the BabelNet API
Author: Roberto Navigli ; Simone Paolo Ponzetto
Abstract: In this paper we present an API for programmatic access to BabelNet, a wide-coverage multilingual lexical knowledge base, and to multilingual knowledge-rich Word Sense Disambiguation (WSD). Our aim is to provide the research community with easy-to-use tools to perform multilingual lexical semantic analysis and foster further research in this direction.
5 0.82781065 96 acl-2012-Fast and Robust Part-of-Speech Tagging Using Dynamic Model Selection
Author: Jinho D. Choi ; Martha Palmer
Abstract: This paper presents a novel way of improving POS tagging on heterogeneous data. First, two separate models are trained (generalized and domain-specific) from the same data set by controlling lexical items with different document frequencies. During decoding, one of the models is selected dynamically given the cosine similarity between each sentence and the training data. This dynamic model selection approach, coupled with a one-pass, left-to-right POS tagging algorithm, is evaluated on corpora from seven different genres. Even with this simple tagging algorithm, our system shows comparable results against other state-of-the-art systems, and gives higher accuracies when evaluated on a mixture of the data. Furthermore, our system is able to tag about 32K tokens per second. We believe that this model selection approach can be applied to more sophisticated tagging algorithms and improve their robustness even further.
6 0.73830205 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation
7 0.57608604 158 acl-2012-PORT: a Precision-Order-Recall MT Evaluation Metric for Tuning
8 0.55577886 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base
9 0.54209453 136 acl-2012-Learning to Translate with Multiple Objectives
10 0.532655 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation
11 0.53067017 52 acl-2012-Combining Coherence Models and Machine Translation Evaluation Metrics for Summarization Evaluation
12 0.52228022 72 acl-2012-Detecting Semantic Equivalence and Information Disparity in Cross-lingual Documents
13 0.5127086 46 acl-2012-Character-Level Machine Translation Evaluation for Languages with Ambiguous Word Boundaries
14 0.50950915 138 acl-2012-LetsMT!: Cloud-Based Platform for Do-It-Yourself Machine Translation
15 0.50576818 214 acl-2012-Verb Classification using Distributional Similarity in Syntactic and Semantic Structures
16 0.49692297 132 acl-2012-Learning the Latent Semantics of a Concept from its Definition
17 0.49465966 219 acl-2012-langid.py: An Off-the-shelf Language Identification Tool
18 0.48911574 63 acl-2012-Cross-lingual Parse Disambiguation based on Semantic Correspondence
19 0.48620805 24 acl-2012-A Web-based Evaluation Framework for Spatial Instruction-Giving Systems
20 0.47945753 48 acl-2012-Classifying French Verbs Using French and English Lexical Resources