41 acl-2011-An Interactive Machine Translation System with Online Learning

Source: pdf

Author: Daniel Ortiz-Martinez ; Luis A. Leiva ; Vicent Alabau ; Ismael Garcia-Varea ; Francisco Casacuberta

Abstract: State-of-the-art Machine Translation (MT) systems are still far from being perfect. An alternative is the so-called Interactive Machine Translation (IMT) framework, where the knowledge of a human translator is combined with the MT system. We present a statistical IMT system able to learn from user feedback by means of the application of online learning techniques. These techniques allow the MT system to update the parameters of the underlying models in real time. According to empirical results, our system outperforms the results of conventional IMT systems. To the best of our knowledge, this online learning capability has never been provided by previous IMT systems. Our IMT system is implemented in C++, JavaScript, and ActionScript; and is publicly available on the Web.

Reference: text

Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 An alternative is the so-called Interactive Machine Translation (IMT) framework, where the knowledge of a human translator is combined with the MT system. [sent-8, score-0.043]

2 We present a statistical IMT system able to learn from user feedback by means of the application of online learning techniques. [sent-9, score-0.379]

3 These techniques allow the MT system to update the parameters of the underlying models in real time. [sent-10, score-0.111]

4 According to empirical results, our system outperforms the results of conventional IMT systems. [sent-11, score-0.066]

5 To the best of our knowledge, this online learning capability has never been provided by previous IMT systems. [sent-12, score-0.137]

6 Our IMT system is implemented in C++, JavaScript, and ActionScript; and is publicly available on the Web. [sent-13, score-0.056]

7 1 Introduction The research in the field of machine translation (MT) aims to develop computer systems which are able to translate text or speech without human intervention. [sent-14, score-0.162]

8 However, current translation technology has not been able to deliver fully automated high-quality translations. [sent-15, score-0.162]

9 Typical solutions to improve the quality of the translations supplied by an MT system require manual post-editing. [sent-16, score-0.057]

10 An alternative way to take advantage of the existing MT technologies is to use them in collaboration with human translators within a computer-assisted translation (CAT) or interactive framework (Isabelle and Church, 1997). [sent-18, score-0.266]

11 Systems have been designed to interact with linguists to solve ambiguities or update user dictionaries. [sent-20, score-0.173]

12 The idea proposed in that work was to embed data driven MT techniques within the interactive translation environment. [sent-24, score-0.242]

13 Each corrected text segment is then used by the MT system as additional information to achieve improved suggestions. [sent-27, score-0.059]

14 The vast majority of the existing work on IMT makes use of the well-known batch learning paradigm. [sent-29, score-0.13]

15 In the batch learning paradigm, the training of the IMT system and the interactive translation process are carried out in separate stages. [sent-30, score-0.414]

16 This paradigm is not able to take advantage of the new knowledge produced by the user of the IMT system. [sent-31, score-0.235]

17 In this paper, we present an application of the online learning paradigm to the IMT framework. [sent-32, score-0.2]

18 In the online learning paradigm, the training and prediction stages are no longer separated. [sent-33, score-0.137]

19 This feature is particularly useful in IMT since it allows the system to take user feedback into account. [sent-34, score-0.115]

20 Specifically, our proposed IMT system can be extended with the new training samples that are generated each time the user validates the translation of a given source sentence. [sent-35, score-0.344]

21 The online learning techniques implemented in our IMT system incrementally update the statistical models involved in the translation process. [sent-36, score-0.427]

22 2 Related work There are some works on IMT in the literature that try to take advantage of user feedback. [sent-37, score-0.134]

23 (2004), where dynamic adaptation of an IMT system via cache-based model extensions to language and translation models is proposed. [sent-39, score-0.204]

24 In interaction-1, the user moves the mouse to accept the first eight characters (k), then the system suggests completing the sentence with “list of resources” (example sentence pair from the figure: “Para ver la lista de recursos” / “To view a listing of resources”). [sent-44, score-0.188]

25 In the final interaction, the user accepts the current suggestion. [sent-45, score-0.115]

26 Interactions 2 and 3 are similar. Recent research on IMT has proposed the use of online learning as one possible way to successfully incorporate user feedback in IMT systems (Ortiz-Martínez et al. [sent-47, score-0.306]

27 In the online learning setting, models are trained sample by sample. [sent-49, score-0.155]

28 For this reason, this learning paradigm is appropriate for use in the IMT framework. [sent-50, score-0.083]

29 Specifically, an IMT system able to incrementally update the parameters of all of the different models involved in the interactive translation process is proposed. [sent-53, score-0.409]

30 One previous attempt to implement online learning in IMT is the work by Cesa-Bianchi et al. [sent-54, score-0.155]

31 In that work, the authors present a very constrained version of online learning, which is not able to extend the translation models due to the high time cost of the learning process. [sent-56, score-0.317]

32 We have adopted the online learning techniques proposed in (Ortiz-Martínez et al. [sent-57, score-0.156]

33 For instance, a prototype system for text prediction to help translators is shown in (Foster et al. [sent-60, score-0.185]

34 Additionally, Koehn (2009) presents the Caitra translation tool. [sent-62, score-0.124]

35 Caitra aids linguists by suggesting sentence completions and alternative words, or by allowing users to post-edit machine translation output. [sent-63, score-0.146]

36 However, neither of these systems is able to take advantage of the user-validated translations. [sent-64, score-0.228]

37 3 Interactive Machine Translation IMT can be seen as an evolution of the statistical machine translation (SMT) framework. [sent-65, score-0.142]

38 In SMT, given a source string f, we seek the target string e that maximizes the posterior probability: $\hat{e} = \operatorname*{argmax}_{e} \Pr(e \mid f)$ (1) Within the IMT framework, a state-of-the-art SMT system is employed in the following way. [sent-66, score-0.059]

39 For a given source sentence, the SMT system automatically generates an initial translation. [sent-67, score-0.059]

40 A human translator checks this translation from left to right, correcting the first error. [sent-68, score-0.167]

41 The SMT system then proposes a new extension, taking the correct prefix $e_p$ into account. [sent-69, score-0.113]

42 In the resulting decision rule, we maximize over all possible extensions $e_s$ of $e_p$: $\hat{e}_s = \operatorname*{argmax}_{e_s} \Pr(e_s \mid e_p, f)$ (2) It is worth noting that the user interactions are at character level, that is, for each submitted keystroke the system provides a new extension (or suffix) to the current hypothesis. [sent-71, score-0.287]
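
The suffix search in Eq. (2) can be illustrated with a small sketch. It assumes the decoder has already produced a scored list of full candidate translations and picks the best one compatible with the validated prefix; the function name `best_suffix`, the candidate list, and the log-probabilities are hypothetical, and searching an n-best list is a simplification, not necessarily how the system described here searches.

```python
import math


def best_suffix(prefix, candidates):
    """Return the suffix of the best-scoring candidate that extends `prefix`.

    `candidates` is a list of (translation, log_probability) pairs.
    Returns None when no candidate is compatible with the prefix.
    """
    best = None
    best_score = -math.inf
    for text, log_prob in candidates:
        # Only hypotheses that start with the validated prefix are allowed.
        if text.startswith(prefix) and log_prob > best_score:
            best = text[len(prefix):]
            best_score = log_prob
    return best


# Hypothetical scored hypotheses for one source sentence.
candidates = [
    ("To view a list of resources", -1.2),
    ("To view a listing of resources", -1.5),
    ("To see a list of resources", -0.9),
]
```

Calling `best_suffix("To view a listi", candidates)` leaves only the second hypothesis, so the proposed extension is the remainder of that string. A real decoder would search the model's hypothesis space rather than a fixed list.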

43 Among these feature functions, the most relevant are the language and translation models. [sent-78, score-0.124]

44 The language model is implemented using statistical n-gram language models and the translation model is implemented using phrase-based models. [sent-79, score-0.16]

45 The IMT system proposed here is based on a log-linear SMT system which includes a total of seven feature functions: an n-gram language model, a target sentence length model, inverse and direct phrase-based models, source and target phrase length models and a reordering model. [sent-80, score-0.176]
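
As a rough illustration of how a log-linear model combines the seven feature functions listed above, the sketch below computes a weighted sum of feature values for one hypothesis. The feature names, weights, and values are assumptions chosen for the example; the source does not give them.

```python
# Hypothetical names for the seven feature functions listed in the text.
FEATURE_NAMES = (
    "language_model",
    "target_length",
    "inverse_phrase",
    "direct_phrase",
    "source_phrase_length",
    "target_phrase_length",
    "reordering",
)

# Illustrative weights; in practice these would be tuned on held-out data.
weights = {name: 1.0 for name in FEATURE_NAMES}
weights["language_model"] = 0.5


def loglinear_score(features, weights):
    """Weighted sum of log-domain feature values for one hypothesis."""
    return sum(weights[name] * features[name] for name in FEATURE_NAMES)
```

The hypothesis with the highest score would be the one proposed to the user. A missing feature raises `KeyError`, which makes an incomplete feature vector easy to spot.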

46 4 Online Learning In the online learning paradigm, learning proceeds as a sequence of trials. [sent-81, score-0.157]

47 The online learning paradigm fits nicely in the IMT framework, since the interactive translation of the source sentences generates new user-validated training samples that can be used to extend the statistical models involved in the translation process. [sent-84, score-0.629]

48 One key aspect in online learning is the time required by the learning algorithm to process the new training samples. [sent-85, score-0.157]

49 One way to satisfy this constraint is to obtain incrementally updateable versions of the algorithms that are executed to train the statistical models involved in the translation process. [sent-86, score-0.241]

50 Specifically, our proposed IMT system implements the set of training algorithms that are required to incrementally update each component of the log-linear model. [sent-88, score-0.172]

51 One key aspect of the required training algorithms is the necessity to replace the conventional expectation-maximization (EM) algorithm by its incremental version (Neal and Hinton, 1998). [sent-90, score-0.052]
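
The idea of an incrementally updatable model can be sketched with a toy bigram language model that keeps its counts as sufficient statistics, so each validated sentence is absorbed without retraining from scratch. This is only an illustration: the class name is hypothetical, no smoothing is applied, and it is a count update rather than the incremental EM procedure mentioned above.

```python
from collections import defaultdict


class IncrementalBigramModel:
    """Toy bigram model whose counts can be extended one sentence at a time."""

    def __init__(self):
        self.bigram = defaultdict(int)   # counts of (previous, current) pairs
        self.unigram = defaultdict(int)  # counts of the history word

    def update(self, sentence):
        """Add the counts of one user-validated sentence."""
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            self.bigram[(prev, cur)] += 1
            self.unigram[prev] += 1

    def prob(self, prev, cur):
        """Relative-frequency estimate of P(cur | prev); 0.0 if prev is unseen."""
        total = self.unigram.get(prev, 0)
        if total == 0:
            return 0.0
        return self.bigram.get((prev, cur), 0) / total
```

Each `update` call touches only the counts of the new sentence, so its cost is proportional to the sentence length rather than to the size of the training corpus.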

52 5 System Overview In this section the main features of our prototype are shown, including prototype design, interaction protocol, prototype functionalities and demo usage. [sent-93, score-0.463]

53 The latter allows researchers to test different techniques and interaction protocols. [sent-97, score-0.049]

54 For that reason, we developed a CAT Application Programming Interface (API) between the client and the actual translation engine, by using a network communication protocol and exposing a well-defined set of functions. [sent-98, score-0.278]

55 On the one hand, the IMT client provides a User Interface (UI) which uses the API to communicate with the IMT server through the Web. [sent-101, score-0.166]

56 The hardware requirements in the client are very low, as the translation process is carried out remotely on the server, so virtually any computer (including netbooks, tablets or 3G mobile phones) should suffice. [sent-102, score-0.229]

57 On the other hand, the server, which is unaware of the implementation details of the IMT client, uses and adapts the statistical models that are used to perform the translation. [sent-103, score-0.066]

58 2 User Interaction Protocol The protocol that rules the IMT process has the following steps: 1. [sent-105, score-0.057]

59 The system proposes a full translation of the selected text segment. [sent-106, score-0.16]

60 The source text segments are automatically extracted from the source document. [sent-108, score-0.07]

61 Such segments are marked as pending (light blue), validated (dark green), partially translated (light green), and locked (light red). [sent-109, score-0.103]

62 The translation engine can work either at full-word or character level. [sent-110, score-0.175]

63 The user validates the longest prefix of the translation which is error-free and/or corrects the first error in the suffix. [sent-112, score-0.336]

64 Corrections are entered by amendment keystrokes or mouse clicks/wheel operations. [sent-113, score-0.054]

65 In this way, a new extended consolidated prefix is produced based on the previous validated prefix and the interaction amendments. [sent-115, score-0.193]

66 Steps 2 and 3 are iterated until the user-desired translation is produced. [sent-118, score-0.124]

67 The system adapts the models to the new validated pair of sentences. [sent-120, score-0.14]
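
The protocol steps above can be sketched as a loop with a simulated user who always types the next correct character after the longest error-free prefix. Everything here is an assumption for illustration: the translation-memory stand-in for the engine, the function names, and the character-level correction policy.

```python
def simulate_session(source, reference, propose, adapt):
    """Run one interactive session and return the number of user corrections."""
    prefix = ""
    corrections = 0
    while True:
        # Step 1/3: the system proposes a completion of the validated prefix.
        hypothesis = prefix + propose(source, prefix)
        if hypothesis == reference:
            break
        # Step 2: find the longest error-free prefix of the hypothesis.
        n = 0
        limit = min(len(hypothesis), len(reference))
        while n < limit and hypothesis[n] == reference[n]:
            n += 1
        corrections += 1
        if n == len(reference):
            # The hypothesis only has extra trailing text; the user cuts it.
            prefix = reference
            break
        # The user types the first wrong character, extending the prefix.
        prefix = reference[: n + 1]
    # Final step: the system adapts to the validated sentence pair.
    adapt(source, reference)
    return corrections


# Toy engine: a lookup table standing in for the statistical models.
memory = {}


def propose(source, prefix):
    known = memory.get(source, "")
    return known[len(prefix):] if known.startswith(prefix) else ""


def adapt(source, reference):
    memory[source] = reference
```

With an empty `memory`, the first session needs one correction per character of the reference; after `adapt` has stored the validated pair, translating the same segment again needs none. This mirrors, in a very reduced form, the benefit of learning from validated translations.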

68 3 Prototype Functionality The following is a list of the main features that the prototype supports: When the user corrects the solution proposed by the system, a new improved suffix is presented to the user. [sent-122, score-0.285]

69 The system is able to learn from user-validated translations. [sent-123, score-0.074]

70 The user is able to perform actions by means of keyboard shortcuts or mouse gestures. [sent-124, score-0.208]

71 The supported actions on the proposed suffix are: Substitution: Substitute the first word or character of the suffix. [sent-125, score-0.057]

72 Acceptance: Assume that the current translation is correct and adapt the models. [sent-129, score-0.124]

73 At any time, the user is able to visualize the original document (Figure 4(a)), as well as a properly formatted draft of the current translation (Figure 4(b)). [sent-130, score-0.277]

74 The interface uses web technologies such as XHTML, JavaScript, and ActionScript; while the IMT engine is written in C++. [sent-134, score-0.086]

75 The prototype is publicly available on the Web (http : / / cat . [sent-135, score-0.21]

76 To begin with, the UI loads an index of all available translation corpora. [sent-139, score-0.124]

77 Currently, the prototype can be tested with the well-known Europarl corpora (Koehn, 2005). [sent-140, score-0.125]

78 The user chooses a corpus and navigates to the main interface page (Figure 3), where she interactively translates the text segments one by one. [sent-141, score-0.22]

79 All corrections are stored in plain text logs on the server, so the user can resume them at any moment; this also allows collaborative translation between users. [sent-146, score-0.155]

80 On the other hand, this prototype allows uploading custom documents in text format. [sent-147, score-0.125]

81 Since the users operate within a web browser, the system also provides cross-platform compatibility and requires neither computational power nor disk space on the client’s machine. [sent-148, score-0.055]

82 The communication between application and web server is based on asynchronous HTTP connections, thus providing a richer interactive experience (no page refreshes are required). [sent-149, score-0.204]

83 Moreover, the Web server communicates with the IMT engine through binary TCP sockets, ensuring really fast response times. [sent-150, score-0.148]

, 2009), which consists of translation of Xerox printer manuals involving three different language pairs: French-English, Spanish-English, and German-English. [sent-152, score-0.124]

85 , 2009), which measures the user effort required to generate error-free translations, and the well-known BLEU score, which constitutes a measure of the translation quality. [sent-156, score-0.239]
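
To make the user-effort measure concrete, the sketch below computes a keystroke-and-mouse-action ratio in the way KSMR is commonly defined in the IMT literature: keystrokes plus mouse actions, divided by the number of reference characters, as a percentage. The function name and the sample counts are hypothetical, and the exact counting conventions used in the experiments are not given in this text.

```python
def ksmr(keystrokes, mouse_actions, reference_chars):
    """Keystroke and mouse-action ratio, as a percentage of reference length."""
    if reference_chars <= 0:
        raise ValueError("reference_chars must be positive")
    return 100.0 * (keystrokes + mouse_actions) / reference_chars
```

For example, 3 keystrokes and 2 mouse actions on a 50-character reference give `ksmr(3, 2, 50) == 10.0`. Lower values mean less user effort.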

86 The test corpora were interactively translated from English to the other three languages, comparing the performance of a batch IMT (baseline) and the online IMT systems. [sent-157, score-0.295]

87 The batch IMT system is a conventional IMT system which is not able to take advantage of user feedback after each translation is performed. [sent-158, score-0.543]

88 The online IMT system uses the translations validated by the user to adapt the translation models at runtime. [sent-159, score-0.487]

89 Both systems were initialized with a log-linear model trained in batch mode using the training corpus. [sent-160, score-0.11]

90 Table 1 shows the BLEU score and the KSMR for the batch and the online IMT systems (95% confidence intervals are shown). [sent-161, score-0.227]

91 The BLEU score was calculated from the first translation hypothesis produced by the IMT system for each source sentence. [sent-162, score-0.183]

92 All the obtained improvements with the online IMT system were statistically significant. [sent-163, score-0.153]

93 According to the reported response and online training times, we can argue that the system proposed here can be used in real-time scenarios. [sent-165, score-0.262]

94 Table 1: BLEU and KSMR results for the XEROX test corpora using the batch and the online IMT systems, reporting the average online learning (LT) and the interaction response times (RP) in seconds. [sent-172, score-0.444]

95 7 Conclusions We have described an IMT system with online learning which is able to learn from user feedback in real time. [sent-175, score-0.382]

96 The proposed IMT tool is publicly available through the Web (http : / / cat . [sent-177, score-0.104]

97 Currently, the system can be used to interactively translate the well-known Europarl corpus. [sent-181, score-0.081]

98 According to such experiments, our IMT system is able to outperform the results obtained by conventional IMT systems implementing batch learning. [sent-183, score-0.214]

99 Future work includes researching further on the benefits provided by our online learning techniques with experiments involving real users. [sent-184, score-0.158]

100 Adaptive language and translation models for interactive machine translation. [sent-267, score-0.241]

similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('imt', 0.862), ('prototype', 0.125), ('translation', 0.124), ('online', 0.117), ('user', 0.115), ('batch', 0.11), ('interactive', 0.099), ('barrachina', 0.087), ('server', 0.086), ('nez', 0.082), ('client', 0.08), ('transtype', 0.079), ('cat', 0.065), ('paradigm', 0.063), ('langlais', 0.06), ('ksmr', 0.059), ('mt', 0.058), ('protocol', 0.057), ('smt', 0.057), ('validated', 0.056), ('interaction', 0.049), ('interactively', 0.045), ('xerox', 0.045), ('es', 0.045), ('prefix', 0.044), ('translator', 0.043), ('isabelle', 0.043), ('actionscript', 0.04), ('caitra', 0.04), ('nepveu', 0.04), ('demo', 0.039), ('europarl', 0.039), ('able', 0.038), ('foster', 0.037), ('mouse', 0.037), ('system', 0.036), ('update', 0.036), ('interface', 0.036), ('feedback', 0.035), ('lapalme', 0.035), ('universitat', 0.035), ('incrementally', 0.035), ('ep', 0.033), ('ecnica', 0.032), ('garc', 0.032), ('javascript', 0.032), ('ncia', 0.032), ('polit', 0.032), ('engine', 0.031), ('response', 0.031), ('adapts', 0.03), ('casacuberta', 0.03), ('conventional', 0.03), ('neal', 0.027), ('validates', 0.027), ('val', 0.027), ('corrects', 0.026), ('loglinear', 0.026), ('extensions', 0.026), ('light', 0.026), ('carried', 0.025), ('bleu', 0.025), ('translators', 0.024), ('segments', 0.024), ('involved', 0.023), ('translated', 0.023), ('source', 0.023), ('executed', 0.023), ('corrected', 0.023), ('incremental', 0.022), ('linguists', 0.022), ('inform', 0.022), ('green', 0.021), ('translations', 0.021), ('real', 0.021), ('hm', 0.02), ('character', 0.02), ('publicly', 0.02), ('ui', 0.02), ('learning', 0.02), ('implements', 0.02), ('corrections', 0.019), ('proposed', 0.019), ('web', 0.019), ('api', 0.019), ('advantage', 0.019), ('implement', 0.018), ('actions', 0.018), ('models', 0.018), ('statistical', 0.018), ('seven', 0.018), ('grant', 0.018), ('ismael', 0.017), ('alabau', 0.017), ('amendment', 0.017), ('argemsaxpr', 0.017), ('aticos', 0.017), ('cubel', 0.017), ('departamento', 0.017), ('exposing', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000006 41 acl-2011-An Interactive Machine Translation System with Online Learning

Author: Daniel Ortiz-Martinez ; Luis A. Leiva ; Vicent Alabau ; Ismael Garcia-Varea ; Francisco Casacuberta

2 0.6043191 168 acl-2011-Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation

Author: Vicent Alabau ; Alberto Sanchis ; Francisco Casacuberta

Abstract: In interactive machine translation (IMT), a human expert is integrated into the core of a machine translation (MT) system. The human expert interacts with the IMT system by partially correcting the errors of the system’s output. Then, the system proposes a new solution. This process is repeated until the output meets the desired quality. In this scenario, the interaction is typically performed using the keyboard and the mouse. In this work, we present an alternative modality to interact within IMT systems by writing on a tactile display or using an electronic pen. An on-line handwritten text recognition (HTR) system has been specifically designed to operate with IMT systems. Our HTR system improves previous approaches in two main aspects. First, HTR decoding is tightly coupled with the IMT system. Second, the language models proposed are context aware, in the sense that they take into account the partial corrections and the source sentence by using a combination of ngrams and word-based IBM models. The proposed system achieves an important boost in performance with respect to previous work.

3 0.069386795 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals

Author: Omar F. Zaidan ; Chris Callison-Burch

Abstract: Naively collecting translations by crowdsourcing the task to non-professional translators yields disfluent, low-quality results if no quality control is exercised. We demonstrate a variety of mechanisms that increase the translation quality to near professional levels. Specifically, we solicit redundant translations and edits to them, and automatically select the best output among them. We propose a set of features that model both the translations and the translators, such as country of residence, LM perplexity of the translation, edit rate from the other translations, and (optionally) calibration against professional translators. Using these features to score the collected translations, we are able to discriminate between acceptable and unacceptable translations. We recreate the NIST 2009 Urdu-to-English evaluation set with Mechanical Turk, and quantitatively show that our models are able to select translations within the range of quality that we expect from professional translators. The total cost is more than an order of magnitude lower than professional translation.

4 0.068500713 216 acl-2011-MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles

Author: Chi-kiu Lo ; Dekai Wu

Abstract: We introduce a novel semi-automated metric, MEANT, that assesses translation utility by matching semantic role fillers, producing scores that correlate with human judgment as well as HTER but at much lower labor cost. As machine translation systems improve in lexical choice and fluency, the shortcomings of widespread n-gram based, fluency-oriented MT evaluation metrics such as BLEU, which fail to properly evaluate adequacy, become more apparent. But more accurate, nonautomatic adequacy-oriented MT evaluation metrics like HTER are highly labor-intensive, which bottlenecks the evaluation cycle. We first show that when using untrained monolingual readers to annotate semantic roles in MT output, the non-automatic version of the metric HMEANT achieves a 0.43 correlation coefficient with human adequacy judgments at the sentence level, far superior to BLEU at only 0.20, and equal to the far more expensive HTER. We then replace the human semantic role annotators with automatic shallow semantic parsing to further automate the evaluation metric, and show that even the semiautomated evaluation metric achieves a 0.34 correlation coefficient with human adequacy judgment, which is still about 80% as closely correlated as HTER despite an even lower labor cost for the evaluation procedure. The results show that our proposed metric is significantly better correlated with human judgment on adequacy than current widespread automatic evaluation metrics, while being much more cost effective than HTER.

5 0.066060469 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

Author: Yanjun Ma ; Yifan He ; Andy Way ; Josef van Genabith

Abstract: We present a discriminative learning method to improve the consistency of translations in phrase-based Statistical Machine Translation (SMT) systems. Our method is inspired by Translation Memory (TM) systems which are widely used by human translators in industrial settings. We constrain the translation of an input sentence using the most similar ‘translation example’ retrieved from the TM. Differently from previous research which used simple fuzzy match thresholds, these constraints are imposed using discriminative learning to optimise the translation performance. We observe that using this method can benefit the SMT system by not only producing consistent translations, but also improved translation outputs. We report a 0.9 point improvement in terms of BLEU score on English–Chinese technical documents.

6 0.063886039 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

7 0.063424818 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

8 0.062316615 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

9 0.057866555 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations

10 0.057605971 115 acl-2011-Engkoo: Mining the Web for Language Learning

11 0.057122733 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation

12 0.05573903 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?

13 0.053894762 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?

14 0.053783834 177 acl-2011-Interactive Group Suggesting for Twitter

15 0.053476121 104 acl-2011-Domain Adaptation for Machine Translation by Mining Unseen Words

16 0.052199576 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment

17 0.051953625 75 acl-2011-Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction

18 0.051804952 245 acl-2011-Phrase-Based Translation Model for Question Retrieval in Community Question Answer Archives

19 0.051181398 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output

20 0.051111918 313 acl-2011-Two Easy Improvements to Lexical Weighting

similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.121), (1, -0.064), (2, 0.063), (3, 0.082), (4, -0.022), (5, 0.022), (6, 0.006), (7, -0.039), (8, 0.046), (9, 0.033), (10, -0.052), (11, -0.071), (12, 0.007), (13, -0.102), (14, -0.01), (15, 0.03), (16, -0.037), (17, -0.058), (18, -0.005), (19, -0.047), (20, 0.074), (21, 0.066), (22, 0.155), (23, 0.021), (24, -0.015), (25, -0.32), (26, 0.073), (27, -0.019), (28, -0.154), (29, -0.151), (30, -0.001), (31, 0.425), (32, -0.213), (33, -0.273), (34, -0.17), (35, -0.047), (36, 0.125), (37, 0.014), (38, 0.013), (39, -0.055), (40, -0.13), (41, 0.031), (42, -0.077), (43, -0.051), (44, -0.084), (45, 0.076), (46, 0.03), (47, 0.159), (48, -0.048), (49, -0.06)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91916853 41 acl-2011-An Interactive Machine Translation System with Online Learning

Author: Daniel Ortiz-Martinez ; Luis A. Leiva ; Vicent Alabau ; Ismael Garcia-Varea ; Francisco Casacuberta

2 0.91093183 168 acl-2011-Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation

Author: Vicent Alabau ; Alberto Sanchis ; Francisco Casacuberta

3 0.36865082 35 acl-2011-An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling

Author: Kenneth Hild ; Umut Orhan ; Deniz Erdogmus ; Brian Roark ; Barry Oken ; Shalini Purwar ; Hooman Nezamfar ; Melanie Fried-Oken

Abstract: Event related potentials (ERP) corresponding to stimuli in electroencephalography (EEG) can be used to detect the intent of a person for brain computer interfaces (BCI). This paradigm is widely used to build letter-byletter text input systems using BCI. Nevertheless using a BCI-typewriter depending only on EEG responses will not be sufficiently accurate for single-trial operation in general, and existing systems utilize many-trial schemes to achieve accuracy at the cost of speed. Hence incorporation of a language model based prior or additional evidence is vital to improve accuracy and speed. In this demonstration we will present a BCI system for typing that integrates a stochastic language model with ERP classification to achieve speedups, via the rapid serial visual presentation (RSVP) paradigm.

4 0.34301805 62 acl-2011-Blast: A Tool for Error Analysis of Machine Translation Output

Author: Sara Stymne

Abstract: We present BLAST, an open source tool for error analysis of machine translation (MT) output. We believe that error analysis, i.e., to identify and classify MT errors, should be an integral part of MT development, since it gives a qualitative view, which is not obtained by standard evaluation methods. BLAST can aid MT researchers and users in this process, by providing an easy-to-use graphical user interface. It is designed to be flexible, and can be used with any MT system, language pair, and error typology. The annotation task can be aided by highlighting similarities with a reference translation.

5 0.298408 115 acl-2011-Engkoo: Mining the Web for Language Learning

Author: Matthew R. Scott ; Xiaohua Liu ; Ming Zhou ; Microsoft Engkoo Team

Abstract: This paper presents Engkoo 1, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages - using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world’s largest lexicon linking both Chinese and English together - at the same time covering the most up-to-date terms as captured by the net.

6 0.26511267 177 acl-2011-Interactive Group Suggesting for Twitter

7 0.26473188 220 acl-2011-Minimum Bayes-risk System Combination

8 0.26417017 247 acl-2011-Pre- and Postprocessing for Statistical Machine Translation into Germanic Languages

9 0.25494134 60 acl-2011-Better Hypothesis Testing for Statistical Machine Translation: Controlling for Optimizer Instability

10 0.25448146 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence

11 0.2506631 313 acl-2011-Two Easy Improvements to Lexical Weighting

12 0.25052142 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis

13 0.23408777 81 acl-2011-Consistent Translation using Discriminative Learning - A Translation Memory-inspired Approach

14 0.23375258 310 acl-2011-Translating from Morphologically Complex Languages: A Paraphrase-Based Approach

15 0.22792505 233 acl-2011-On-line Language Model Biasing for Statistical Machine Translation

16 0.22448407 67 acl-2011-Clairlib: A Toolkit for Natural Language Processing, Information Retrieval, and Network Analysis

17 0.22035798 252 acl-2011-Prototyping virtual instructors from human-human corpora

18 0.21732073 227 acl-2011-Multimodal Menu-based Dialogue with Speech Cursor in DICO II+

19 0.21239959 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation

20 0.2123839 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation

similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.012), (5, 0.022), (17, 0.036), (26, 0.052), (37, 0.059), (39, 0.017), (41, 0.044), (55, 0.039), (58, 0.103), (59, 0.03), (62, 0.021), (72, 0.017), (91, 0.042), (96, 0.383)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.97769862 248 acl-2011-Predicting Clicks in a Vocabulary Learning System

Author: Aaron Michelony

Abstract: We consider the problem of predicting which words a student will click in a vocabulary learning system. Often a language learner will find value in the ability to look up the meaning of an unknown word while reading an electronic document by clicking the word. Highlighting words likely to be unknown to a reader is attractive due to drawing his or her attention to it and indicating that information is available. However, this option is usually done manually in vocabulary systems and online encyclopedias such as Wikipedia. Furthermore, it is never on a per-user basis. This paper presents an automated way of highlighting words likely to be unknown to the specific user. We present related work in search engine ranking, a description of the study used to collect click data, the experiment we performed using the random forest machine learning algorithm and finish with a discussion of future work.

same-paper 2 0.97480357 41 acl-2011-An Interactive Machine Translation System with Online Learning

Author: Daniel Ortiz-Martinez ; Luis A. Leiva ; Vicent Alabau ; Ismael Garcia-Varea ; Francisco Casacuberta

3 0.95789105 270 acl-2011-SciSumm: A Multi-Document Summarization System for Scientific Articles

Author: Nitin Agarwal ; Ravi Shankar Reddy ; Kiran GVR ; Carolyn Penstein Rose

Abstract: In this demo, we present SciSumm, an interactive multi-document summarization system for scientific articles. The document collection to be summarized is a list of papers cited together within the same source article, otherwise known as a co-citation. At the heart of the approach is a topic based clustering of fragments extracted from each article based on queries generated from the context surrounding the co-cited list of papers. This analysis enables the generation of an overview of common themes from the co-cited papers that relate to the context in which the co-citation was found. SciSumm is currently built over the 2008 ACL Anthology; however, the generalizable nature of the summarization techniques and the extensible architecture make it possible to use the system with other corpora where a citation network is available. Evaluation results on the same corpus demonstrate that our system performs better than an existing widely used multi-document summarization system (MEAD).

4 0.95690787 341 acl-2011-Word Maturity: Computational Modeling of Word Knowledge

Author: Kirill Kireyev ; Thomas K Landauer

Abstract: While computational estimation of difficulty of words in the lexicon is useful in many educational and assessment applications, the concept of scalar word difficulty and current corpus-based methods for its estimation are inadequate. We propose a new paradigm called word meaning maturity which tracks the degree of knowledge of each word at different stages of language learning. We present a computational algorithm for estimating word maturity, based on modeling language acquisition with Latent Semantic Analysis. We demonstrate that the resulting metric not only correlates well with external indicators, but captures deeper semantic effects in language.

1 Motivation

It is no surprise that through stages of language learning, different words are learned at different times and are known to different extents. For example, a common word like “dog” is familiar to even a first-grader, whereas a more advanced word like “focal” does not usually enter learners’ vocabulary until much later. Although individual rates of learning words may vary between high- and low-performing students, it has been observed that “children […] acquire word meanings in roughly the same sequence” (Biemiller, 2008). The aim of this work is to model the degree of knowledge of words at different learning stages. Such a metric would have extremely useful applications in personalized educational technologies, for the purposes of accurate assessment and personalized vocabulary instruction. …

2 Rethinking Word Difficulty

Previously, related work in education and psychometrics has been concerned with measuring word difficulty or classifying words into different difficulty categories. Examples of such approaches include creation of word lists for targeted vocabulary instruction at various grade levels that were compiled by educational experts, such as Nation (1993) or Biemiller (2008). Such word difficulty assignments are also implicitly present in some readability formulas that estimate difficulty of texts, such as Lexiles (Stenner, 1996), which include a lexical difficulty component based on the frequency of occurrence of words in a representative corpus, on the assumption that word difficulty is inversely correlated to corpus frequency. Additionally, research in psycholinguistics has attempted to outline and measure psycholinguistic dimensions of words such as age-of-acquisition and familiarity, which aim to track when certain words become known and how familiar they appear to an average person. Importantly, all such word difficulty measures can be thought of as functions that assign a single scalar value to each word w.

5 0.9563098 82 acl-2011-Content Models with Attitude

Author: Christina Sauper ; Aria Haghighi ; Regina Barzilay

Abstract: We present a probabilistic topic model for jointly identifying properties and attributes of social media review snippets. Our model simultaneously learns a set of properties of a product and captures aggregate user sentiments towards these properties. This approach directly enables discovery of highly rated or inconsistent properties of a product. Our model admits an efficient variational mean-field inference algorithm which can be parallelized and run on large snippet collections. We evaluate our model on a large corpus of snippets from Yelp reviews to assess property and attribute prediction. We demonstrate that it outperforms applicable baselines by a considerable margin.

6 0.95462376 272 acl-2011-Semantic Information and Derivation Rules for Robust Dialogue Act Detection in a Spoken Dialogue System

7 0.95440698 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers

8 0.95390105 335 acl-2011-Why Initialization Matters for IBM Model 1: Multiple Optima and Non-Strict Convexity

9 0.95358193 314 acl-2011-Typed Graph Models for Learning Latent Attributes from Names

10 0.95290959 168 acl-2011-Improving On-line Handwritten Recognition using Translation Models in Multimodal Interactive Machine Translation

11 0.94797146 25 acl-2011-A Simple Measure to Assess Non-response

12 0.94785267 49 acl-2011-Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level?

13 0.94673377 266 acl-2011-Reordering with Source Language Collocations

14 0.94451386 169 acl-2011-Improving Question Recommendation by Exploiting Information Need

15 0.94251639 26 acl-2011-A Speech-based Just-in-Time Retrieval System using Semantic Search

16 0.9404878 2 acl-2011-AM-FM: A Semantic Framework for Translation Quality Assessment

17 0.94028103 251 acl-2011-Probabilistic Document Modeling for Syntax Removal in Text Summarization

18 0.94011426 264 acl-2011-Reordering Metrics for MT

19 0.93704003 220 acl-2011-Minimum Bayes-risk System Combination

20 0.93447405 21 acl-2011-A Pilot Study of Opinion Summarization in Conversations