acl acl2010 acl2010-192 knowledge-graph by maker-knowledge-mining

192 acl-2010-Paraphrase Lattice for Statistical Machine Translation


Source: pdf

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. [sent-5, score-0.727]

2 We show that lattice decoding is also useful for handling input variations. [sent-6, score-0.704]

3 Given an input sentence, we build a lattice which represents paraphrases of the input sentence. [sent-7, score-0.844]

4 Then, we give the paraphrase lattice as an input to the lattice decoder. [sent-9, score-1.658]

5 Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets. [sent-11, score-0.764]

6 1 Introduction Lattice decoding in SMT is useful in speech translation and in the translation of German (Bertoldi et al. [sent-12, score-0.378]

7 In speech translation, by using lattices that represent not only the 1-best result but also other possibilities of speech recognition, we can take the ambiguities of speech recognition into account. [sent-14, score-0.312]

8 Thus, the translation quality for lattice inputs is better than the quality for 1-best inputs. [sent-15, score-0.623]

9 In this paper, we show that lattice decoding is also useful for handling input variations. [sent-16, score-0.704]

10 “Input variations” refers to differences among input texts that have the same meaning. [sent-17, score-0.042]

11 For example, “is there a beauty salon ?” and “is there a beauty parlor ?” have the same meaning, with variations in “beauty salon” and “beauty parlor”. [sent-20, score-0.243]

12 Since these variations are frequently found in natural language texts, a mismatch between the expressions in source sentences and the expressions in the training corpus leads to a decrease in translation quality. [sent-21, score-0.212]

13 Therefore, we propose a novel method that can handle input variations using paraphrases and lattice decoding. [sent-22, score-0.839]

14 In the proposed method, we regard a given source sentence as one of many variations (1-best). [sent-23, score-0.143]

15 Given an input sentence, we build a paraphrase lattice which represents paraphrases of the input sentence. [sent-24, score-1.444]

16 Then, we give the paraphrase lattice as an input to the Moses decoder (Koehn et al. [sent-25, score-1.199]

17 By using paraphrases of source sentences, we can translate expressions which are not found in the training corpus, on the condition that their paraphrases are found in the training corpus. [sent-28, score-0.493]

18 Moreover, by using lattice decoding, we can employ the source-side language model as a decoding feature. [sent-29, score-0.643]

19 Since this feature is affected by the source-side context, the decoder can choose a proper paraphrase and translate correctly. [sent-30, score-0.697]

20 This paper is organized as follows: Related work on lattice decoding and paraphrasing is presented in Section 2. [sent-31, score-0.778]

21 2 Related Work Lattice decoding has been used to handle ambiguities of preprocessing. [sent-35, score-0.212]

22 Bertoldi et al. (2007) employed a confusion network, which is a kind of lattice and represents speech recognition hypotheses, in speech translation. [sent-37, score-0.647]

23 Dyer (2009) also employed a segmentation lattice, which represents ambiguities of compound word segmentation in German, Hungarian and Turkish translation. [sent-38, score-0.152]

24 However, to the best of our knowledge, there is no work which employed a lattice representing paraphrases of an input sentence. [sent-39, score-0.772]

25 On the other hand, paraphrasing has been used to enrich the SMT model. [sent-40, score-0.135]

26 [Figure 1: Overview of the proposed method. An input sentence is paraphrased using a paraphrase list (acquired from a parallel corpus for paraphrases) into a paraphrase lattice, which is decoded with an SMT model (trained on a separate parallel corpus) to produce the output sentence.] [sent-43, score-0.094]

27 Marton et al. (2009) augmented the translation phrase table with paraphrases to translate unknown phrases. [sent-46, score-0.443]

28 Bond et al. (2008) and Nakov (2008) augmented the training data by paraphrasing. [sent-48, score-0.036]

29 However, there is no work which augments input sentences by paraphrasing and represents them in lattices. [sent-49, score-0.202]

30 3 Paraphrase Lattice for SMT An overview of the proposed method is shown in Figure 1. [sent-50, score-0.052]

31 In advance, we automatically acquire a paraphrase list from a parallel corpus. [sent-51, score-0.707]

32 In order to acquire paraphrases of unknown phrases, this parallel corpus is different from the parallel corpus for training. [sent-52, score-0.38]

33 Given an input sentence, we build a lattice which represents paraphrases of the input sentence using the paraphrase list. [sent-53, score-1.478]

34 Then, we give the paraphrase lattice to the lattice decoder. [sent-55, score-1.616]

35 3.1 Acquiring the paraphrase list We acquire a paraphrase list using Bannard and Callison-Burch (2005)’s method. [sent-57, score-1.289]

36 Their idea is that if two different phrases e1 and e2 in one language are aligned to the same phrase c in another language, they are hypothesized to be paraphrases of each other. [sent-58, score-0.289]

37 Build a phrase table from the parallel corpus using standard SMT techniques. [sent-62, score-0.12]

38 The phrase table built in step 1 has many inappropriate phrase pairs. [sent-65, score-0.126]

39 Therefore, we filter the phrase table and keep only appropriate phrase pairs using the sigtest-filter (Johnson et al. [sent-66, score-0.162]
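
The sigtest-filter itself is an external tool; as a rough, hedged sketch of the idea behind it (Johnson et al. test whether a phrase pair's co-occurrence is statistically significant), one might filter pairs with Fisher's exact test as below. The function name, count layout and threshold are illustrative assumptions, not the actual tool's interface.

```python
from scipy.stats import fisher_exact

def keep_phrase_pair(c_st, c_s, c_t, n, alpha=1e-4):
    """Keep a phrase pair only if its source/target co-occurrence is statistically
    significant under Fisher's exact test (in the spirit of Johnson et al., 2007).
    c_st: joint count of the pair, c_s / c_t: marginal counts of the source / target
    phrase, n: number of sentence pairs; alpha is an illustrative threshold."""
    table = [[c_st, c_s - c_st],
             [c_t - c_st, n - c_s - c_t + c_st]]
    _, p_value = fisher_exact(table, alternative="greater")
    return p_value < alpha
```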

40 Calculate the paraphrase probability p(e2|e1) if e2 is hypothesized to be a paraphrase of e1. [sent-70, score-1.223]

41 p(e2|e1) = Σ_c P(c|e1) P(e2|c), where P(·|·) is the phrase translation probability. [sent-71, score-0.157]

42 Acquire (e1, e2) as a paraphrase pair if p(e2|e1) > p(e1|e1). [sent-74, score-0.6]

43 The purpose of this threshold is to keep highly-accurate paraphrase pairs. [sent-75, score-0.6]

44 In experiments, more than 80% of paraphrase pairs were eliminated by this threshold. [sent-76, score-0.617]
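
A minimal sketch of this pivot-based acquisition step, assuming the filtered phrase table is available as (English phrase e, pivot phrase c, P(c|e), P(e|c)) tuples; the names and data layout are illustrative, not the authors' code.

```python
from collections import defaultdict

def acquire_paraphrases(phrase_table):
    """phrase_table: iterable of (e, c, p_c_given_e, p_e_given_c) entries, where e is
    an English phrase and c its aligned phrase in the pivot language.
    Returns {e1: {e2: p(e2|e1)}} keeping only pairs with p(e2|e1) > p(e1|e1)."""
    c_given_e = defaultdict(dict)   # e -> {c: P(c|e)}
    e_given_c = defaultdict(dict)   # c -> {e: P(e|c)}
    for e, c, p_c_e, p_e_c in phrase_table:
        c_given_e[e][c] = p_c_e
        e_given_c[c][e] = p_e_c

    paraphrases = defaultdict(dict)
    for e1, pivots in c_given_e.items():
        scores = defaultdict(float)
        # p(e2|e1) = sum_c P(c|e1) * P(e2|c)
        for c, p_c_e1 in pivots.items():
            for e2, p_e2_c in e_given_c[c].items():
                scores[e2] += p_c_e1 * p_e2_c
        # keep only pairs whose probability exceeds the identity paraphrase p(e1|e1)
        threshold = scores.get(e1, 0.0)
        for e2, p in scores.items():
            if e2 != e1 and p > threshold:
                paraphrases[e1][e2] = p
    return dict(paraphrases)
```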

45 3.2 Building paraphrase lattice An input sentence is paraphrased using the paraphrase list and transformed into a paraphrase lattice. [sent-78, score-2.524]

46 The paraphrase lattice is a lattice which represents paraphrases of the input sentence. [sent-79, score-1.886]

47 An example of a paraphrase lattice is shown in Figure 2. [sent-80, score-1.108]

48 In this example, the input sentence is “is there a beauty salon ?”. [sent-81, score-0.476]

49 This paraphrase lattice contains two paraphrase pairs, “beauty salon” = “beauty parlor” and “beauty salon” = “salon”, and represents the following three sentences. [sent-83, score-2.154]

50 In the paraphrase lattice, each node consists of a token, the distance to the next node and features for lattice decoding. [sent-87, score-1.182]
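
As an illustration, the hedged sketch below builds such a lattice and serializes it in Moses' PLF (Python Lattice Format) notation, where each arc carries a word label, a score and the distance to the next node. Whether the scores should be probabilities or their logarithms depends on the decoder configuration and is left as an assumption; for brevity, only paraphrases whose replacement is a single token are inserted, so multi-token replacements such as "beauty parlor" are skipped. All names are illustrative.

```python
def build_paraphrase_plf(tokens, paraphrases):
    """tokens: input sentence as a list of words.
    paraphrases: {source phrase: [(replacement, probability), ...]}, phrases given as
    space-separated strings.  Returns a lattice string in PLF-style notation, where
    each arc is (word, score, distance-to-next-node)."""
    columns = [[(tok, 1.0, 1)] for tok in tokens]          # the original 1-best path
    for start in range(len(tokens)):
        for span in range(1, len(tokens) - start + 1):
            src = " ".join(tokens[start:start + span])
            for repl, prob in paraphrases.get(src, []):
                if " " not in repl:                        # single-token replacement only
                    columns[start].append((repl, prob, span))   # arc skips `span` positions
    cols = ["(" + ", ".join("('%s', %g, %d)" % arc for arc in col) + ",)" for col in columns]
    return "(" + ", ".join(cols) + ",)"

# Example inspired by Figure 2 (only the single-token "salon" variant fits this sketch):
print(build_paraphrase_plf("is there a beauty salon ?".split(),
                           {"beauty salon": [("salon", 0.4), ("beauty parlor", 0.3)]}))
```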

51 • Paraphrase probability (p) A paraphrase probability p(e2|e1) calculated when acquiring the paraphrase. [sent-89, score-0.624]

52 h_p = p(e2|e1) • Language model score (l) A ratio between the language model probability of the paraphrased sentence (para) and that of the original sentence (orig). [sent-90, score-0.214]

53 h_l = lm(para) / lm(orig)

54 Figure 2: An example of a paraphrase lattice, which contains the three features (p, l, d). [sent-97, score-0.624]

55 • Normalized language model score (L) A language model score where the language model probability is normalized by the sentence length. [sent-98, score-0.09]

56 The sentence length is calculated as the number of tokens. [sent-99, score-0.061]

57 h_L = LM(para) / LM(orig), where LM(sent) = lm(sent) / length(sent) • Paraphrase length (d) The difference between the original sentence length and the paraphrased sentence length. [sent-100, score-0.24]

58 h_d = exp(length(para) − length(orig)) The values of these features are calculated only if the node is the first node of the paraphrase, for example the second “beauty” and “salon” in line 3 of Figure 2. [sent-101, score-0.074]

59 The features related to the language model, such as (l) and (L), are affected by the context of source sentences even if the same paraphrase pair is applied. [sent-103, score-0.665]

60 As these features can penalize paraphrases which are not appropriate to the context, appropriate paraphrases are chosen and appropriate translations are output in lattice decoding. [sent-104, score-1.02]

61 The features related to the sentence length, such as (L) and (d), are added to penalize the language model score in case the paraphrased sentence length is shorter than the original sentence length and the language model score is unreasonably low. [sent-105, score-0.379]

62 In experiments, we use four combinations of these features, (p), (p, l), (p, L) and (p, l, d). [sent-106, score-0.021]
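
A small sketch of how the four node features could be computed for one applied paraphrase, assuming some language-model wrapper `lm` that returns the probability of a token sequence; the names and interfaces are illustrative.

```python
import math

def node_features(lm, orig_tokens, para_tokens, paraphrase_prob):
    """Compute the (p), (l), (L) and (d) features for the first node of an applied
    paraphrase.  `lm` is assumed to be any callable returning the language-model
    probability of a token list."""
    lm_orig, lm_para = lm(orig_tokens), lm(para_tokens)
    h_p = paraphrase_prob                                    # (p) paraphrase probability
    h_l = lm_para / lm_orig                                  # (l) ratio of LM probabilities
    # (L) the same ratio with LM(sent) = lm(sent) / length(sent)
    h_L = (lm_para / len(para_tokens)) / (lm_orig / len(orig_tokens))
    h_d = math.exp(len(para_tokens) - len(orig_tokens))      # (d) paraphrase length feature
    return {"p": h_p, "l": h_l, "L": h_L, "d": h_d}
```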

63 Moses is an open-source SMT system which allows lattice decoding. [sent-110, score-0.53]

64 In lattice decoding, Moses selects the best path and the best translation according to the features added to each node and other SMT features. [sent-111, score-0.694]
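
For reference, a hedged sketch of handing a PLF lattice to Moses from Python; the flag `-inputtype 2` is believed to select word-lattice input in standard Moses builds, but the exact flags, paths and any extra configuration should be checked against the local installation.

```python
import subprocess

# Hedged sketch: run the Moses decoder on a PLF paraphrase-lattice file.
# File names and the "-inputtype 2" flag are assumptions to verify locally.
with open("input.plf") as lattice, open("output.txt", "w") as out:
    subprocess.run(["moses", "-f", "moses.ini", "-inputtype", "2"],
                   stdin=lattice, stdout=out, check=True)
```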

65 4 Experiments In order to evaluate the proposed method, we conducted English-to-Japanese and English-to-Chinese translation experiments using the IWSLT 2007 (Fordyce, 2007) dataset. [sent-113, score-0.146]

66 This dataset contains English-Japanese (EJ) and English-Chinese (EC) parallel corpora for the travel domain and consists of 40k sentences for training and sets of about 500 sentences each (dev1, dev2 and dev3) for development and testing. [sent-114, score-0.076]

67 We used the dev1 set for parameter tuning, the dev2 set for choosing the setting of the proposed method, which is described below, and the dev3 set for testing. [sent-115, score-0.026]

68 For EJ translation, the English-English paraphrase list was acquired from the EC corpus; 53K pairs were acquired. [sent-116, score-0.805]

69 Similarly, 47K pairs were acquired from the EJ corpus for EC translation. [sent-117, score-0.089]

70 In CCB, we paraphrased the phrase table using the automatically acquired paraphrase list. [sent-122, score-0.836]

71 Then, we augmented the phrase table with paraphrased phrases which were not found in the original phrase table. [sent-123, score-0.28]
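
A hedged sketch of this CCB-style augmentation, assuming an in-memory phrase table keyed by source phrase; the data layout and the choice of attaching the paraphrase probability as an extra feature column follow the description above, and all names are illustrative.

```python
def augment_phrase_table(phrase_table, paraphrases):
    """phrase_table maps a source phrase to a list of (target phrase, scores) entries;
    paraphrases maps e1 to {e2: p(e2|e1)}.  Source phrases missing from the table
    borrow the translations of an in-table paraphrase, with the paraphrase probability
    attached as an extra feature (1.0 for original entries)."""
    augmented = {src: [(tgt, scores, 1.0) for tgt, scores in entries]
                 for src, entries in phrase_table.items()}
    for e1, variants in paraphrases.items():
        if e1 not in phrase_table:
            continue
        for e2, prob in variants.items():
            if e2 not in phrase_table:          # only add phrases unseen in the table
                augmented.setdefault(e2, []).extend(
                    (tgt, scores, prob) for tgt, scores in phrase_table[e1])
    return augmented
```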

72 Moreover, we used an additional feature whose value was the paraphrase probability (p) if the entry was generated by paraphrasing. [Table: BLEU scores for Moses (w/o Paraphrases), CCB and the Proposed Method.]

73 The weights of this feature and the other SMT features were optimized using MERT. [sent-136, score-0.024]

74 4.2 Proposed method In the proposed method, we conducted experiments with various settings for paraphrasing and lattice decoding. [sent-138, score-0.739]

75 4.2.1 Limitation of paraphrasing As the paraphrase list was automatically acquired, there were many erroneous paraphrase pairs. [sent-142, score-1.383]

76 Building paraphrase lattices with all erroneous paraphrase pairs and decoding these paraphrase lattices caused high computational complexity. [sent-143, score-2.27]

77 Therefore, we limited the number of paraphrases applied per phrase and per sentence. [sent-144, score-0.198]

78 The number of paraphrases per phrase was limited to three, and the number of paraphrases per sentence was limited to twice the sentence length. [sent-145, score-0.266]

79 As a criterion for limiting the number of paraphrases, we use three features, (p), (l) and (L), which are the same as the features described in Subsection 3.2. [sent-146, score-0.076]

80 When building paraphrase lattices, we apply paraphrases in descending order of the value of the criterion. [sent-148, score-0.822]
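
A possible sketch of these limits, assuming paraphrase candidates have already been matched against spans of the input sentence; the data layout is illustrative.

```python
def select_paraphrases(candidates, sentence_length, per_phrase=3, per_sentence_factor=2):
    """candidates maps a source span (start, end) to a list of (replacement, criterion)
    pairs, where the criterion is one of (p), (l) or (L).  At most `per_phrase`
    paraphrases are kept per phrase and at most 2x the sentence length per sentence,
    applied in descending order of the criterion value."""
    kept = []
    for span, options in candidates.items():
        # keep the top paraphrases for this phrase, best criterion value first
        best = sorted(options, key=lambda o: o[1], reverse=True)[:per_phrase]
        kept.extend((span, repl, value) for repl, value in best)
    # apply paraphrases sentence-wide in descending order until the budget is used
    kept.sort(key=lambda k: k[2], reverse=True)
    return kept[:per_sentence_factor * sentence_length]
```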

81 4.2.2 Finding optimal settings As previously mentioned, we have three choices for the criterion for building paraphrase lattices and four combinations of features for lattice decoding. [sent-151, score-1.364]

82 Thus, there are 3 × 4 = 12 combinations of these. [sent-152, score-0.021]

83 We conducted parameter tuning with the dev1 set for each setting and used the setting which achieved the highest BLEU score on the dev2 set. [sent-154, score-0.045]
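
The search over the 3 × 4 = 12 settings can be sketched as a small grid search; `tune_and_score` is a placeholder for the real pipeline (MERT on dev1, BLEU on dev2), so the names below are illustrative.

```python
from itertools import product

CRITERIA = ["p", "l", "L"]                                        # ranking criterion for lattice building
FEATURE_SETS = [("p",), ("p", "l"), ("p", "L"), ("p", "l", "d")]  # decoding feature combinations

def best_setting(tune_and_score):
    """tune_and_score(criterion, features) is assumed to tune on dev1 for the given
    setting and return the resulting BLEU score on dev2."""
    return max(product(CRITERIA, FEATURE_SETS),
               key=lambda setting: tune_and_score(*setting))
```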

84 In EJ translation, the proposed method obtained the highest score of 40. [sent-158, score-0.08]

85 In EC translation, the proposed method also obtained the highest score of 27. [sent-162, score-0.08]

86 As the three systems rank Moses < CCB < Proposed Method, paraphrasing is useful for SMT, and using paraphrase lattices with lattice decoding is more useful than merely augmenting the phrase table. [sent-166, score-1.65]

87 In Proposed Method, the criterion for building paraphrase lattices and the combination of features for lattice decoding were (p) and (p, L) in EJ translation and (L) and (p, l) in EC translation. [sent-167, score-1.554]

88 Since features related to the source-side language model were chosen in each direction, using the source-side language model is useful for decoding paraphrase lattices. [sent-168, score-0.778]

89 We also tried a combination of Proposed Method and CCB, which is a method of decoding paraphrase lattices with an augmented phrase table. [sent-169, score-1.006]

90 This is because the proposed method includes the effect of augmenting the phrase table. [sent-171, score-0.14]

91 Moreover, we conducted German-English translation using the Europarl corpus (Koehn, 2005). [sent-172, score-0.137]

92 3M pairs of German-German paraphrases from a 1M German-Spanish parallel corpus. [sent-175, score-0.26]

93 We conducted experiments with various training corpus sizes: 10K, 20K, 40K, 80K, 160K and 1M. [sent-176, score-0.026]

94 Figure 3 shows that the proposed method consistently gets higher scores than Moses and CCB. [sent-177, score-0.08]

95 5 Conclusion This paper has proposed a novel method for transforming a source sentence into a paraphrase lattice and applying lattice decoding. [sent-178, score-1.724]

96 Since our method can employ source-side language models as a decoding feature, the decoder can choose proper paraphrases and translate properly. [sent-179, score-0.442]

97 36 BLEU points over Moses in EJ translation and 1. [sent-182, score-0.125]

98 On the Europarl dataset, the proposed method consistently gets higher scores than the baselines. [sent-187, score-0.08]

99 In future work, we plan to apply this method with paraphrases derived from a massive corpus such as the Web corpus, and to apply it to hierarchical phrase-based SMT. [sent-188, score-0.352]

100 Using a maximum entropy model to build segmentation lattices for MT. [sent-207, score-0.195]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('paraphrase', 0.6), ('lattice', 0.508), ('paraphrases', 0.203), ('salon', 0.202), ('beauty', 0.198), ('lattices', 0.146), ('moses', 0.142), ('iwslt', 0.137), ('paraphrasing', 0.135), ('decoding', 0.135), ('paraphrased', 0.118), ('parlor', 0.112), ('translation', 0.094), ('ej', 0.084), ('bleu', 0.079), ('europarl', 0.079), ('smt', 0.076), ('ccb', 0.072), ('phrase', 0.063), ('ambiguities', 0.058), ('acquired', 0.055), ('ec', 0.052), ('decoder', 0.049), ('acquire', 0.045), ('llmm', 0.045), ('oprairga', 0.045), ('orig', 0.045), ('input', 0.042), ('bertoldi', 0.041), ('variations', 0.041), ('parallel', 0.04), ('chris', 0.04), ('bond', 0.039), ('bannard', 0.039), ('koehn', 0.037), ('para', 0.036), ('speech', 0.036), ('augmented', 0.036), ('sent', 0.035), ('sentence', 0.034), ('points', 0.031), ('hl', 0.03), ('translate', 0.029), ('criterion', 0.028), ('dyer', 0.028), ('score', 0.028), ('length', 0.027), ('marton', 0.027), ('method', 0.026), ('german', 0.026), ('proposed', 0.026), ('erroneous', 0.026), ('conducted', 0.026), ('represents', 0.025), ('segmentation', 0.025), ('augmenting', 0.025), ('penalize', 0.025), ('node', 0.025), ('build', 0.024), ('acquiring', 0.024), ('features', 0.024), ('philipp', 0.024), ('selects', 0.024), ('hypothesized', 0.023), ('confusion', 0.023), ('source', 0.022), ('marcello', 0.022), ('list', 0.022), ('lm', 0.021), ('combinations', 0.021), ('inputs', 0.021), ('summit', 0.021), ('ofmany', 0.02), ('sumita', 0.02), ('darren', 0.02), ('dra', 0.02), ('ecai', 0.02), ('onishi', 0.02), ('handle', 0.019), ('useful', 0.019), ('nicola', 0.019), ('zens', 0.019), ('dataset', 0.019), ('expressions', 0.019), ('path', 0.019), ('employed', 0.019), ('appropriate', 0.019), ('affected', 0.019), ('building', 0.019), ('overview', 0.018), ('unknown', 0.018), ('gains', 0.018), ('preslav', 0.018), ('eiichiro', 0.018), ('settings', 0.018), ('pairs', 0.017), ('tuning', 0.017), ('statistical', 0.017), ('johnson', 0.017), ('corpus', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

2 0.2164295 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

3 0.18649986 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

Author: Graeme Blackwood ; Adria de Gispert ; William Byrne

Abstract: This paper presents an efficient implementation of linearised lattice minimum Bayes-risk decoding using weighted finite state transducers. We introduce transducers to efficiently count lattice paths containing n-grams and use these to gather the required statistics. We show that these procedures can be implemented exactly through simple transformations of word sequences to sequences of n-grams. This yields a novel implementation of lattice minimum Bayes-risk decoding which is fast and exact even for very large lattices.

4 0.15759201 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder

Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outper- form state-of-the-art systems either quantitatively or statistically significantly.

5 0.14977874 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vectorper-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

6 0.13965391 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

7 0.12345012 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

8 0.12139196 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling

9 0.10678686 54 acl-2010-Boosting-Based System Combination for Machine Translation

10 0.10080064 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

11 0.083995603 243 acl-2010-Tree-Based and Forest-Based Translation

12 0.083801478 165 acl-2010-Learning Script Knowledge with Web Experiments

13 0.081218958 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation

14 0.080168441 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

15 0.078384787 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

16 0.076160237 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking

17 0.07440953 102 acl-2010-Error Detection for Statistical Machine Translation Using Linguistic Features

18 0.070113733 264 acl-2010-Wrapping up a Summary: From Representation to Generation

19 0.069550827 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

20 0.063508205 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.14), (1, -0.117), (2, -0.056), (3, -0.009), (4, 0.091), (5, 0.007), (6, 0.006), (7, -0.028), (8, -0.059), (9, 0.013), (10, 0.096), (11, 0.079), (12, 0.151), (13, -0.008), (14, 0.006), (15, 0.003), (16, -0.103), (17, 0.05), (18, -0.118), (19, 0.088), (20, 0.24), (21, 0.133), (22, -0.113), (23, -0.093), (24, 0.102), (25, -0.229), (26, -0.014), (27, 0.032), (28, -0.272), (29, -0.139), (30, -0.054), (31, 0.187), (32, -0.088), (33, 0.032), (34, 0.059), (35, -0.054), (36, -0.029), (37, -0.033), (38, 0.019), (39, 0.012), (40, -0.014), (41, -0.105), (42, -0.018), (43, 0.11), (44, -0.079), (45, 0.035), (46, 0.002), (47, 0.037), (48, 0.083), (49, 0.076)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94158536 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

2 0.64986163 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

Author: Katrin Erk ; Sebastian Pado

Abstract: This paper describes ongoing work on distributional models for word meaning in context. We abandon the usual one-vectorper-word paradigm in favor of an exemplar model that activates only relevant occurrences. On a paraphrasing task, we find that a simple exemplar model outperforms more complex state-of-the-art models.

3 0.58414072 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

Author: Graeme Blackwood ; Adria de Gispert ; William Byrne

Abstract: This paper presents an efficient implementation of linearised lattice minimum Bayes-risk decoding using weighted finite state transducers. We introduce transducers to efficiently count lattice paths containing n-grams and use these to gather the required statistics. We show that these procedures can be implemented exactly through simple transformations of word sequences to sequences of n-grams. This yields a novel implementation of lattice minimum Bayes-risk decoding which is fast and exact even for very large lattices.

4 0.5392617 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

Author: Stefan Thater ; Hagen Furstenau ; Manfred Pinkal

Abstract: We present a syntactically enriched vector model that supports the computation of contextualized semantic representations in a quasi compositional fashion. It employs a systematic combination of first- and second-order context vectors. We apply our model to two different tasks and show that (i) it substantially outperforms previous work on a paraphrase ranking task, and (ii) achieves promising results on a wordsense similarity task; to our knowledge, it is the first time that an unsupervised method has been applied to this task.

5 0.43412367 166 acl-2010-Learning Word-Class Lattices for Definition and Hypernym Extraction

Author: Roberto Navigli ; Paola Velardi

Abstract: Definition extraction is the task of automatically identifying definitional sentences within texts. The task has proven useful in many research areas including ontology learning, relation extraction and question answering. However, current approaches mostly focused on lexicosyntactic patterns suffer from both low recall and precision, as definitional sentences occur in highly variable syntactic structures. In this paper, we propose WordClass Lattices (WCLs), a generalization of word lattices that we use to model textual definitions. Lattices are learned from a dataset of definitions from Wikipedia. Our method is applied to the task of definition and hypernym extraction and compares favorably to other pattern general– – ization methods proposed in the literature.

6 0.39203501 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling

7 0.3471815 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

8 0.34681684 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

9 0.34672374 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

10 0.329005 54 acl-2010-Boosting-Based System Combination for Machine Translation

11 0.32686028 223 acl-2010-Tackling Sparse Data Issue in Machine Translation Evaluation

12 0.32172376 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

13 0.31072149 165 acl-2010-Learning Script Knowledge with Web Experiments

14 0.29704344 154 acl-2010-Jointly Optimizing a Two-Step Conditional Random Field Model for Machine Transliteration and Its Fast Decoding Algorithm

15 0.27669284 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

16 0.27418846 45 acl-2010-Balancing User Effort and Translation Error in Interactive Machine Translation via Confidence Measures

17 0.27084056 135 acl-2010-Hindi-to-Urdu Machine Translation through Transliteration

18 0.26880509 7 acl-2010-A Generalized-Zero-Preserving Method for Compact Encoding of Concept Lattices

19 0.2686404 243 acl-2010-Tree-Based and Forest-Based Translation

20 0.26854834 244 acl-2010-TrustRank: Inducing Trust in Automatic Translations via Ranking


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(16, 0.048), (18, 0.018), (25, 0.054), (33, 0.026), (42, 0.01), (44, 0.012), (58, 0.277), (59, 0.185), (73, 0.036), (78, 0.016), (80, 0.011), (83, 0.078), (98, 0.117)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.84395993 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources

Author: Francisco Costa ; Antonio Branco

Abstract: We describe the semi-automatic adaptation of a TimeML annotated corpus from English to Portuguese, a language for which TimeML annotated data was not available yet. In order to validate this adaptation, we use the obtained data to replicate some results in the literature that used the original English data. The fact that comparable results are obtained indicates that our approach can be used successfully to rapidly create semantically annotated resources for new languages.

same-paper 2 0.79772359 192 acl-2010-Paraphrase Lattice for Statistical Machine Translation

Author: Takashi Onishi ; Masao Utiyama ; Eiichiro Sumita

Abstract: Lattice decoding in statistical machine translation (SMT) is useful in speech translation and in the translation of German because it can handle input ambiguities such as speech recognition ambiguities and German word segmentation ambiguities. We show that lattice decoding is also useful for handling input variations. Given an input sentence, we build a lattice which represents paraphrases of the input sentence. We call this a paraphrase lattice. Then, we give the paraphrase lattice as an input to the lattice decoder. The decoder selects the best path for decoding. Using these paraphrase lattices as inputs, we obtained significant gains in BLEU scores for IWSLT and Europarl datasets.

3 0.70829999 227 acl-2010-The Impact of Interpretation Problems on Tutorial Dialogue

Author: Myroslava O. Dzikovska ; Johanna D. Moore ; Natalie Steinhauser ; Gwendolyn Campbell

Abstract: Supporting natural language input may improve learning in intelligent tutoring systems. However, interpretation errors are unavoidable and require an effective recovery policy. We describe an evaluation of an error recovery policy in the BEETLE II tutorial dialogue system and discuss how different types of interpretation problems affect learning gain and user satisfaction. In particular, the problems arising from student use of non-standard terminology appear to have negative consequences. We argue that existing strategies for dealing with terminology problems are insufficient and that improving such strategies is important in future ITS research.

4 0.64260638 156 acl-2010-Knowledge-Rich Word Sense Disambiguation Rivaling Supervised Systems

Author: Simone Paolo Ponzetto ; Roberto Navigli

Abstract: One of the main obstacles to highperformance Word Sense Disambiguation (WSD) is the knowledge acquisition bottleneck. In this paper, we present a methodology to automatically extend WordNet with large amounts of semantic relations from an encyclopedic resource, namely Wikipedia. We show that, when provided with a vast amount of high-quality semantic relations, simple knowledge-lean disambiguation algorithms compete with state-of-the-art supervised WSD systems in a coarse-grained all-words setting and outperform them on gold-standard domain-specific datasets.

5 0.64064336 254 acl-2010-Using Speech to Reply to SMS Messages While Driving: An In-Car Simulator User Study

Author: Yun-Cheng Ju ; Tim Paek

Abstract: Speech recognition affords automobile drivers a hands-free, eyes-free method of replying to Short Message Service (SMS) text messages. Although a voice search approach based on template matching has been shown to be more robust to the challenging acoustic environment of automobiles than using dictation, users may have difficulties verifying whether SMS response templates match their intended meaning, especially while driving. Using a high-fidelity driving simulator, we compared dictation for SMS replies versus voice search in increasingly difficult driving conditions. Although the two approaches did not differ in terms of driving performance measures, users made about six times more errors on average using dictation than voice search. 1

6 0.6380524 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

7 0.62836683 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

8 0.62469947 258 acl-2010-Weakly Supervised Learning of Presupposition Relations between Verbs

9 0.62469649 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

10 0.62407666 114 acl-2010-Faster Parsing by Supertagger Adaptation

11 0.62354982 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

12 0.61919302 26 acl-2010-All Words Domain Adapted WSD: Finding a Middle Ground between Supervision and Unsupervision

13 0.6182974 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

14 0.61697316 15 acl-2010-A Semi-Supervised Key Phrase Extraction Approach: Learning from Title Phrases through a Document Semantic Network

15 0.61600053 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

16 0.61454016 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

17 0.61377668 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

18 0.61368901 169 acl-2010-Learning to Translate with Source and Target Syntax

19 0.6123414 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices

20 0.6121853 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons