acl acl2010 acl2010-51 knowledge-graph by maker-knowledge-mining

51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation


Source: pdf

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. [sent-5, score-0.671]

2 The sense similarity scores are computed by using the vector space model. [sent-7, score-0.699]

3 We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. [sent-8, score-1.424]

4 Similarity scores are used as additional features of the translation model to improve translation performance. [sent-9, score-0.391]

5 Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. [sent-10, score-0.326]

6 There has been a great deal of work on computing the sense similarity between terms based on their distribution in a corpus (Hindle, 1990; Lund and Burgess, 1996; Landauer and Dumais, 1997; Lin, 1998; Turney, 2001; Pantel and Lin, 2002; Pado and Lapata, 2007). [sent-14, score-0.591]

7 Given two terms to be compared, one first extracts various features for each term from its contexts in a corpus and forms a vector space model (VSM); one then computes their similarity using a similarity function. [sent-16, score-1.012]

8 The most widely used similarity function is cosine distance (Salton and McGill, 1983); other similarity functions include Euclidean distance, City Block distance (Bullinaria and Levy, 2007), and the Dice and Jaccard coefficients (Frakes and Baeza-Yates, 1992). [sent-18, score-1.068]
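
A minimal sketch (not from the paper) of three of these similarity functions over sparse count vectors represented as Python dicts; cosine uses the feature values, while Dice and Jaccard here operate on the feature sets:

```python
from math import sqrt

def cosine(u, v):
    # Cosine similarity between two sparse count vectors (dict: feature -> value).
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def dice(u, v):
    # Dice coefficient over the two feature sets.
    shared = len(u.keys() & v.keys())
    return 2.0 * shared / (len(u) + len(v)) if (u or v) else 0.0

def jaccard(u, v):
    # Jaccard coefficient over the two feature sets.
    union = len(u.keys() | v.keys())
    return len(u.keys() & v.keys()) / union if union else 0.0
```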

9 Measures of monolingual sense similarity have been widely used in many applications, such as synonym recognition (Landauer and Dumais, 1997), word clustering (Pantel and Lin, 2002), and word sense disambiguation (Yuret and Yatbaz, 2009). [sent-19, score-0.742]

10 Use of the vector space model to compute sense similarity has also been adapted to the multilingual condition, based on the assumption that two terms with similar meanings often occur in comparable contexts across languages. [sent-20, score-0.792]

11 The vectors in different languages are first mapped to a common space using an initial bilingual dictionary, and then compared. [sent-22, score-0.285]

12 However, there is no previous work that uses the VSM to compute sense similarity for terms from parallel corpora. [sent-23, score-0.641]

13 The translation probabilities in a translation model, for units from parallel corpora, are mainly based on the co-occurrence counts of the two units. [sent-26, score-0.5]

14 Therefore, questions emerge: how good is the sense similarity computed via VSM for two units from parallel corpora? [sent-27, score-0.711]

15 In this paper, we try to answer these questions, focusing on sense similarity applied to the SMT task. [sent-29, score-0.543]

16 Due to noise in the training corpus or wrong word alignment, the source and target sides of some rules are not semantically equivalent, as can be seen from the following example. [sent-31, score-0.403]

17 In this work, we first propose new algorithms to compute the sense similarity between two units (unit here includes word, phrase, rule, etc.). [sent-35, score-0.671]

18 Second, we use the sense similarities between the source and target sides of a translation rule to improve statistical machine translation performance. [sent-37, score-1.139]

19 This work attempts to measure directly the sense similarity for units from different languages by comparing their contexts. [sent-38, score-0.623]

20 Our contribution includes proposing new bilingual sense similarity algorithms and applying them to machine translation. [sent-39, score-0.712]

21 We chose a hierarchical phrase-based SMT system as our baseline; thus, the units involved in the computation of sense similarities are hierarchical rules. [sent-40, score-0.477]

22 2 Hierarchical phrase-based MT system: The hierarchical phrase-based translation method (Chiang, 2005; Chiang, 2007) is a formal syntax-based translation modeling method; its translation model is a weighted synchronous context-free grammar (SCFG). [sent-41, score-0.731]

23 An SCFG rule has the following form: X → ⟨γ, α, ∼⟩, where X is a non-terminal symbol shared by all the rules, γ and α are strings of terminals and non-terminals, and ∼ is a one-to-one correspondence between their non-terminals; each rule has at most two non-terminals. [sent-43, score-0.304]

24 There has been a lot of work (more details in Section 7) on applying word sense disambiguation (WSD) techniques in SMT for translation selection. [sent-46, score-0.333]

25 However, WSD techniques for SMT do so indirectly, using source-side context to help select a particular translation for a source rule. [sent-47, score-0.449]

26 Figure 1: Example of hierarchical rule pairs and their context features. [sent-48, score-0.379]

27 3 Bag-of-Words Vector Space Model: To compute the sense similarity via the VSM, we follow previous work (Lin, 1998) and represent the source and target sides of a rule by feature vectors. [sent-55, score-1.093]

28 In our work, each feature corresponds to a context word which co-occurs with the translation rule. [sent-56, score-0.349]

29 3.1 Context Features: In the hierarchical phrase-based translation method, the translation rules are extracted by abstracting some words from an initial phrase pair (Chiang, 2005). [sent-58, score-0.526]

30 Consider a rule with non-terminals on the source and target sides; for a given instance of the rule (a particular phrase pair in the training corpus), the context will be the words instantiating the non-terminals. [sent-59, score-0.753]

31 For example, in Figure 1, suppose we have an initial phrase pair 他 出席 了 会议 ||| he attended the meeting; we can extract four rules from this initial phrase: 他 出席 了 X1 ||| he attended X1, 会议 ||| the meeting, 他 X1 会议 ||| he X1 the meeting, and 出席 了 ||| attended. [sent-61, score-0.346]
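
To make the context definition concrete, here is a toy sketch of this idea (the function name and regex-based matcher are illustrative inventions; the actual extraction runs over word-aligned training data): given one instance of a rule side inside an initial phrase, the words filling the non-terminal slots become the rule's context features.

```python
import re

def gap_fillers(rule_side, phrase):
    # Toy matcher: turn a rule side (gaps marked X1, X2) into a regex
    # and capture the words that instantiate the non-terminals.
    parts = [r"(.+?)" if t.startswith("X") else re.escape(t)
             for t in rule_side.split()]
    m = re.fullmatch(r"\s+".join(parts), phrase)
    return [g.split() for g in m.groups()] if m else []

# Rule side "he attended X1" inside the phrase "he attended the meeting":
print(gap_fillers("he attended X1", "he attended the meeting"))
# -> [['the', 'meeting']]  (these words become context features of the rule)
```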

32 3.2 Bag-of-Words Model: For each side of a translation rule pair, all of its context words are collected from the training data, and two “bags-of-words”, consisting of the source and target context words co-occurring with the rule's source and target sides, are created. [sent-64, score-1.259]

33 The two bags are B_f = {f_1, …, f_I} and B_e = {e_1, …, e_J} (1), where f_i (1 ≤ i ≤ I) are source context words which co-occur with the source side α of the rule, and e_j (1 ≤ j ≤ J) are target context words which co-occur with the target side γ. [sent-68, score-1.355]

34 Therefore, we can represent the source and target sides of the rule by vectors v_f and v_e, as in Equation (2): v_f = {w_{f_1}, w_{f_2}, …, w_{f_I}}, v_e = {w_{e_1}, w_{e_2}, …, w_{e_J}} (2). [sent-69, score-0.69]

35 In Equation (2), w_{f_i} and w_{e_j} are the values of the source and target context features; normally, these values are based on the counts of the words in the corresponding bags. [sent-72, score-0.583]

36 Let c (c ∈ B_f or c ∈ B_e) be a context word and F(r, c) be the frequency count of a rule r (α or γ) co-occurring with the context word c. [sent-76, score-0.438]
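
A small illustrative sketch of building these count-based bags (the names and data layout are assumptions, not the paper's code): the value of each context feature is simply its co-occurrence count F(r, c) with the rule side.

```python
from collections import Counter

def context_vector(instances, side):
    # Accumulate F(r, c) over all occurrences of one side of a rule.
    # instances: list of dicts, each holding the context words of one occurrence.
    vec = Counter()
    for inst in instances:
        vec.update(inst[side])
    return vec

# Two occurrences of a rule, with source ('src') and target ('tgt') contexts:
occ = [{"src": ["会议"], "tgt": ["the", "meeting"]},
       {"src": ["宴会"], "tgt": ["the", "banquet"]}]
v_f = context_vector(occ, "src")  # source bag B_f with counts
v_e = context_vector(occ, "tgt")  # target bag B_e with counts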

37 We consider IBM model 1 probabilities and cosine distance similarity functions. [sent-83, score-0.566]

38 4.1 IBM Model 1 Probabilities: For the IBM model 1 similarity function, we take the geometric mean of the symmetrized conditional IBM model 1 (Brown et al., 1993) probabilities. [sent-85, score-0.378]
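
Equation (6) itself is not preserved in this extraction; the sketch below assumes the standard IBM model 1 form of Brown et al. (1993), with a NULL source token and a small probability floor, and symmetrizes the two conditional probabilities by a geometric mean:

```python
from math import prod, sqrt

def ibm1_prob(bag_e, bag_f, t):
    # P(bag_e | bag_f) under IBM model 1; t[(e, f)] = p(e | f).
    fs = list(bag_f) + ["NULL"]  # NULL word, as in Brown et al. (1993)
    return prod(sum(t.get((e, f), 1e-9) for f in fs) / len(fs)
                for e in bag_e)

def sim_ibm(bag_f, bag_e, t_ef, t_fe):
    # Geometric mean of the two symmetrized conditional probabilities.
    return sqrt(ibm1_prob(bag_e, bag_f, t_ef) *
                ibm1_prob(bag_f, bag_e, t_fe))
```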

39 4.2 Vector Space Mapping: A common way to calculate semantic similarity is by vector space cosine distance; we will also use this similarity function in our algorithm. [sent-89, score-1.04]

40 We therefore need to first map one vector into the space of the other vector, so that the similarity can be calculated. [sent-91, score-0.496]

41 Fung (1998) and Rapp (1999) map the vectors one-dimension-to-one-dimension (a context word is a dimension in each vector space) from one language to another via an initial bilingual dictionary. [sent-92, score-0.42]

42 Our goal is, given a source pattern, to distinguish between the senses of its associated target patterns. [sent-95, score-0.262]

43 Therefore, we map all target-language vectors into the source-language vector space. [sent-96, score-0.483]

44 What we want is a representation v_a, in the source-language space, of the target vector v_e. [sent-97, score-0.524]

45 To get v_a, we can let w_{a,f_i}, the weight of the i-th source feature, be a linear combination over the target features. [sent-98, score-0.362]

46 That is to say, for a given source feature f_i, each target feature weight is linked to it with some probability. [sent-99, score-0.338]

47 Thus we can calculate a transformed vector from the target vector by computing the weights w_{a,f_i} using a translation lexicon: w_{a,f_i} = Σ_{j=1}^{J} Pr(f_i | e_j) · w_{e_j} (10), where Pr(f_i | e_j) is a lexical probability (we use the IBM model 1 probability). [sent-100, score-0.807]
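
A minimal sketch of the mapping in Equation (10); the lexicon layout (target word e mapped to a list of (source word f, Pr(f | e)) pairs) is a hypothetical convenience, not the paper's data structure:

```python
def map_to_source_space(v_e, lex):
    # Equation (10): w_a[f_i] = sum_j Pr(f_i | e_j) * w_e[e_j].
    v_a = {}
    for e, w in v_e.items():
        for f, p in lex.get(e, []):
            v_a[f] = v_a.get(f, 0.0) + p * w
    return v_a
```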

48 Now the source vector and the mapped vector have the same dimensions, as shown in (11): v_a = {w_{a,f_1}, w_{a,f_2}, …, w_{a,f_I}} (11). [sent-101, score-0.48]

49 4.3 Naïve Cosine Distance Similarity: The standard cosine distance is defined as the inner product of the two vectors v_f and v_a, normalized by their norms. [sent-104, score-0.341]
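
Equation (12) is not preserved in this extraction, but given the definition above it is presumably the standard cosine between the source vector v_f and the mapped vector v_a:

```latex
\mathrm{sim}_{\cos}(\alpha,\gamma)
  = \frac{\vec{v}_f \cdot \vec{v}_a}{\lVert \vec{v}_f \rVert \, \lVert \vec{v}_a \rVert}
  = \frac{\sum_{i=1}^{I} w_{f_i}\, w_{a,f_i}}
         {\sqrt{\sum_{i=1}^{I} w_{f_i}^2}\,\sqrt{\sum_{i=1}^{I} w_{a,f_i}^2}}
  \qquad (12)
```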

50 4.4 Improved Similarity Function: To incorporate more information than the original similarity functions (the IBM model 1 probabilities in Equation (6) and the naïve cosine distance similarity function in Equation (12)), we refine the similarity function and propose a new algorithm. [sent-107, score-1.379]

51 The original similarity functions thus directly compare the two context vectors built on the full training data, as shown in Equation (13). [sent-112, score-0.681]

52 sim(α, γ) = sim(C_f^full, C_e^full) (13). We then propose a new similarity function as follows: sim(α, γ) = sim(C_f^full, C_f^cooc)^{λ1} · sim(C_f^cooc, C_e^cooc)^{λ2} · sim(C_e^full, C_e^cooc)^{λ3} (14), where the parameters λi (i = 1, 2, 3) can be tuned via minimum error rate training (MERT) (Och, 2003). [sent-113, score-0.378]
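
As a sketch, Equation (14) is just a weighted geometric product of three component similarities (the function and argument names below are illustrative):

```python
def improved_sim(sim_ff, sim_fe, sim_ee, l1, l2, l3):
    # Equation (14):
    #   sim_ff = sim(C_f^full, C_f^cooc)  -- source-side monolingual term
    #   sim_fe = sim(C_f^cooc, C_e^cooc)  -- bilingual term
    #   sim_ee = sim(C_e^full, C_e^cooc)  -- target-side monolingual term
    # l1..l3 are the lambda weights, tuned by MERT in the paper.
    return (sim_ff ** l1) * (sim_fe ** l2) * (sim_ee ** l3)
```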

53 However, when it is linked with another unit in the other language, its sense pool is constrained and is just a subset of the whole sense set. [sent-116, score-0.441]

54 sim(C_f^full, C_f^cooc) is the metric which evaluates the similarity between the whole sense pool of α and the sense pool when α co-occurs with γ; sim(C_e^full, C_e^cooc) is the analogous similarity metric for γ. [sent-117, score-1.166]

55 These two metrics both evaluate the similarity of two vectors in the same language, so using cosine distance to compute the similarity is straightforward. [sent-119, score-1.095]

56 sim(C_f^cooc, C_e^cooc) computes the similarity between the context vectors when α and γ co-occur. [sent-121, score-0.624]

57 We may compute sim(C_f^cooc, C_e^cooc) using the IBM model 1 probability and cosine distance similarity functions, as in Equations (6) and (12). [sent-122, score-0.671]

58 5 Experiments: We evaluate the bilingual sense similarity algorithm via machine translation. [sent-124, score-0.712]

59 The sense similarity scores are used as feature functions in the translation model. [sent-125, score-0.806]

60 In particular, all the allowed bilingual corpora except the UN corpus and the Hong Kong Hansard corpus were used to estimate the translation model. [sent-130, score-0.303]

61 The parallel training data contains 21 million target words; both the dev set and test set contain 2000 sentences; one reference is provided for each source input sentence. [sent-145, score-0.479]

62 5.2 Results: For the baseline, we train the translation model following (Chiang, 2005; Chiang, 2007); our decoder is Joshua, an open-source hierarchical phrase-based machine translation system written in Java. [sent-148, score-0.454]

63 Then, we need to set the sizes of the vectors to balance computing time and translation accuracy. [sent-154, score-0.304]

64 That is, we keep only the top N context words with the highest feature values for each side of a rule. [sent-162, score-0.383]
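
A one-liner sketch of this pruning step (illustrative, not the paper's code):

```python
def prune_top_n(vec, n):
    # Keep only the n context words with the highest feature values.
    return dict(sorted(vec.items(), key=lambda kv: kv[1], reverse=True)[:n])
```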

65 In the following, we use “Alg1” to represent the original similarity functions, which compare the two context vectors built on the full training data as in Equation (13), and “Alg2” to represent the improved similarity function as in Equation (14). [sent-163, score-1.059]

66 “IBM” represents IBM model 1 probabilities, and “COS” represents the cosine distance similarity function. [sent-164, score-0.566]

67 After carrying out a series of additional experiments on the small data condition and observing the results on the dev set, we set the size of the vectors to 500 for Alg1; for Alg2, we set the sizes N1 of C_f^full and C_e^full to 1000, and the sizes N2 of C_f^cooc and C_e^cooc to 100. [sent-165, score-0.379]

68 The sizes of the vectors in Alg2 were set by the following process: first, we fixed N2 at 500 and let N1 range from 500 to 3,000; the dev set performed best when N1 was 1000. We then fixed N1 at 1000 and let N2 range from 50 to 1000; performance was best when N2 = 100. [sent-166, score-0.377]

69 Alg1 represents the original similarity functions, as in Equation (13), while Alg2 represents the improved similarity function, as in Equation (14). [sent-171, score-0.813]

70 IBM represents the IBM model 1 probability, and COS represents the cosine distance similarity function. [sent-172, score-0.566]

71 So we filter the context vectors by only considering the feature values. [sent-181, score-0.284]

72 The improved similarity function Alg2 makes it possible to incorporate monolingual semantic similarity on top of the bilingual semantic similarity; thus, it may improve the accuracy of the similarity estimate. [sent-184, score-1.401]

73 We can see that the IBM model 1 and cosine distance similarity functions both obtained significant improvements on all test sets of the two tasks. [sent-192, score-0.566]

74 This is because sim_IBM(C_f^cooc, C_e^cooc) scores are more diverse than the latter when the number of context features is small (there are many rules that have only a few contexts). [sent-199, score-0.26]

75 For an extreme example, suppose that there is only one context word in each of the source and target context vectors, and the translation probability of the two context words is not 0. [sent-200, score-0.93]

76 In this case, sim_IBM(C_f^cooc, C_e^cooc) reflects the translation probability of the context word pair, while sim_COS(C_f^cooc, C_e^cooc) is always 1. [sent-201, score-0.311]

77 The monolingual similarity scores give it the ability to avoid “dangerous” words, and choose alternatives (such as larger phrase translations) when available. [sent-209, score-0.464]

78 Third, the similarity function of Alg2 consistently achieved further improvement by incorporating the monolingual similarities computed for the source and target sides. [sent-210, score-0.784]

79 2 Effect of Combining the Two Similarities: We then combine the two similarity scores, using both of them as features, to see if we can obtain further improvement. [sent-214, score-0.433]

80 27650 8 Table 5: Results (BLEU%) for combination of two similarity scores. [sent-222, score-0.378]

81 3 Comparison with Simple Contextual Features: Now, we try to answer the question: can the similarity features computed by the function in Equation (14) be replaced with some other simple features? [sent-225, score-0.471]

82 N_f(α) = Σ_{f_i ∈ C_f^full} F(α, f_i) (15); N_e(γ) = Σ_{e_j ∈ C_e^full} F(γ, e_j) (16); E_f(α, γ) = (Σ_{f_i ∈ C_f^cooc} F(α, f_i)) / N_f(α) (17); E_e(α, γ) = (Σ_{e_j ∈ C_e^cooc} F(γ, e_j)) / N_e(γ) (18); where F(α, f_i) and F(γ, e_j) are the frequency counts of rule side α or γ co-occurring with the context word f_i or e_j, respectively. [sent-227, score-0.743]
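
A hedged reading of Equations (15)-(18) as code (the exact counts used inside the co-occurrence sums are ambiguous in this extraction; the sketch restricts the full-data counts to the co-occurring context words):

```python
def simple_context_features(F_src, F_tgt, cooc_src, cooc_tgt):
    # F_src / F_tgt: full-data count vectors (context word -> F(r, c)).
    # cooc_src / cooc_tgt: context words in C_f^cooc / C_e^cooc.
    n_f = sum(F_src.values()) or 1                      # Eq. (15); 'or 1' guards empty vectors
    n_e = sum(F_tgt.values()) or 1                      # Eq. (16)
    e_f = sum(F_src.get(c, 0) for c in cooc_src) / n_f  # Eq. (17)
    e_e = sum(F_tgt.get(c, 0) for c in cooc_tgt) / n_e  # Eq. (18)
    return n_f, n_e, e_f, e_e
```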

83 Although all these features obtained some improvements on the dev set, there was no significant effect on the test sets. [sent-234, score-0.262]

84 This means that simple features based on context, such as the sum of the counts of the context features, are not as helpful as the sense similarity computed by Equation (14). [sent-235, score-0.813]

85 4 Null Context Feature: There are two cases where no context word can be extracted according to the definition of context in Section 3. [sent-237, score-0.286]

86 The second case is when, for some rule pairs, either the source or target contexts are beyond the span limit of the initial phrase, so we cannot extract contexts for those rule pairs. [sent-240, score-0.58]

87 We assign a uniform number as their bilingual sense similarity score, and this number is tuned through MERT. [sent-243, score-0.678]

88 5 Discussion: Our aim in this paper is to characterize the semantic similarity of bilingual hierarchical rules. [sent-256, score-0.642]

89 We can make several observations concerning our features: 1) Rules that are largely syntactic in nature, such as 的 X ||| the X of, will have very diffuse “meanings” and therefore lower similarity scores. [sent-257, score-0.378]

90 2) In addition to bilingual similarity, Alg2 relies on the degree of monolingual similarity between the sense of a source or target unit within a rule, and the sense of the unit in general. [sent-260, score-1.289]

91 It appears to have a synergistic effect when used along with the bilingual similarity feature. [sent-265, score-0.513]

92 3) Finally, we note that many of the features we use for capturing similarity, such as the context “the, of” for instantiations of X in the unit the X of, are arguably more syntactic than semantic. [sent-266, score-0.269]

93 Key papers by Carpuat and Wu (2007) and Chan et al. (2007) showed that word-sense disambiguation (WSD) techniques relying on source-language context can be effective in selecting translations in phrase-based and hierarchical SMT. [sent-269, score-0.323]

94 Work by Wu and Fung (2009) breaks new ground in attempting to match semantic roles derived from a semantic parser across source and target languages. [sent-271, score-0.352]

95 In other words, WSD explicitly tries to choose a translation given the current source context, while our work rates rule pairs independently of the current context. [sent-273, score-0.458]

96 8 Conclusions and Future Work: In this paper, we have proposed an approach that uses the vector space model to compute the sense similarity for terms from parallel corpora, and we have applied it to statistical machine translation. [sent-274, score-0.84]

97 We saw that the bilingual sense similarity computed by our algorithm led to significant improvements. [sent-275, score-0.716]

98 We have shown that the sense similarity computed between units from parallel corpora by means of our algorithm is helpful for at least one multilingual application: statistical machine translation. [sent-277, score-0.792]

99 Finally, although we described and evaluated bilingual sense similarity algorithms applied to a hierarchical phrase-based system, this method is also suitable for syntax-based MT systems and phrase-based MT systems. [sent-278, score-0.762]

100 For a syntax-based system, the context of a rule could be defined similarly to the way it was defined in the work described above. [sent-280, score-0.295]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('similarity', 0.378), ('sim', 0.246), ('cfcooc', 0.225), ('ibm', 0.187), ('cffull', 0.175), ('translation', 0.168), ('dev', 0.167), ('sense', 0.165), ('rule', 0.152), ('cceooc', 0.15), ('context', 0.143), ('ej', 0.141), ('source', 0.138), ('bilingual', 0.135), ('fi', 0.132), ('target', 0.124), ('cosine', 0.121), ('attended', 0.12), ('chiang', 0.111), ('bf', 0.109), ('nist', 0.106), ('vectors', 0.103), ('cefull', 0.1), ('simibm', 0.1), ('vva', 0.1), ('wafi', 0.1), ('wej', 0.1), ('equation', 0.092), ('vsm', 0.089), ('hierarchical', 0.084), ('contexts', 0.083), ('units', 0.08), ('sides', 0.079), ('smt', 0.076), ('wmt', 0.075), ('ccfooc', 0.075), ('cecooc', 0.075), ('unit', 0.071), ('vector', 0.071), ('distance', 0.067), ('simcos', 0.066), ('cos', 0.065), ('similarities', 0.064), ('rules', 0.062), ('bleu', 0.062), ('al', 0.058), ('functions', 0.057), ('features', 0.055), ('lund', 0.053), ('pado', 0.051), ('fung', 0.051), ('parallel', 0.05), ('wsd', 0.05), ('frakes', 0.05), ('sqrt', 0.05), ('vvf', 0.05), ('wwaf', 0.05), ('side', 0.05), ('compute', 0.048), ('space', 0.047), ('statistical', 0.047), ('semantic', 0.045), ('phrase', 0.044), ('blg', 0.044), ('yuret', 0.044), ('boxing', 0.044), ('bullinaria', 0.044), ('dangerous', 0.044), ('mauser', 0.044), ('vve', 0.044), ('wfi', 0.044), ('pantel', 0.043), ('monolingual', 0.042), ('landauer', 0.042), ('condition', 0.041), ('null', 0.041), ('pool', 0.04), ('improvements', 0.04), ('hong', 0.039), ('feature', 0.038), ('translations', 0.038), ('computed', 0.038), ('gimpel', 0.038), ('bangalore', 0.038), ('carpuat', 0.038), ('mt', 0.037), ('got', 0.037), ('burgess', 0.036), ('lm', 0.035), ('canada', 0.035), ('observing', 0.034), ('rapp', 0.034), ('gual', 0.034), ('roland', 0.034), ('machine', 0.034), ('counts', 0.034), ('sizes', 0.033), ('lin', 0.033), ('scfg', 0.033), ('salton', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. 1

2 0.20738277 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

Author: Zhiyang Wang ; Yajuan Lv ; Qun Liu ; Young-Sook Hwang

Abstract: This paper presents a novel filtration criterion to restrict the rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed well-formed dependency restriction is used to filter out bad rules. Furthermore, a new feature which describes the regularity that the source/target dependency edge triggers the target/source word is also proposed. Experimental results show that the new criterion weeds out about 40% of the rules while improving translation performance, and that the new feature brings another improvement to the baseline system, especially on a larger corpus.

3 0.20239675 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

Author: Lei Cui ; Dongdong Zhang ; Mu Li ; Ming Zhou ; Tiejun Zhao

Abstract: In hierarchical phrase-based SMT systems, statistical models are integrated to guide the hierarchical rule selection for better translation performance. Previous work mainly focused on the selection of either the source side of a hierarchical rule or the target side of a hierarchical rule rather than considering both of them simultaneously. This paper presents a joint model to predict the selection of hierarchical rules. The proposed model is estimated based on four sub-models where the rich context knowledge from both source and target sides is leveraged. Our method can be easily incorporated into the practical SMT systems with the log-linear model framework. The experimental results show that our method can yield significant improvements in performance.

4 0.17934138 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

Author: Joern Wuebker ; Arne Mauser ; Hermann Ney

Abstract: Several attempts have been made to learn phrase translation probabilities for phrase-based statistical machine translation that go beyond pure counting of phrases in word-aligned training data. Most approaches report problems with over-fitting. We describe a novel leaving-one-out approach to prevent over-fitting that allows us to train phrase models that show improved translation performance on the WMT08 Europarl German-English task. In contrast to most previous work, where phrase models were trained separately from other models used in translation, we include all components such as single-word lexica and reordering models in training. Using this consistent training of phrase models, we are able to achieve improvements of up to 1.4 points in BLEU. As a side effect, the phrase table size is reduced by more than 80%.

5 0.17358588 54 acl-2010-Boosting-Based System Combination for Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Muhua Zhu ; Huizhen Wang

Abstract: In this paper, we present a simple and effective method to address the issue of how to generate diversified translation systems from a single Statistical Machine Translation (SMT) engine for system combination. Our method is based on the framework of boosting. First, a sequence of weak translation systems is generated from a baseline system in an iterative manner. Then, a strong translation system is built from the ensemble of these weak translation systems. To adapt boosting to SMT system combination, several key components of the original boosting algorithms are redesigned in this work. We evaluate our method on Chinese-to-English Machine Translation (MT) tasks in three baseline systems, including a phrase-based system, a hierarchical phrasebased system and a syntax-based system. The experimental results on three NIST evaluation test sets show that our method leads to significant improvements in translation accuracy over the baseline systems. 1

6 0.15660624 169 acl-2010-Learning to Translate with Source and Target Syntax

7 0.14745595 27 acl-2010-An Active Learning Approach to Finding Related Terms

8 0.14412409 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

9 0.14366333 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

10 0.13740416 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

11 0.13581629 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

12 0.12753192 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

13 0.12209618 69 acl-2010-Constituency to Dependency Translation with Forests

14 0.12196562 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion

15 0.12044434 147 acl-2010-Improving Statistical Machine Translation with Monolingual Collocation

16 0.12014114 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection

17 0.11652544 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

18 0.11577257 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

19 0.1128642 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures

20 0.11185324 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.305), (1, -0.157), (2, -0.093), (3, 0.042), (4, 0.17), (5, 0.033), (6, 0.054), (7, 0.006), (8, -0.103), (9, 0.047), (10, 0.111), (11, 0.105), (12, 0.182), (13, -0.036), (14, 0.023), (15, 0.009), (16, 0.023), (17, -0.162), (18, -0.127), (19, -0.038), (20, 0.1), (21, -0.011), (22, 0.03), (23, 0.116), (24, 0.019), (25, 0.046), (26, -0.077), (27, -0.049), (28, 0.159), (29, -0.094), (30, -0.087), (31, -0.072), (32, 0.015), (33, -0.066), (34, 0.045), (35, 0.111), (36, 0.049), (37, -0.003), (38, -0.071), (39, 0.017), (40, -0.021), (41, 0.077), (42, -0.05), (43, 0.085), (44, -0.029), (45, -0.032), (46, -0.017), (47, -0.048), (48, -0.064), (49, -0.056)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97972184 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. 1

2 0.73415685 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation

Author: Lei Cui ; Dongdong Zhang ; Mu Li ; Ming Zhou ; Tiejun Zhao

Abstract: In hierarchical phrase-based SMT systems, statistical models are integrated to guide the hierarchical rule selection for better translation performance. Previous work mainly focused on the selection of either the source side of a hierarchical rule or the target side of a hierarchical rule rather than considering both of them simultaneously. This paper presents a joint model to predict the selection of hierarchical rules. The proposed model is estimated based on four sub-models where the rich context knowledge from both source and target sides is leveraged. Our method can be easily incorporated into the practical SMT systems with the log-linear model framework. The experimental results show that our method can yield significant improvements in performance.

3 0.69524562 232 acl-2010-The S-Space Package: An Open Source Package for Word Space Models

Author: David Jurgens ; Keith Stevens

Abstract: We present the S-Space Package, an open source framework for developing and evaluating word space algorithms. The package implements well-known word space algorithms, such as LSA, and provides a comprehensive set of matrix utilities and data structures for extending new or existing models. The package also includes word space benchmarks for evaluation. Both algorithms and libraries are designed for high concurrency and scalability. We demonstrate the efficiency of the reference implementations and also provide their results on six benchmarks.

4 0.67423278 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules

Author: Zhiyang Wang ; Yajuan Lv ; Qun Liu ; Young-Sook Hwang

Abstract: This paper presents a novel filtration criterion to restrict the rule extraction for the hierarchical phrase-based translation model, where a bilingual but relaxed well-formed dependency restriction is used to filter out bad rules. Furthermore, a new feature which describes the regularity that the source/target dependency edge triggers the target/source word is also proposed. Experimental results show that the new criterion weeds out about 40% of the rules while improving translation performance, and that the new feature brings another improvement to the baseline system, especially on a larger corpus.

5 0.6323058 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation

Author: Xiangyu Duan ; Min Zhang ; Haizhou Li

Abstract: The pipeline of most Phrase-Based Statistical Machine Translation (PB-SMT) systems starts from an automatically word-aligned parallel corpus. But words appear to be too fine-grained in some cases, such as non-compositional phrasal equivalences where no clear word alignments exist. Using words as inputs to the PB-SMT pipeline has an inborn deficiency. This paper proposes the pseudo-word as a new starting point for the PB-SMT pipeline. A pseudo-word is a kind of basic multi-word expression that characterizes a minimal sequence of consecutive words in the sense of translation. By casting the pseudo-word searching problem into a parsing framework, we search for pseudo-words in a monolingual way and a bilingual synchronous way. Experiments show that pseudo-words significantly outperform words for the PB-SMT model in both the travel translation domain and the news translation domain. 1

6 0.6303671 183 acl-2010-Online Generation of Locality Sensitive Hash Signatures

7 0.60990137 54 acl-2010-Boosting-Based System Combination for Machine Translation

8 0.60505486 62 acl-2010-Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD

9 0.59958994 119 acl-2010-Fixed Length Word Suffix for Factored Statistical Machine Translation

10 0.59359312 3 acl-2010-A Bayesian Method for Robust Estimation of Distributional Similarities

11 0.59066916 50 acl-2010-Bilingual Lexicon Generation Using Non-Aligned Signatures

12 0.57243127 27 acl-2010-An Active Learning Approach to Finding Related Terms

13 0.56786788 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models

14 0.55878919 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models

15 0.55650282 169 acl-2010-Learning to Translate with Source and Target Syntax

16 0.53969646 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation

17 0.53392375 107 acl-2010-Exemplar-Based Models for Word Meaning in Context

18 0.52646673 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out

19 0.52644891 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction

20 0.5172621 89 acl-2010-Distributional Similarity vs. PU Learning for Entity Set Expansion


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(14, 0.01), (25, 0.066), (42, 0.019), (43, 0.212), (59, 0.153), (71, 0.01), (73, 0.042), (78, 0.041), (83, 0.134), (84, 0.038), (98, 0.18)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94260883 67 acl-2010-Computing Weakest Readings

Author: Alexander Koller ; Stefan Thater

Abstract: We present an efficient algorithm for computing the weakest readings of semantically ambiguous sentences. A corpus-based evaluation with a large-scale grammar shows that our algorithm reduces over 80% of sentences to one or two readings, in negligible runtime, and thus makes it possible to work with semantic representations derived by deep large-scale grammars.

same-paper 2 0.86340666 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation

Author: Boxing Chen ; George Foster ; Roland Kuhn

Abstract: This paper proposes new algorithms to compute the sense similarity between two units (words, phrases, rules, etc.) from parallel corpora. The sense similarity scores are computed by using the vector space model. We then apply the algorithms to statistical machine translation by computing the sense similarity between the source and target side of translation rule pairs. Similarity scores are used as additional features of the translation model to improve translation performance. Significant improvements are obtained over a state-of-the-art hierarchical phrase-based machine translation system. 1

3 0.83450556 52 acl-2010-Bitext Dependency Parsing with Bilingual Subtree Constraints

Author: Wenliang Chen ; Jun'ichi Kazama ; Kentaro Torisawa

Abstract: This paper proposes a dependency parsing method that uses bilingual constraints to improve the accuracy of parsing bilingual texts (bitexts). In our method, a target-side tree fragment that corresponds to a source-side tree fragment is identified via word alignment and mapping rules that are automatically learned. Then it is verified by checking the subtree list that is collected from large scale automatically parsed data on the target side. Our method, thus, requires gold standard trees only on the source side of a bilingual corpus in the training phase, unlike the joint parsing model, which requires gold standard trees on the both sides. Compared to the reordering constraint model, which requires the same training data as ours, our method achieved higher accuracy because of richer bilingual constraints. Experiments on the translated portion of the Chinese Treebank show that our system outperforms monolingual parsers by 2.93 points for Chinese and 1.64 points for English.

4 0.80021948 184 acl-2010-Open-Domain Semantic Role Labeling by Modeling Word Spans

Author: Fei Huang ; Alexander Yates

Abstract: Most supervised language processing systems show a significant drop-off in performance when they are tested on text that comes from a domain significantly different from the domain of the training data. Semantic role labeling techniques are typically trained on newswire text, and in tests their performance on fiction is as much as 19% worse than their performance on newswire text. We investigate techniques for building open-domain semantic role labeling systems that approach the ideal of a train-once, use-anywhere system. We leverage recently-developed techniques for learning representations of text using latent-variable language models, and extend these techniques to ones that provide the kinds of features that are useful for semantic role labeling. In experiments, our novel system reduces error by 16% relative to the previous state of the art on out-of-domain text.

5 0.78976333 148 acl-2010-Improving the Use of Pseudo-Words for Evaluating Selectional Preferences

Author: Nathanael Chambers ; Dan Jurafsky

Abstract: This paper improves the use of pseudowords as an evaluation framework for selectional preferences. While pseudowords originally evaluated word sense disambiguation, they are now commonly used to evaluate selectional preferences. A selectional preference model ranks a set of possible arguments for a verb by their semantic fit to the verb. Pseudo-words serve as a proxy evaluation for these decisions. The evaluation takes an argument of a verb like drive (e.g. car), pairs it with an alternative word (e.g. car/rock), and asks a model to identify the original. This paper studies two main aspects of pseudoword creation that affect performance results. (1) Pseudo-word evaluations often evaluate only a subset of the words. We show that selectional preferences should instead be evaluated on the data in its entirety. (2) Different approaches to selecting partner words can produce overly optimistic evaluations. We offer suggestions to address these factors and present a simple baseline that outperforms the state-ofthe-art by 13% absolute on a newspaper domain.

6 0.78825408 144 acl-2010-Improved Unsupervised POS Induction through Prototype Discovery

7 0.78792238 76 acl-2010-Creating Robust Supervised Classifiers via Web-Scale N-Gram Data

8 0.78753328 55 acl-2010-Bootstrapping Semantic Analyzers from Non-Contradictory Texts

9 0.78695488 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation

10 0.78672606 145 acl-2010-Improving Arabic-to-English Statistical Machine Translation by Reordering Post-Verbal Subjects for Alignment

11 0.78664905 169 acl-2010-Learning to Translate with Source and Target Syntax

12 0.78554779 195 acl-2010-Phylogenetic Grammar Induction

13 0.78536665 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation

14 0.78508186 54 acl-2010-Boosting-Based System Combination for Machine Translation

15 0.78479314 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results

16 0.78435588 153 acl-2010-Joint Syntactic and Semantic Parsing of Chinese

17 0.78426671 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition

18 0.7830469 88 acl-2010-Discriminative Pruning for Discriminative ITG Alignment

19 0.78230858 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification

20 0.78206229 245 acl-2010-Understanding the Semantic Structure of Noun Phrase Queries