emnlp emnlp2011 emnlp2011-73 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents show the efficacy of sparse covariance matrices on two data sets from two different language pairs.
Reference: text
sentIndex sentText sentNum sentScore
1 Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. [sent-6, score-0.725]
2 In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. [sent-7, score-0.721]
3 Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. [sent-8, score-1.009]
4 In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. [sent-9, score-1.063]
5 First, we explore word association measures and bilingual dictionaries to weigh the word pairs. [sent-10, score-0.345]
6 Later, we explore different selection strategies to remove the noisy pairs based on the association scores. [sent-11, score-0.287]
7 Our experimental results on the task of aligning comparable documents show the efficacy of sparse covariance matrices on two data sets from two different language pairs. [sent-12, score-1.093]
8 Most of the existing approaches use manually aligned document pairs to find a common subspace in which the aligned document pairs are maximally correlated. [sent-19, score-0.384]
9 The discriminative approaches capture essential word co-occurrences in terms of two monolingual covariance matrices and a cross-covariance matrix. [sent-29, score-1.044]
10 Subsequently, they use these covariance matrices to find projection directions in each language such that aligned documents lie close to each other (Sec. [sent-30, score-1.133]
11 The strong reliance of these approaches on the covariance matrices leads to problems, especially with noisy data caused either by noisy words in a document or by noisy document alignments. [sent-32, score-1.198]
12 In this paper, we address the problem of identifying and removing noisy entries in the covariance matrices. [sent-39, score-0.723]
13 In the first stage, we explore the use of word association measures such as Mutual Information (MI) and Yule’s ω (Reis and Judd, 2000) in computing the strength of a word pair (Sec. [sent-41, score-0.183]
14 We also explore the use of bilingual dictionaries developed from cleaner resources such as parallel data. [sent-44, score-0.232]
15 In the second stage, we use the association strengths in filtering out the noisy word pairs from the covariance matrices. [sent-45, score-0.809]
16 We evaluate the utility of sparse covariance matrices in improving the bilingual projections incrementally (Sec. [sent-49, score-1.073]
17 We found that sparsifying the covariance matrices helps in general, but using a cleaner resource such as bilingual dictionaries performed best. [sent-55, score-1.204]
18 We mainly focus on representing the solution of CCA in terms of covariance matrices. [sent-57, score-0.629]
19 Given training data of n aligned document pairs, CCA finds projection directions for each language, so that the documents, when projected along these directions, are maximally correlated (Hotelling, 1936). [sent-59, score-0.355]
20 (1) where Cxx = XX^T and Cyy = YY^T are the monolingual covariance matrices, Cxy = XY^T is the cross-covariance matrix, and λ is the regularization parameter. [sent-73, score-0.887]
21 Using these eigenvectors as columns, we form the projection matrices A and B. [sent-74, score-0.397]
22 These projection matrices are used to map documents in both languages into the interlingual representation. [sent-75, score-0.49]
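To make the setup above concrete, here is a minimal sketch of regularized CCA computed from the covariance matrices. It is not the authors' implementation: the specific generalized eigenvalue formulation, the regularizer lam, and the numpy/scipy calls are assumptions made for illustration.

```python
import numpy as np
from scipy.linalg import eigh

def cca_projections(X, Y, lam=0.1, k=100):
    """Illustrative regularized CCA from covariance matrices.

    X: (dx, n) term-document matrix for language 1 (n aligned documents)
    Y: (dy, n) term-document matrix for language 2
    Returns projection matrices A (dx, k) and B (dy, k).
    """
    Cxx = X @ X.T + lam * np.eye(X.shape[0])   # regularized monolingual covariance
    Cyy = Y @ Y.T + lam * np.eye(Y.shape[0])
    Cxy = X @ Y.T                              # cross-covariance

    # Generalized eigenvalue problem: Cxy Cyy^{-1} Cyx a = eta Cxx a
    M = Cxy @ np.linalg.solve(Cyy, Cxy.T)
    eta, V = eigh(M, Cxx)                      # eigenvalues in ascending order
    A = V[:, ::-1][:, :k]                      # top-k eigenvectors as columns
    B = np.linalg.solve(Cyy, Cxy.T @ A)        # corresponding directions for language 2
    return A, B
```

Projecting a document vector x as A.T @ x (and y as B.T @ y) then gives the interlingual representation referred to throughout.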
23 So, every non-zero entry of the cross-covariance matrix restricts the choice of the projection directions. [sent-85, score-0.217]
24 Every occurrence of a noisy word will have a non-zero contribution towards the covariance matrix, making it dense, which in turn prevents the selection of appropriate projection directions. [sent-88, score-0.973]
25 In this section, we describe some techniques to recover the sparsity by removing the noisy entries from the covariance matrices. [sent-89, score-0.878]
26 We break this task into two subproblems: computing an association score for every word pair and then using an appropriate strategy to identify the noisy pairs based on their weights. [sent-90, score-0.18]
27 This measure uses information about the occurrence of a word pair in aligned documents and doesn't use other statistics such as ‘how often this pair doesn't co-occur together’ and so on. [sent-105, score-0.195]
28 2 Mutual Information Association measures like covariance and Pointwise Mutual Information, which only use the frequency with which a word pair co-occurs, often overestimate the strength of low-frequency words (Moore, 2004). [sent-108, score-0.697]
29 We treat the occurrence of a word in a document slightly differently: a word is treated as occurring in a document only if it occurs more than its average frequency in the corpus. [sent-112, score-0.208]
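A small sketch of the document-level Mutual Information described here, using the "occurs more than its average corpus frequency" binarization; the function and variable names are illustrative assumptions, not the paper's code.

```python
import numpy as np

def binarize_occurrence(counts):
    """counts: (vocab_size, n_docs) raw frequencies.
    A word 'occurs' in a document if its count exceeds its average corpus frequency."""
    avg = counts.mean(axis=1, keepdims=True)
    return (counts > avg).astype(int)

def mutual_information(x_occ, y_occ):
    """MI between two binary occurrence vectors over n aligned document pairs."""
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = np.mean((x_occ == a) & (y_occ == b))
            p_a, p_b = np.mean(x_occ == a), np.mean(y_occ == b)
            if p_ab > 0:
                mi += p_ab * np.log(p_ab / (p_a * p_b))
    return mi
```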
30 4 Bilingual Dictionary The above three association measures use the same training data that is available to compute the covariance matrices in CCA. [sent-122, score-0.969]
31 Thus, their utility in bringing additional information, which is not captured by the covariance matrices, is arguable (our experiments show that they are indeed helpful). [sent-123, score-0.629]
32 Moreover, they use document-level co-occurrence information, which is coarse compared to the co-occurrence at sentence level or the translational information provided by a bilingual dictionary. [sent-124, score-0.195]
33 So, we use bilingual dictionaries as our final resource to weigh the word co-occurrences. [sent-125, score-0.193]
34 While the first three association measures can also be applied to monolingual data, a bilingual dictionary can't be used for weighting monolingual word pairs. [sent-132, score-0.619]
35 So in this case, we use either of the above mentioned techniques for weighting monolingual word pairs. [sent-133, score-0.206]
36 1 Thresholding A straightforward way to remove the noisy word co-occurrences is to zero out the entries of the cross-covariance matrix that are lower than a threshold. [sent-139, score-0.217]
37 So, if we want to remove some of the entries of the covariance matrix with minimal change in the value of the objective function, then the optimal choice is to sort the entries of the covariance matrix and filter out the less confident word pairs. [sent-142, score-1.545]
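A sketch of the two filtering ideas described above (thresholding and keeping only the top-scoring pairs); treating the association scores as a matrix W aligned with the cross-covariance entries is an assumption made for illustration.

```python
import numpy as np

def sparsify_by_threshold(Cxy, W, tau):
    """Zero out cross-covariance entries whose association score is below tau."""
    return np.where(W >= tau, Cxy, 0.0)

def sparsify_top_m(Cxy, W, m):
    """Keep only the m word pairs with the highest association scores
    (ties at the cutoff may keep a few extra entries)."""
    cutoff = np.partition(W.ravel(), -m)[-m]   # m-th largest score
    return np.where(W >= cutoff, Cxy, 0.0)
```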
38 4 Monolingual Augmentation The above three selection strategies operate on the covariance matrices independently. [sent-176, score-0.986]
39 Specifically, we propose to augment the set of selected bilingual word pairs using the monolingual word pairs. [sent-178, score-0.356]
40 We first use any of the above mentioned strategies to select bilingual and monolingual word pairs. [sent-179, score-0.327]
41 Let Ixy, Ixx and Iyy be the binary matrices that indicate the selected word pairs based on the bilingual and monolingual association scores. [sent-180, score-0.628]
42 Then the monolingual augmentation strategy updates Ixy in the following way: Ixy ← Binarize(Ixx Ixy Iyy), i.e. [sent-181, score-0.242]
43 we multiply Ixy with the monolingual selection matrices and then binarize the resulting matrix. [sent-183, score-0.453]
44 Our monolingual augmentation is motivated by the following probabilistic interpretation: P(x, y) = Σ_{x′, y′} P(x|x′) P(y|y′) P(x′, y′), which can be rewritten as P ← Tx P (Ty)^T, where Tx and Ty are monolingual state transition matrices. [sent-184, score-0.415]
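The update above, written out as a sketch (the indicator matrices are 0/1 numpy arrays; this is an illustration, not the authors' code):

```python
import numpy as np

def monolingual_augmentation(Ixy, Ixx, Iyy):
    """Refine the bilingual selection matrix with the monolingual ones:
    Ixy <- Binarize(Ixx @ Ixy @ Iyy)."""
    return (Ixx @ Ixy @ Iyy > 0).astype(float)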
45 3 Our Approach In this section we summarize our approach for the task of finding aligned documents from cross-lingual comparable corpora. [sent-186, score-0.176]
46 The training phase involves finding projection directions for documents of both the languages. [sent-187, score-0.19]
47 We compute the covariance matrices using the training data. [sent-188, score-0.878]
48 2) to recover the sparseness in either only the cross-covariance or all of the covariance matrices. [sent-193, score-0.677]
49 Let Ixy, Ixx and Iyy be the binary matrices which represent the word pairs that are selected based on the chosen sparsification technique. [sent-194, score-0.379]
50 Let A and B be the matrices formed with top eigenvectors of Eq. [sent-199, score-0.303]
51 These projection matrices are used to map documents into the interlingual representation. [sent-201, score-0.49]
52 During testing, given an English document x, finding an aligned Spanish document involves solving: argmax_y x^T … [sent-205, score-0.211]
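The retrieval objective in the extracted sentence above is cut off. A natural reading, given the projection matrices A and B, is to score each candidate Spanish document y by x^T A B^T y and take the maximizer; the sketch below assumes exactly that and is not guaranteed to match the paper's exact scoring (e.g., any length normalization is omitted).

```python
import numpy as np

def align_document(x, Y_candidates, A, B):
    """Return the index of the Spanish document maximizing x^T A B^T y.

    x: (dx,) English document vector
    Y_candidates: (dy, m) matrix whose columns are Spanish document vectors
    """
    scores = (A.T @ x) @ (B.T @ Y_candidates)   # (k,) dot (k, m) -> (m,)
    return int(np.argmax(scores))
```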
53 1 Experimental Setup We experiment with the task of finding aligned documents from a cross-lingual comparable corpora. [sent-213, score-0.176]
54 As the corpora are comparable, some documents in one collection have a comparable document in the other collection. [sent-215, score-0.184]
55 We evaluate our idea of sparsifying the covariance matrices incrementally. [sent-218, score-1.002]
56 That leaves us with 12 different ways of sparsifying the covariance matrices, with each method having parameters to control the amount of sparseness. [sent-222, score-0.753]
57 We use the true feature correspondences to form the cross-covariance selection matrix Ixy (Sec. [sent-237, score-0.208]
58 For this experiment, we use the full monolingual covariance matrices. [sent-240, score-0.764]
59 We tokenize the documents, retain only the most frequent 2000 words in each language, and convert the docu… (Figure 2: Comparison of the word association measures along with different selection criteria.) [sent-254, score-0.191]
60 The x-axis plots the number of non-zero entries in the covariance matrices and the y-axis plots the accuracy of the top-ranked document. [sent-255, score-1.002]
61 2 shows the performance of these different combinations with varying levels of sparsity in the covariance matrices. [sent-263, score-0.764]
62 We start with 2000 non-zero entries in the covariance matrices and experiment up to 20,000 non-zero entries. [sent-265, score-0.914]
63 Since our data set has 2000 words in each language, 2000 non-zero entries in a covariance matrix implies that, on average, every word is associated with only one word. [sent-266, score-0.788]
64 Selecting more elements in the covariance matrices increases the performance slightly and then it decreases again. [sent-270, score-0.629]
65 From the figure, it seems that sparsifying the covariance matrices might help in improving the performance of the task. [sent-271, score-1.002]
66 This suggests that, apart from the weighting of the word pairs, appropriate selection of the word pairs is also equally important. [sent-274, score-0.208]
67 From this figure, we observe that Mutual Information and Yule's ω perform competitively with each other, but both consistently outperform models that use covariance as the association measure. [sent-276, score-0.714]
68 2 Amount of Sparsity In the previous experiment, we used the same level of sparsity for all the covariance matrices, i.e. [sent-280, score-0.736]
69 the same number of associations was selected for each word in all three covariance matrices. [sent-282, score-0.66]
70 In the following experiment, we use different levels of sparsity for the individual covariance matrices. [sent-283, score-0.764]
71 In the Yule+Match combination, we use Yule’s ω association measure for weighting the word pairs and use matching for selection. [sent-286, score-0.203]
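The Match selection strategy itself is not spelled out in the sentences extracted here. One standard way to realize matching-based selection, shown purely as an assumption, is to keep the word pairs chosen by a maximum-weight bipartite assignment over the association scores; the paper's Match(l) with l > 1 partners per word would need a generalization of this one-to-one sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_selection(W):
    """One-to-one matching-based selection (illustrative only).

    W: association scores between source words (rows) and target words (columns).
    Returns a 0/1 indicator matrix with ones at the selected pairs.
    """
    rows, cols = linear_sum_assignment(W, maximize=True)  # maximum-weight assignment
    I = np.zeros_like(W, dtype=float)
    I[rows, cols] = 1.0
    return I
```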
72 In the Dictionary+Match combination, we use a bilingual dictionary for sparsifying the cross-covariance matrix, i.e. [sent-287, score-0.311]
73 And for monolingual word pairs, we use MI for weighting and matching for word pair selection. [sent-290, score-0.278]
74 For each level of sparsity of the cross-covariance matrix, we experiment with different levels of sparsity on the monolingual covariance matrices. [sent-291, score-1.006]
75 ‘Only XY’ indicates we use the full monolingual covariance matrices. [sent-292, score-0.764]
76 ‘Aug’ indicates that we use monolingual augmentation to refine the sparsity of the cross-covariance matrix (Sec. [sent-295, score-0.441]
77 From both figures 3(a) and 3(b), we observe that the ‘Only XY’ run (dark blue) performs poorly compared to the other runs, indicating that sparsifying all the covariance matrices is better than sparsifying only the cross-covariance matrix. [sent-299, score-1.002]
78 Matching is used as the selection criterion for all the covariance matrices. [sent-303, score-0.698]
79 The x-axis plots the threshold on the bilingual translation probability, which determines the sparsity of Cxy. [sent-305, score-0.3]
80 Figure 3: Comparison of Yule+Match and Dictionary+Match combination with different levels of sparsity for the covariance matrices. [sent-307, score-0.764]
81 In both the figures, the x-axis plots the sparsity of the cross-covariance matrix and for each value we try different levels of sparsity on the monolingual covariance matrices (which are grouped together). [sent-308, score-1.391]
82 In both the above experiments, the performance bars are very similar when we use MI instead of Yule and vice versa for weighting monolingual word pairs. [sent-322, score-0.249]
83 Yule(l)+Match(k), where l ∈ {2, 3} is the number of Spanish words allowed for each English word and vice versa, and k=2 is the number of monolingual word associations for each word. [sent-327, score-0.209]
84 Again, we run these combinations with monolingual augmentation, identified by Dictionary+Match(k)+Aug. [sent-331, score-0.242]
85 of English document and the second half of its aligned foreign language document (Mimno et al. [sent-377, score-0.211]
86 From the results, it is clear that sparsifying the covariance matrices helps improve the accuracies significantly. [sent-400, score-1.002]
87 This indicates that using fine-grained information such as a bilingual dictionary gleaned from an external source is very helpful in improving the accuracies. [sent-402, score-0.234]
88 Among the models that rely solely on the training data, models that use monolingual augmentation performed better on the Wikipedia data set, while models that do not use augmentation performed better on the Europarl data sets. [sent-403, score-0.349]
89 As the documents become comparable, we need to use monolingual statistics to refine the bilingual statistics. [sent-405, score-0.326]
90 This conforms with our initial hunch that, when the training data is clean, the covariance matrices tend to be less noisy. [sent-407, score-0.906]
91 5 Discussion In this paper, we have proposed the idea of sparsifying covariance matrices to improve bilingual projection directions. [sent-408, score-1.094]
92 We are not aware of any NLP research that attempts to recover the sparseness of the covariance matrices to improve the projection directions. [sent-409, score-1.02]
93 Their objective is to find projection directions such that the original documents are represented as sparse vectors in the common subspace. [sent-411, score-0.263]
94 Another seemingly relevant but different direction is the sparse covariance matrix selection research (Banerjee et al. [sent-412, score-0.863]
95 The objective in this work is to find matrices such that the inverse of the covariance matrix is sparse, which has applications in Gaussian processes. [sent-414, score-1.043]
96 Our experimental results show that using external information such as bilingual dictionaries, which are gleaned from cleaner resources, brings significant improvements. [sent-416, score-0.249]
97 Moreover, we observe that computing word pair association measures from the same training data, along with an appropriate selection criterion, can also yield significant improvements. [sent-417, score-0.191]
98 This is certainly encouraging, and in the future we would like to explore more sophisticated techniques to recover the sparsity based on the training data itself. [sent-418, score-0.185]
99 Sparse covariance selection via robust maximum likelihood estimation. CoRR. [sent-436, score-0.698]
100 From bilingual dictionaries to interlingual document representations. [sent-504, score-0.235]
wordName wordTfidf (topN-words)
[('covariance', 0.629), ('yule', 0.311), ('cca', 0.268), ('matrices', 0.249), ('monolingual', 0.135), ('sparsifying', 0.124), ('bilingual', 0.122), ('ixy', 0.109), ('augmentation', 0.107), ('sparsity', 0.107), ('spanish', 0.106), ('thresholding', 0.097), ('projection', 0.094), ('iij', 0.093), ('matrix', 0.092), ('jagarlamudi', 0.08), ('interlingual', 0.078), ('document', 0.073), ('sparse', 0.073), ('fj', 0.07), ('documents', 0.069), ('selection', 0.069), ('aligned', 0.065), ('dictionary', 0.065), ('mi', 0.065), ('opca', 0.062), ('sparsification', 0.062), ('match', 0.059), ('noisy', 0.058), ('europarl', 0.054), ('association', 0.054), ('aug', 0.054), ('eigenvectors', 0.054), ('hal', 0.051), ('jagadeesh', 0.048), ('recover', 0.048), ('correspondences', 0.047), ('daum', 0.047), ('atxytb', 0.047), ('cyy', 0.047), ('gleaned', 0.047), ('ixx', 0.047), ('iyy', 0.047), ('reis', 0.047), ('platt', 0.045), ('synthetic', 0.045), ('plots', 0.044), ('vice', 0.043), ('comparable', 0.042), ('ei', 0.041), ('matching', 0.041), ('canonical', 0.04), ('dictionaries', 0.04), ('cxx', 0.04), ('cleaner', 0.04), ('daume', 0.04), ('weighting', 0.04), ('strategies', 0.039), ('rewritten', 0.038), ('measures', 0.037), ('pairs', 0.037), ('mutual', 0.036), ('entries', 0.036), ('subspace', 0.034), ('subsequently', 0.032), ('transliteration', 0.032), ('aligning', 0.031), ('word', 0.031), ('atxxta', 0.031), ('ballesteros', 0.031), ('bty', 0.031), ('competitively', 0.031), ('crosscovariance', 0.031), ('hardoon', 0.031), ('hermjakob', 0.031), ('jonker', 0.031), ('judd', 0.031), ('ravindra', 0.031), ('vinokourov', 0.031), ('ytb', 0.031), ('zeroing', 0.031), ('noise', 0.031), ('doesn', 0.03), ('mimno', 0.03), ('explore', 0.03), ('levels', 0.028), ('clean', 0.028), ('xy', 0.028), ('correlation', 0.027), ('dense', 0.027), ('translation', 0.027), ('directions', 0.027), ('iii', 0.027), ('rai', 0.027), ('tfidf', 0.027), ('vu', 0.027), ('neighbour', 0.027), ('bel', 0.027), ('cxy', 0.027), ('raghavendra', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
2 0.096924864 118 emnlp-2011-SMT Helps Bitext Dependency Parsing
Author: Wenliang Chen ; Jun'ichi Kazama ; Min Zhang ; Yoshimasa Tsuruoka ; Yujie Zhang ; Yiou Wang ; Kentaro Torisawa ; Haizhou Li
Abstract: We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-theart baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.
3 0.08405678 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
Author: Xabier Saralegi ; Iker Manterola ; Inaki San Vicente
Abstract: An A-C bilingual dictionary can be inferred by merging A-B and B-C dictionaries using B as pivot. However, polysemous pivot words often produce wrong translation candidates. This paper analyzes two methods for pruning wrong candidates: one based on exploiting the structure of the source dictionaries, and the other based on distributional similarity computed from comparable corpora. As both methods depend exclusively on easily available resources, they are well suited to less resourced languages. We studied whether these two techniques complement each other given that they are based on different paradigms. We also researched combining them by looking for the best adequacy depending on various application scenarios. ,
4 0.06985645 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
Author: Amittai Axelrod ; Xiaodong He ; Jianfeng Gao
Abstract: We explore efficient domain adaptation for the task of statistical machine translation based on extracting sentences from a large general-domain parallel corpus that are most relevant to the target domain. These sentences may be selected with simple cross-entropy based methods, of which we present three. As these sentences are not themselves identical to the in-domain data, we call them pseudo in-domain subcorpora. These subcorpora, 1% the size of the original, can then be used to train small domain-adapted Statistical Machine Translation (SMT) systems which outperform systems trained on the entire corpus. Performance is further improved when we use these domain-adapted models in combination with a true in-domain model. The results show that more training data is not always better, and that best results are attained via proper domain-relevant data selection, as well as combining in- and general-domain systems during decoding.
5 0.069551393 3 emnlp-2011-A Correction Model for Word Alignments
Author: J. Scott McCarley ; Abraham Ittycheriah ; Salim Roukos ; Bing Xiang ; Jian-ming Xu
Abstract: Models of word alignment built as sequences of links have limited expressive power, but are easy to decode. Word aligners that model the alignment matrix can express arbitrary alignments, but are difficult to decode. We propose an alignment matrix model as a correction algorithm to an underlying sequencebased aligner. Then a greedy decoding algorithm enables the full expressive power of the alignment matrix formulation. Improved alignment performance is shown for all nine language pairs tested. The improved alignments also improved translation quality from Chinese to English and English to Italian.
6 0.068639763 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
7 0.067647204 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
8 0.058988106 129 emnlp-2011-Structured Sparsity in Structured Prediction
9 0.058537479 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
10 0.058432471 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
11 0.057624768 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
12 0.055718567 80 emnlp-2011-Latent Vector Weighting for Word Meaning in Context
13 0.054861389 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels
14 0.053999275 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
15 0.053438168 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers
16 0.053232808 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
17 0.048423968 72 emnlp-2011-Improved Transliteration Mining Using Graph Reinforcement
18 0.047960546 96 emnlp-2011-Multilayer Sequence Labeling
19 0.047312215 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association
20 0.045809619 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
topicId topicWeight
[(0, 0.165), (1, -0.002), (2, -0.005), (3, -0.101), (4, 0.048), (5, 0.056), (6, -0.04), (7, 0.018), (8, -0.152), (9, 0.052), (10, -0.016), (11, -0.03), (12, 0.056), (13, 0.08), (14, 0.004), (15, 0.013), (16, 0.004), (17, 0.023), (18, -0.165), (19, -0.115), (20, -0.05), (21, -0.206), (22, -0.049), (23, 0.013), (24, 0.089), (25, -0.111), (26, 0.078), (27, 0.097), (28, 0.011), (29, 0.124), (30, 0.095), (31, 0.058), (32, -0.058), (33, 0.029), (34, 0.072), (35, -0.019), (36, 0.014), (37, 0.142), (38, 0.007), (39, 0.106), (40, -0.128), (41, -0.06), (42, -0.079), (43, 0.045), (44, -0.02), (45, -0.024), (46, -0.096), (47, -0.082), (48, -0.002), (49, -0.077)]
simIndex simValue paperId paperTitle
same-paper 1 0.93621039 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
2 0.59629875 118 emnlp-2011-SMT Helps Bitext Dependency Parsing
Author: Wenliang Chen ; Jun'ichi Kazama ; Min Zhang ; Yoshimasa Tsuruoka ; Yujie Zhang ; Yiou Wang ; Kentaro Torisawa ; Haizhou Li
Abstract: We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank contains errors, the bilingual constraints are noisy. To overcome this problem, we use large-scale unannotated data to verify the constraints and design a set of effective bilingual features for parsing models based on the verified results. The experimental results show that our new parsers significantly outperform state-of-theart baselines. Moreover, our approach is still able to provide improvement when we use a larger monolingual treebank that results in a much stronger baseline. Especially notable is that our approach can be used in a purely monolingual setting with the help of SMT.
3 0.55784738 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
Author: Xabier Saralegi ; Iker Manterola ; Inaki San Vicente
Abstract: An A-C bilingual dictionary can be inferred by merging A-B and B-C dictionaries using B as pivot. However, polysemous pivot words often produce wrong translation candidates. This paper analyzes two methods for pruning wrong candidates: one based on exploiting the structure of the source dictionaries, and the other based on distributional similarity computed from comparable corpora. As both methods depend exclusively on easily available resources, they are well suited to less resourced languages. We studied whether these two techniques complement each other given that they are based on different paradigms. We also researched combining them by looking for the best adequacy depending on various application scenarios. ,
4 0.53681529 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
Author: Zhengxian Gong ; Min Zhang ; Guodong Zhou
Abstract: Statistical machine translation systems are usually trained on a large amount of bilingual sentence pairs and translate one sentence at a time, ignoring document-level information. In this paper, we propose a cache-based approach to document-level translation. Since caches mainly depend on relevant data to supervise subsequent decisions, it is critical to fill the caches with highly-relevant data of a reasonable size. In this paper, we present three kinds of caches to store relevant document-level information: 1) a dynamic cache, which stores bilingual phrase pairs from the best translation hypotheses of previous sentences in the test document; 2) a static cache, which stores relevant bilingual phrase pairs extracted from similar bilingual document pairs (i.e. source documents similar to the test document and their corresponding target documents) in the training parallel corpus; 3) a topic cache, which stores the target-side topic words related with the test document in the source-side. In particular, three new features are designed to explore various kinds of document-level information in above three kinds of caches. Evaluation shows the effectiveness of our cache-based approach to document-level translation with the performance improvement of 0.81 in BLEU score over Moses. Especially, detailed analysis and discussion are presented to give new insights to document-level translation.
5 0.42651662 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association
Author: Dipak L. Chaudhari ; Om P. Damani ; Srivatsan Laxman
Abstract: Lexical co-occurrence is an important cue for detecting word associations. We propose a new measure of word association based on a new notion of statistical significance for lexical co-occurrences. Existing measures typically rely on global unigram frequencies to determine expected co-occurrence counts. Instead, we focus only on documents that contain both terms (of a candidate word-pair) and ask if the distribution of the observed spans of the word-pair resembles that under a random null model. This would imply that the words in the pair are not related strongly enough for one word to influence placement of the other. However, if the words are found to occur closer together than explainable by the null model, then we hypothesize a more direct association between the words. Through extensive empirical evaluation on most of the publicly available benchmark data sets, we show the advantages of our measure over existing co-occurrence measures.
6 0.42548651 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
7 0.40637511 3 emnlp-2011-A Correction Model for Word Alignments
8 0.39937481 72 emnlp-2011-Improved Transliteration Mining Using Graph Reinforcement
9 0.3972986 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
10 0.3754417 115 emnlp-2011-Relaxed Cross-lingual Projection of Constituent Syntax
11 0.36930311 148 emnlp-2011-Watermarking the Outputs of Structured Prediction with an application in Statistical Machine Translation.
12 0.34585956 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
13 0.34430048 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
14 0.34335026 143 emnlp-2011-Unsupervised Information Extraction with Distributional Prior Knowledge
15 0.3362627 129 emnlp-2011-Structured Sparsity in Structured Prediction
16 0.31517419 107 emnlp-2011-Probabilistic models of similarity in syntactic context
17 0.30616894 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
18 0.29413411 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
19 0.28845316 30 emnlp-2011-Compositional Matrix-Space Models for Sentiment Analysis
20 0.28377515 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
topicId topicWeight
[(23, 0.083), (36, 0.016), (37, 0.023), (45, 0.098), (53, 0.036), (54, 0.03), (62, 0.017), (64, 0.035), (66, 0.033), (69, 0.394), (79, 0.048), (82, 0.031), (85, 0.01), (87, 0.01), (96, 0.024), (98, 0.029)]
simIndex simValue paperId paperTitle
1 0.83444709 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
2 0.80611223 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base
Author: Ni Lao ; Tom Mitchell ; William W. Cohen
Abstract: We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL's earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
same-paper 3 0.76662064 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
4 0.74051642 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
5 0.55480152 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-toEnglish translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
6 0.43542072 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
7 0.43507481 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
8 0.43100983 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
9 0.41964734 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
10 0.41862211 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
11 0.41244903 3 emnlp-2011-A Correction Model for Word Alignments
12 0.40354016 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
13 0.40319392 128 emnlp-2011-Structured Relation Discovery using Generative Models
14 0.40224829 56 emnlp-2011-Exploring Supervised LDA Models for Assigning Attributes to Adjective-Noun Phrases
15 0.40166879 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
16 0.39898247 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
17 0.39891586 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
18 0.39802447 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
19 0.39713833 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
20 0.39662316 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases