acl acl2013 acl2013-47 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. [sent-3, score-1.551]
2 The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. [sent-4, score-1.072]
3 To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters. [sent-5, score-1.046]
4 1 Introduction A word cluster is a group of words which ideally captures syntactic, semantic, and distributional regularities among the words belonging to the group. [sent-6, score-0.136]
5 Word clustering is widely used to reduce the number of parameters in statistical models, which leads to improved generalization (Brown et al. [sent-7, score-0.313]
6 , 2010), and multilingual clustering has been proposed as a means to improve modeling of translational correspondences and to facilitate projection of linguistic resources across languages (Och, 1999; Täckström et al. [sent-10, score-0.416]
7 In this paper, we argue that generally more informative clusters can be learned when evidence from multiple languages is considered while creating the clusters. [sent-12, score-0.259]
8 We propose a novel bilingual word clustering objective (§2). [sent-13, score-0.73]
9 The first term deals with each language independently and ensures that the data is well-explained by the clustering in a sequence model (§2. [sent-14, score-0.392]
10 The second term ensures that the cluster alignments induced by a word alignment have high mutual information across languages (§2. [sent-16, score-0.273]
11 Since the objective consists of terms representing the entropy of monolingual data (for each language) and parallel bilingual data, it is particularly attractive for the usual situation in which there is much more monolingual data available than parallel data. [sent-18, score-1.269]
12 Because of its similarity to the variation of information metric (Meilă, 2003), we call this bilingual term in the objective the aligned variation of information. [sent-19, score-0.558]
13 2 Word Clustering A word clustering C is a partition of a vocabulary Σ = {x1, x2, . [sent-20, score-0.386]
14 , x|Σ|} into K disjoint subsets, C1, C2, . [sent-23, score-0.035]
15 2.1 Monolingual objective We use the average surprisal in a probabilistic sequence model to define the monolingual clustering objective. [sent-34, score-0.789]
16 Our objective assumes that the probability of a word sequence w = ⟨w1, w2, . [sent-36, score-0.175]
17 The term p(ci | ci−1) is the probability of class ci following class ci−1, and p(wi | ci) is the probability of class ci emitting word wi. [sent-41, score-0.467]
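For concreteness, this is the standard class-based (Brown-style) bigram factorization; in LaTeX, with c_i denoting the (single) class of w_i under the hard clustering C:

    p(\mathbf{w}) \;=\; \prod_{i=1}^{M} p(c_i \mid c_{i-1})\, p(w_i \mid c_i), \qquad c_i = C(w_i)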
18 Using the MLE estimates after taking the negative logarithm, this term reduces to [sent-42, score-0.051]
19 , 1992): H(C; w) = 2 Σ_{k=1}^{K} (#(C_k)/M) log(#(C_k)/M) − Σ_i Σ_{j≠i} (#(C_i, C_j)/M) log(#(C_i, C_j)/M), where #(C_k) is the count of C_k in the corpus w under the clustering C, #(C_i, C_j) is the number of times that cluster C_i precedes C_j, and M is the size of the corpus. [sent-45, score-0.313]
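As an illustrative sketch (not the authors' code), the count form above can be computed directly from a corpus rendered as its sequence of cluster ids:

    import math
    from collections import Counter

    def monolingual_objective(cluster_seq):
        """H(C; w) for a corpus given as a list of cluster ids, following
        the count form above: 2 * sum_k p_k * log(p_k) minus the sum over
        ordered cluster bigrams (i != j) of p_ij * log(p_ij), where
        p_k = #(C_k)/M and p_ij = #(C_i, C_j)/M."""
        M = len(cluster_seq)
        unigrams = Counter(cluster_seq)
        bigrams = Counter(zip(cluster_seq, cluster_seq[1:]))
        h = 2.0 * sum((n / M) * math.log(n / M) for n in unigrams.values())
        h -= sum((n / M) * math.log(n / M)
                 for (ci, cj), n in bigrams.items() if ci != cj)
        return h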
20 Using the monolingual objective to cluster, we solve the following search problem: Ĉ = argmin_C H(C; w). [sent-46, score-0.476]
21 2.2 Bilingual objective Now let us suppose we have a second language with vocabulary Ω = {y1, y2, . [sent-49, score-0.164]
22 , y|Ω|}, which is clustered into K_Ω disjoint subsets D = {D1, D2, . [sent-52, score-0.035]
23 Obviously we can cluster both languages using the monolingual objective above: Ĉ, D̂ = argmin_{C,D} H(C; w) + H(D; v). [sent-59, score-0.524]
24 This joint minimization of the clusterings for both languages clearly has no benefit, since the two terms of the objective are independent. [sent-60, score-0.272]
25 To encode this belief, we introduce the notion of a weighted vocabulary alignment A, which is a function mapping pairs of words in vocabularies Σ and Ω to a value greater than or equal to 0, i. [sent-62, score-0.112]
26 e., it gives the number of times that x is aligned to y in a word-aligned parallel corpus. [sent-67, score-0.184]
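Such a weighted alignment can be accumulated by simple counting over aligner output; a minimal sketch, where the (src, tgt, links) input format is an assumption:

    from collections import Counter

    def build_alignment(bitext):
        """bitext: iterable of (src_tokens, tgt_tokens, links) triples,
        where links is a set of (i, j) token-index pairs produced by a
        word aligner. Returns A mapping word pairs (x, y) to counts."""
        A = Counter()
        for src, tgt, links in bitext:
            for i, j in links:
                A[(src[i], tgt[j])] += 1
        return A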
27 Figure 1: Factor graphs of the monolingual (left) and proposed bilingual (right) clustering problems. [sent-71, score-0.898]
28 We call this quantity the aligned variation of information (AVI). [sent-72, score-0.094]
29 Our bilingual clustering objective can therefore be stated as the following search problem over a linear combination of the monolingual and bilingual objectives: Ĉ, D̂ = argmin_{C,D} H(C; w) + H(D; v) + β · AVI(C, D), where the first two terms are the monolingual objectives and the β-weighted third term is the bilingual objective. [sent-76, score-1.616]
30 Intuitively, we can imagine sampling a random alignment from the distribution obtained by normalizing A(·, ·). [sent-79, score-0.081]
31 AVI measures, on average, how much uncertainty remains about the clustering of a linked element, chosen at random proportionally to A(x, ·), after knowing the cluster in the other language (or conditioned the other way around). [sent-81, score-0.401]
32 To further understand AVI, we remark that AVI(C, D) reduces to the VI metric when the alignment maps words to themselves in the same language. [sent-83, score-0.081]
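For reference, the VI metric is VI(X; Y) = H(X | Y) + H(Y | X). Read together with the reduction just noted, a natural form for AVI (an inference from the text, not a quoted equation) is the same quantity under the joint cluster distribution induced by normalizing A:

    \mathrm{AVI}(C, D) \;=\; H(C \mid D) + H(D \mid C),
    \qquad p(C_k, D_\ell) \;\propto\; \sum_{x \in C_k} \sum_{y \in D_\ell} A(x, y)

Under this reading, AVI = 0 exactly when each cluster in one language is perfectly predictable from its aligned cluster in the other, matching the example discussed below.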
33 2.3 Example Figure 2 provides an example illustrating the difference between the bilingual vs. monolingual objectives. [sent-86, score-0.242]
34 We compare two different clusterings of a two-sentence Arabic-English parallel corpus (the English half of the corpus contains the same sentence, twice, while the Arabic half has two variants with the same meaning). [sent-88, score-0.163]
35 While English has a relatively rigid SVO word order, Arabic can alternate between the traditional VSO order and a more modern SVO order. [sent-89, score-0.042]
36 Since our monolingual clustering objective relies exclusively on the distribution of clusters before and after each token, flexible word order alternations like this can cause unintuitive results. [sent-90, score-0.999]
37 This is indeed what we observe in the optimal solution under the monolingual objective (center), in which AwlAd (boys) and yElbwn (play+PRES+3PL) are grouped into a single class, while yElb (play+PRES+3SG) is in its own class. [sent-92, score-0.504]
38 However, the AVI term (which is of course not included) has a value of 1.0, reflecting the relatively disordered clustering relative to the given alignment. [sent-93, score-0.051] [sent-94, score-0.313]
40 On the right, we see the optimal solution that includes the AVI term in the clustering objective. [sent-95, score-0.392]
41 This has an AVI of 0, indicating that knowing the clustering of any word is completely informative about the words it is aligned to. [sent-96, score-0.456]
42 By including this term, a slightly worse monolingual solution is chosen, but the clustering corresponds to the reasonable intuition that words with the same meaning (i. [sent-97, score-0.656]
43 2.4 Inference Figure 1 shows the factor graph representation of our clustering models. [sent-101, score-0.313]
44 Finding the optimal clustering under both the monolingual and bilingual objectives is a computationally hard combinatorial optimization problem (Och, 1995). [sent-102, score-0.962]
45 We use a greedy hill-climbing word exchange algorithm (Martin et al. [sent-103, score-0.042]
46 We terminate the optimization procedure when the number of words exchanged at the end of one complete iteration through both languages is less than 0.1% of the summed vocabulary size of the two languages and at least five complete iterations have been completed. [sent-105, score-0.091] [sent-106, score-0.079]
48 For every language, the word clusters are initialised in round-robin order according to token frequency. [sent-107, score-0.21]
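Putting this subsection together, a schematic sketch of the exchange loop, shown for one language (the paper alternates through both languages each iteration and applies the 0.1% threshold to the summed vocabularies; score_move is a hypothetical helper returning the objective value with w tentatively reassigned to cluster k):

    def greedy_exchange(vocab_by_freq, K, score_move):
        """Greedy hill-climbing word exchange over a vocabulary listed
        in descending token-frequency order."""
        # Round-robin initialisation in token-frequency order.
        assign = {w: i % K for i, w in enumerate(vocab_by_freq)}
        iteration, moved = 0, len(vocab_by_freq)
        # Stop once fewer than 0.1% of words moved in a full pass,
        # after at least five complete iterations.
        while iteration < 5 or moved >= 0.001 * len(vocab_by_freq):
            iteration += 1
            moved = 0
            for w in vocab_by_freq:
                # Reassign w to whichever cluster minimises the objective.
                best = min(range(K), key=lambda k: score_move(assign, w, k))
                if best != assign[w]:
                    assign[w] = best
                    moved += 1
        return assign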
49 3 Experiments Evaluation of clustering is not a trivial problem. [sent-108, score-0.313]
50 One branch of work seeks to recast the problem as one of part-of-speech (POS) induction and attempts to match linguistic intuitions. [sent-109, score-0.033]
51 However, hard clusters are particularly useful for downstream tasks (Turian et al. [sent-110, score-0.168]
52 For our evaluation, we use our word clusters as an input to a named entity recognizer which uses these clusters as a source of features. [sent-113, score-0.415]
53 Corpora for Clustering: We used parallel corpora for {Arabic, English, French, Korean & Turkish}-German pairs from the WIT-3 corpus (Cettolo et al. [sent-117, score-0.072]
54 The corpus was word aligned in two directions using an unsupervised word aligner (Dyer et al. [sent-121, score-0.14]
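The text does not specify how the two alignment directions were combined; intersection is one common symmetrization heuristic, sketched here purely as an illustration:

    def symmetrize_intersect(fwd, rev):
        """fwd: set of (i, j) source-to-target link pairs;
        rev: set of (j, i) pairs from the reverse-direction aligner.
        Keeps only links supported by both directions."""
        return fwd & {(i, j) for (j, i) in rev}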
55 Monolingual Clustering: For every language pair, we train German word clusters on the monolingual German data from the parallel data. [sent-123, score-0.625]
56 Note that the parallel corpora are of different sizes and hence the monolingual German data from every parallel corpus is different. [sent-124, score-0.487]
57 We treat the F1 score obtained using monolingual word clusters (β = 0) as the baseline. [sent-125, score-0.043] [sent-143, score-0.553]
58 (Footnote 2: in practice, the number of exchanged words drops off exponentially, so this threshold is typically reached within a few iterations.)
59 Figure 2: A two-sentence English-Arabic parallel corpus (left); a 3-class clustering that maximizes the monolingual objective (β = 0; center); and a 3-class clustering that maximizes the joint monolingual and bilingual objective (any β > 0; right), for which AVI(C, D) = 0. [sent-141, score-1.892]
60 Table 1 shows the F1 score of NER when trained on these monolingual German word clusters. [sent-144, score-0.385]
61 Bilingual Clustering: While we have formulated a joint objective that enables using both monolingual and bilingual evidence, it is possible to create word clusters using the bilingual signal only, by removing the first term in Eq. [sent-145, score-1.221]
62 Table 1 shows the performance of NER when the word clusters are obtained using only the bilingual information for different language pairs. [sent-148, score-0.452]
63 As can be seen, these clusters are helpful for all the language pairs. [sent-149, score-0.168]
64 …0 points over when there are no distributional clusters, which clearly shows that the word alignment information improves the clustering quality. [sent-151, score-0.574]
65 We now need to supplement the bilingual information with monolingual information to see if the improvement is sustained. [sent-152, score-0.626]
66 We varied the weight of the bilingual objective (β) from 0. [sent-153, score-0.375]
67 This indicates that bilingual information is helpful, but less valuable than monolingual information. [sent-158, score-0.585]
68 We run our bilingual clustering model (β = …). Footnote 6: Faruqui and Padó (2010) show that for the size of our generalization, K = 100 should give us the optimum value. [sent-161, score-0.555]
69 Table 1 (unrefined) shows that except for Arabic-German & French-German, all other language pairs deliver a better F1 score than only using monolingual German data. [sent-164, score-0.343]
70 Although we have observed an improvement in F1 score over the monolingual case, the gains do not reach significance according to McNemar’s test (Dietterich, 1998). [sent-167, score-0.384]
71 Thus we propose to further refine the quality of word alignment links as follows: Let x be a word in language Σ and y be a word in language Ω, and suppose there exists an alignment link between x and y. [sent-168, score-0.288]
72 Recall that A(x, y) is the count of alignment links between x and y observed in the parallel data, and A(x) and A(y) are the respective marginal counts. [sent-170, score-0.129]
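One way to realize this refinement is to keep only links whose normalized association clears a threshold e; the exact scoring function is not recoverable from the extracted text, so the pointwise form below is an assumption:

    from collections import Counter

    def refine_links(A, e):
        """Drop weak alignment links. A maps (x, y) word pairs to
        counts; the marginals A(x), A(y) are derived from A itself.
        Higher e yields sparser alignments, as discussed in the text."""
        Ax, Ay = Counter(), Counter()
        for (x, y), n in A.items():
            Ax[x] += n
            Ay[y] += n
        return {(x, y): n for (x, y), n in A.items()
                if n / (Ax[x] * Ay[y]) >= e}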
73 The values shown in bold are the highest improvements over the monolingual model. [sent-176, score-0.379]
74 For English and Turkish we observe a statistically significant improvement over the monolingual model (cf. [sent-177, score-0.384]
75 Arabic improves least, with an improvement of just 0. [sent-181, score-0.041]
76 Table 1: NER performance using different word clustering models. [sent-194, score-0.355]
77 Bold indicates an improvement over the monolingual (β = 0) baseline; † indicates a significant improvement (McNemar’s test, p < 0. [sent-195, score-0.425]
78 Since alignment models are parameterized based on the vocabularies of the languages they are aligning, larger vocabularies are more prone to degenerate solutions resulting from overfitting. [sent-204, score-0.199]
79 So we are not surprised to see that sparser alignments (resulting from higher values of e) are required by languages like Korean, while languages like French and English make do with denser alignments. [sent-205, score-0.096]
80 We take the best bilingual word clustering model obtained for every language pair (e = 0. [sent-207, score-0.597]
81 All the values shown in bold are better than the monolingual baselines. [sent-214, score-0.379]
82 English again has a statistically significant improvement over the baseline. [sent-215, score-0.041]
83 The English-German cluster model performs better than the mkcls tool (72. [sent-217, score-0.043]
84 4 Related Work Our monolingual clustering model is purely distributional in nature. [sent-219, score-0.707]
85 Other extensions to word clustering have incorporated morphological and orthographic information (Clark, 2003). [sent-220, score-0.392]
86 The earliest work on bilingual word clustering was proposed by Och (1999) which, like us, uses a language model (Brown et al. [sent-222, score-0.597]
87 , 1992; Kneser and Ney, 1993) for monolingual optimization and a similarity function for bilingual similarity. [sent-226, score-0.585]
88 Täckström et al. (2012) use cross-lingual word clusters to show transfer of linguistic structure. [sent-228, score-0.21]
89 While their clustering method is superficially similar, the objective function is more heuristic in nature than our information-theoretic conception of the problem. [sent-229, score-0.473]
90 Multilingual learning has been applied to a number of unsupervised and supervised learning problems, including word sense disambiguation (Diab, 2003; Guo and Diab, 2010), topic modeling (Mimno et al. [sent-230, score-0.042]
91 , 2009; Boyd-Graber and Blei, 2009), and morphological segmentation (Snyder and Barzilay, 2008). [sent-231, score-0.037]
92 , 2011), morphology (Fraser, 2009), tense (Schiehlen, 1998) and T/V pronoun usage (Faruqui and Padó, 2012). [sent-235, score-0.031]
93 5 Conclusions We presented a novel information theoretic model for bilingual word clustering which seeks a clustering with high average mutual information between clusters of adjacent words, and also high mutual information across observed word alignment links. [sent-236, score-1.423]
94 We have shown that improvement in clustering can be obtained across a range of language pairs, evaluated in terms of their value as features in an extrinsic NER task. [sent-237, score-0.354]
95 Our model can be extended for clustering any number of given languages together in a joint framework, and can incorporate both monolingual and parallel data. [sent-238, score-0.776]
96 Clark (2003). Combining distributional and morphological information for part-of-speech induction. [sent-278, score-0.088]
97 Guo and Diab (2010). Combining orthogonal monolingual and multilingual sources of evidence for all words WSD. [sent-347, score-0.441]
98 Hwa et al. (2005). Bootstrapping parsers via syntactic projection across parallel texts. [sent-366, score-0.072]
99 Kneser and Ney (1993). Forming word classes by statistical clustering for statistical language modelling. [sent-372, score-0.355]
100 Täckström et al. (2012). Cross-lingual word clusters for direct transfer of linguistic structure. [sent-455, score-0.21]
wordName wordTfidf (topN-words)
[('avi', 0.549), ('monolingual', 0.343), ('clustering', 0.313), ('bilingual', 0.242), ('clusters', 0.168), ('ci', 0.159), ('objective', 0.133), ('ck', 0.111), ('ner', 0.108), ('faruqui', 0.101), ('clusterings', 0.091), ('german', 0.089), ('turkish', 0.088), ('alignment', 0.081), ('snyder', 0.072), ('parallel', 0.072), ('arabic', 0.072), ('logp', 0.066), ('mutual', 0.065), ('chahuneau', 0.062), ('haywood', 0.062), ('pres', 0.062), ('pa', 0.061), ('pad', 0.061), ('py', 0.061), ('stroudsburg', 0.06), ('theoretic', 0.059), ('kneser', 0.059), ('korean', 0.057), ('turian', 0.057), ('mcnemar', 0.057), ('aligned', 0.056), ('cx', 0.056), ('multilingual', 0.055), ('meil', 0.055), ('onl', 0.055), ('ackstr', 0.053), ('distributional', 0.051), ('term', 0.051), ('cj', 0.049), ('svo', 0.048), ('dy', 0.048), ('languages', 0.048), ('knowing', 0.045), ('exchanged', 0.043), ('evidence', 0.043), ('cluster', 0.043), ('word', 0.042), ('french', 0.042), ('improvement', 0.041), ('brown', 0.041), ('mimno', 0.04), ('variation', 0.038), ('recognizer', 0.037), ('morphological', 0.037), ('association', 0.036), ('smith', 0.036), ('objectives', 0.036), ('bold', 0.036), ('hwa', 0.035), ('disjoint', 0.035), ('nx', 0.035), ('px', 0.035), ('vocabularies', 0.035), ('dyer', 0.035), ('attractive', 0.034), ('guo', 0.034), ('finkel', 0.034), ('seeks', 0.033), ('koo', 0.033), ('play', 0.032), ('vocabulary', 0.031), ('tense', 0.031), ('eo', 0.03), ('sd', 0.03), ('diab', 0.03), ('vi', 0.029), ('och', 0.028), ('class', 0.028), ('log', 0.028), ('park', 0.028), ('optimal', 0.028), ('ensures', 0.028), ('ayl', 0.027), ('wmi', 0.027), ('atnod', 0.027), ('auai', 0.027), ('awnodr', 0.027), ('babel', 0.027), ('cdye', 0.027), ('conception', 0.027), ('eodf', 0.027), ('haifa', 0.027), ('iam', 0.027), ('inflect', 0.027), ('ingual', 0.027), ('isd', 0.027), ('liermann', 0.027), ('oism', 0.027), ('polylingual', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
2 0.24150114 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
Author: Jiajun Zhang ; Chengqing Zong
Abstract: Currently, almost all of the statistical machine translation (SMT) models are trained with the parallel corpora in some specific domains. However, when it comes to a language pair or a different domain without any bilingual resources, the traditional SMT loses its power. Recently, some research works study the unsupervised SMT for inducing a simple word-based translation model from the monolingual corpora. It successfully bypasses the constraint of bitext for SMT and obtains a relatively promising result. In this paper, we take a step forward and propose a simple but effective method to induce a phrase-based model from the monolingual corpora given an automatically-induced translation lexicon or a manually-edited translation dictionary. We apply our method for the domain adaptation task and the extensive experiments show that our proposed method can substantially improve the translation quality. 1
3 0.19184841 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
4 0.18001951 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
Author: Kashyap Popat ; Balamurali A.R ; Pushpak Bhattacharyya ; Gholamreza Haffari
Abstract: Expensive feature engineering based on WordNet senses has been shown to be useful for document level sentiment classification. A plausible reason for such a performance improvement is the reduction in data sparsity. However, such a reduction could be achieved with a lesser effort through the means of syntagma based word clustering. In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed through the means of clustering. Experiments show that cluster based data sparsity reduction leads to performance better than sense based classification for sentiment analysis at document level. Similar idea is applied to Cross Lingual Sentiment Analysis (CLSA), and it is shown that reduction in data sparsity (after translation or bilingual-mapping) produces accuracy higher than Machine Translation based CLSA and sense based CLSA.
5 0.16041882 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
6 0.15810841 97 acl-2013-Cross-lingual Projections between Languages from Different Families
7 0.14466591 70 acl-2013-Bilingually-Guided Monolingual Dependency Grammar Induction
8 0.13989462 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
9 0.13354748 74 acl-2013-Building Comparable Corpora Based on Bilingual LDA Model
10 0.1260583 98 acl-2013-Cross-lingual Transfer of Semantic Role Labeling Models
11 0.12192523 154 acl-2013-Extracting bilingual terminologies from comparable corpora
12 0.11823579 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
13 0.11681475 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
14 0.10961289 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
15 0.10919435 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
16 0.10414261 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
18 0.10097538 93 acl-2013-Context Vector Disambiguation for Bilingual Lexicon Extraction from Comparable Corpora
19 0.10085624 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
20 0.099974543 256 acl-2013-Named Entity Recognition using Cross-lingual Resources: Arabic as an Example
topicId topicWeight
[(0, 0.233), (1, -0.049), (2, 0.098), (3, 0.016), (4, 0.036), (5, -0.097), (6, -0.125), (7, 0.058), (8, -0.045), (9, -0.156), (10, 0.096), (11, -0.215), (12, 0.01), (13, 0.042), (14, -0.046), (15, 0.009), (16, -0.007), (17, -0.043), (18, 0.022), (19, -0.016), (20, -0.053), (21, -0.013), (22, 0.093), (23, 0.02), (24, 0.018), (25, -0.014), (26, -0.072), (27, 0.083), (28, 0.07), (29, -0.011), (30, 0.008), (31, 0.06), (32, -0.05), (33, -0.07), (34, -0.04), (35, 0.019), (36, -0.057), (37, 0.079), (38, -0.018), (39, 0.023), (40, 0.04), (41, 0.013), (42, -0.038), (43, 0.02), (44, 0.017), (45, -0.078), (46, 0.157), (47, -0.048), (48, 0.024), (49, -0.093)]
simIndex simValue paperId paperTitle
same-paper 1 0.96361578 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
2 0.70139229 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
3 0.69964731 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
Author: Tingting Li ; Tiejun Zhao ; Andrew Finch ; Chunyue Zhang
Abstract: Machine Transliteration is an essential task for many NLP applications. However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. The experimental results show that our method considerably outperforms conventional alignment models.
4 0.65569907 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
5 0.61864972 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
Author: Kashyap Popat ; Balamurali A.R ; Pushpak Bhattacharyya ; Gholamreza Haffari
Abstract: Expensive feature engineering based on WordNet senses has been shown to be useful for document level sentiment classification. A plausible reason for such a performance improvement is the reduction in data sparsity. However, such a reduction could be achieved with a lesser effort through the means of syntagma based word clustering. In this paper, the problem of data sparsity in sentiment analysis, both monolingual and cross-lingual, is addressed through the means of clustering. Experiments show that cluster based data sparsity reduction leads to performance better than sense based classification for sentiment analysis at document level. Similar idea is applied to Cross Lingual Sentiment Analysis (CLSA), and it is shown that reduction in data sparsity (after translation or bilingual-mapping) produces accuracy higher than Machine Translation based CLSA and sense based CLSA.
6 0.60873467 154 acl-2013-Extracting bilingual terminologies from comparable corpora
7 0.59228724 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
8 0.59217024 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
9 0.58292001 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
10 0.56047636 97 acl-2013-Cross-lingual Projections between Languages from Different Families
11 0.55410767 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
12 0.54427516 270 acl-2013-ParGramBank: The ParGram Parallel Treebank
13 0.54302877 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
14 0.52842665 323 acl-2013-Simpler unsupervised POS tagging with bilingual projections
15 0.52519011 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
16 0.5251717 240 acl-2013-Microblogs as Parallel Corpora
17 0.52369475 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
18 0.52140981 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
19 0.51750523 68 acl-2013-Bilingual Data Cleaning for SMT using Graph-based Random Walk
20 0.51505572 72 acl-2013-Bridging Languages through Etymology: The case of cross language text categorization
topicId topicWeight
[(0, 0.067), (6, 0.045), (11, 0.067), (15, 0.263), (24, 0.06), (26, 0.044), (35, 0.062), (42, 0.073), (48, 0.056), (70, 0.032), (88, 0.033), (90, 0.036), (95, 0.091)]
simIndex simValue paperId paperTitle
1 0.97595966 232 acl-2013-Linguistic Models for Analyzing and Detecting Biased Language
Author: Marta Recasens ; Cristian Danescu-Niculescu-Mizil ; Dan Jurafsky
Abstract: Unbiased language is a requirement for reference sources like encyclopedias and scientific texts. Bias is, nonetheless, ubiquitous, making it crucial to understand its nature and linguistic realization and hence detect bias automatically. To this end we analyze real instances of human edits designed to remove bias from Wikipedia articles. The analysis uncovers two classes of bias: framing bias, such as praising or perspective-specific words, which we link to the literature on subjectivity; and epistemological bias, related to whether propositions that are presupposed or entailed in the text are uncontroversially accepted as true. We identify common linguistic cues for these classes, including factive verbs, implicatives, hedges, and subjective intensifiers. These insights help us develop features for a model to solve a new prediction task of practical importance: given a biased sentence, identify the bias-inducing word. Our linguistically-informed model performs almost as well as humans tested on the same task.
2 0.88603747 199 acl-2013-Integrating Multiple Dependency Corpora for Inducing Wide-coverage Japanese CCG Resources
Author: Sumire Uematsu ; Takuya Matsuzaki ; Hiroki Hanaoka ; Yusuke Miyao ; Hideki Mima
Abstract: This paper describes a method of inducing wide-coverage CCG resources for Japanese. While deep parsers with corpus-induced grammars have been emerging for some languages, those for Japanese have not been widely studied, mainly because most Japanese syntactic resources are dependency-based. Our method first integrates multiple dependency-based corpora into phrase structure trees and then converts the trees into CCG derivations. The method is empirically evaluated in terms of the coverage of the obtained lexicon and the accuracy of parsing.
3 0.86679667 262 acl-2013-Offspring from Reproduction Problems: What Replication Failure Teaches Us
Author: Antske Fokkens ; Marieke van Erp ; Marten Postma ; Ted Pedersen ; Piek Vossen ; Nuno Freire
Abstract: Repeating experiments is an important instrument in the scientific toolbox to validate previous work and build upon existing work. We present two concrete use cases involving key techniques in the NLP domain for which we show that reproducing results is still difficult. We show that the deviation that can be found in reproduction efforts leads to questions about how our results should be interpreted. Moreover, investigating these deviations provides new insights and a deeper understanding of the examined techniques. We identify five aspects that can influence the outcomes of experiments that are typically not addressed in research papers. Our use cases show that these aspects may change the answer to research questions leading us to conclude that more care should be taken in interpreting our results and more research involving systematic testing of methods is required in our field.
same-paper 4 0.8245793 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
5 0.81891143 304 acl-2013-SEMILAR: The Semantic Similarity Toolkit
Author: Vasile Rus ; Mihai Lintean ; Rajendra Banjade ; Nobal Niraula ; Dan Stefanescu
Abstract: We present in this paper SEMILAR, the SEMantic simILARity toolkit. SEMILAR implements a number of algorithms for assessing the semantic similarity between two texts. It is available as a Java library and as a Java standalone application offering GUI-based access to the implemented semantic similarity methods. Furthermore, it offers facilities for manual semantic similarity annotation by experts through its component SEMILAT (a SEMantic simILarity Annotation Tool).
6 0.80216348 233 acl-2013-Linking Tweets to News: A Framework to Enrich Short Text Data in Social Media
7 0.65677464 187 acl-2013-Identifying Opinion Subgroups in Arabic Online Discussions
8 0.65212893 250 acl-2013-Models of Translation Competitions
9 0.63977873 318 acl-2013-Sentiment Relevance
10 0.63881934 83 acl-2013-Collective Annotation of Linguistic Resources: Basic Principles and a Formal Model
11 0.63622433 63 acl-2013-Automatic detection of deception in child-produced speech using syntactic complexity features
12 0.63341051 99 acl-2013-Crowd Prefers the Middle Path: A New IAA Metric for Crowdsourcing Reveals Turker Biases in Query Segmentation
13 0.63094753 346 acl-2013-The Impact of Topic Bias on Quality Flaw Prediction in Wikipedia
14 0.62954897 59 acl-2013-Automated Pyramid Scoring of Summaries using Distributional Semantics
15 0.62620127 17 acl-2013-A Random Walk Approach to Selectional Preferences Based on Preference Ranking and Propagation
16 0.6207366 2 acl-2013-A Bayesian Model for Joint Unsupervised Induction of Sentiment, Aspect and Discourse Representations
17 0.61724365 267 acl-2013-PARMA: A Predicate Argument Aligner
18 0.61431098 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction
19 0.61421955 123 acl-2013-Discriminative Learning with Natural Annotations: Word Segmentation as a Case Study
20 0.61398399 18 acl-2013-A Sentence Compression Based Framework to Query-Focused Multi-Document Summarization