acl acl2013 acl2013-25 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tingting Li ; Tiejun Zhao ; Andrew Finch ; Chunyue Zhang
Abstract: Machine Transliteration is an essential task for many NLP applications. However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. The experimental results show that our method considerably outperforms conventional alignment models.
Reference: text
sentIndex sentText sentNum sentScore
1 However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. [sent-9, score-0.633]
2 Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. [sent-10, score-1.24]
3 We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. [sent-11, score-0.8]
4 The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. [sent-12, score-0.714]
5 The experimental results show that our method considerably outperforms conventional alignment models. [sent-13, score-0.195]
6 1 Introduction Machine transliteration methods can be categorized into phonetic-based models (Knight et al. [sent-14, score-0.536]
7 , 2000), and hybrid models which utilize both phonetic and spelling information (Oh et al. [sent-16, score-0.095]
8 Among them, statistical spelling-based models which directly align characters in the training corpus have become popular because they are language-independent, do not require phonetic knowledge, and are capable of achieving state-of-the-art performance (Zhang et al. [sent-19, score-0.055]
9 The same Chinese character “金” should be aligned to different romanized character sequences: “Kim”, “Kana”, “King”, “Jin”. [sent-22, score-0.096]
10 To address this issue, many name classification methods have been proposed, such as the supervised language model-based approach of (Li et al. [sent-23, score-0.058]
11 , 2007), and the unsupervised approach of (Huang et al. [sent-24, score-0.027]
12 , 2007) proposed a supervised transliteration model which classifies names based on their origins and genders using a language model; it switches between transliteration models based on the input. [sent-27, score-1.285]
13 , 2011) tackled the issue by using an unsupervised method based on the EM algorithm to perform a soft classification. [sent-29, score-0.027]
14 , 2012) have attracted much attention in the transliteration field. [sent-33, score-0.509]
15 In comparison to many of the previous alignment models (Li et al. [sent-34, score-0.222]
16 , 2011), the nonparametric Bayesian models allow unconstrained monotonic many-to-many alignment and are able to overcome the inherent over-fitting problem. [sent-37, score-0.326]
17 , 2012) took these two factors into consideration, but their approach still operates within an EM framework and model order selection by hand is necessary prior to training. [sent-42, score-0.063]
18 We propose a simple, elegant, fully-unsupervised solution based on a single generative model able to both cluster and align simultaneously. [sent-45, score-0.163]
19 The coupled Dirichlet Process Mixture Model (cDPMM) integrates a Dirichlet process mixture model (DPMM) (Antoniak, 1974) and a Bayesian Bilingual Alignment Model (BBAM) (Finch et al. [sent-46, score-0.202]
20 1 Terminology In this paper, we concentrate on the alignment process for transliteration. [sent-52, score-0.233]
21 The proposed cDPMM segments a bilingual corpus of transliteration pairs into bilingual character sequence-pairs. [sent-53, score-0.677]
22 Finally, we express the training set itself as a set of sequence pairs: D = {xi}, i = 1, . . . , I. [sent-69, score-0.033]
23 Our aim is to obtain a bilingual alignment ⟨(s1 , t1) , . [sent-70, score-0.255]
24 , (sl , tl)⟩ for each transliteration pair xi, where each (sj, tj) is a segment of the whole pair (a TU) and l is the number of segments used to segment xi. [sent-73, score-0.092]
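To make the terminology above concrete, here is a minimal illustrative sketch (hypothetical names, not from the authors' code) of a transliteration pair and one possible derivation into TUs:

```python
from dataclasses import dataclass
from typing import List, Tuple

# A translation unit (TU) pairs a source character sequence with a
# target character sequence, e.g. ("kim", "金").
TU = Tuple[str, str]

@dataclass
class TransliterationPair:
    source: str  # romanized name, e.g. "kimura"
    target: str  # Chinese rendering, e.g. "木村"

# A derivation of a pair is an ordered list of TUs whose source sides
# concatenate to `source` and whose target sides concatenate to `target`.
pair = TransliterationPair("kimura", "木村")
derivation: List[TU] = [("ki", "木"), ("mura", "村")]  # l = 2 segments
```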
25 2 Methodology Our cDPMM integrates two Dirichlet process models: the DPMM clustering model, and the BBAM alignment model which is a multinomial Dirichlet process. [sent-75, score-0.444]
26 A Dirichlet process mixture model models the data as a mixture of distributions, one for each cluster. [sent-76, score-0.265]
27 It is an infinite mixture model, and the number of components is not fixed prior to training. [sent-77, score-0.149]
28 Gc | αc, G0c ∼ DP(αc, G0c); θk | Gc ∼ Gc; xi | θk ∼ f(xi | θk) (1) where G0c is the base measure and αc > 0 is the concentration parameter for the distribution Gc. [sent-79, score-0.174]
29 xi is a name pair in training data, and θk represents the parameters of a candidate cluster k for xi. [sent-80, score-0.364]
30 Specifically θk contains the probabilities of all the TUs in cluster k. [sent-81, score-0.096]
31 f(xi | θk) (defined in Equation 7) is the probability that mixture component k parameterized by θk will generate xi. [sent-82, score-0.139]
32 The alignment component of our cDPMM is a multinomial Dirichlet process and is defined as follows: Ga | αa, G0a ∼ DP(αa, G0a); (sj, tj) | Ga ∼ Ga (2) The subscripts ‘c’ and ‘a’ in Equations 1 and 2 indicate whether the terms belong to the clustering or alignment model, respectively. [sent-83, score-0.678]
33 The generative story for the cDPMM is simple: first generate an infinite number of clusters, choose one, then generate a transliteration pair using the parameters that describe the cluster. [sent-84, score-0.662]
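Purely as an illustration of this generative story, a sketch under the Chinese restaurant process view of the clustering DP; the callables sample_tu_sequence and new_theta are hypothetical stand-ins for the BBAM emission and a draw from the base measure:

```python
import random

def generate_pair(clusters, alpha_c, sample_tu_sequence, new_theta):
    """Sketch of the cDPMM generative story: pick a cluster, then let
    that cluster's BBAM emit a transliteration pair as a TU sequence.

    clusters: list of dicts {"size": n_k, "theta": theta_k}
    alpha_c:  concentration parameter of the clustering DP
    """
    n = sum(c["size"] for c in clusters)
    # Sit at an existing table (cluster) with probability n_k / (n + alpha_c) ...
    r, acc = random.uniform(0.0, n + alpha_c), 0.0
    for c in clusters:
        acc += c["size"]
        if r < acc:
            c["size"] += 1
            return sample_tu_sequence(c["theta"])
    # ... or open a brand-new cluster with probability alpha_c / (n + alpha_c).
    c = {"size": 1, "theta": new_theta()}
    clusters.append(c)
    return sample_tu_sequence(c["theta"])
```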
34 The basic sampling unit of the cDPMM for the clustering process is a transliteration pair, but the basic sampling unit for BBAM is a TU. [sent-85, score-0.718]
35 In order to integrate the two processes in a single model, we treat a transliteration pair as a sequence of TUs generated by a BBAM model. [sent-86, score-0.624]
36 The BBAM generates a sequence (a transliteration pair) based on the joint source-channel model (Li et al. [sent-87, score-0.578]
37 We use a blocked version of a Gibbs sampler to train each BBAM (see (Mochihashi et al. [sent-89, score-0.034]
38 3 The Alignment Model This model is a multinomial DP model. [sent-92, score-0.106]
39 Under the Chinese restaurant process (CRP) (Aldous, 1985) interpretation, each unique TU corresponds to a dish served at a table, and the number of customers at each table represents the count of a particular TU in the model. [sent-93, score-0.122]
40 The base measure G0a is a joint spelling model: G0a((s, t)) = P(|s|) P(s | |s|) P(|t|) P(t | |t|) = (λs^|s| / |s|!) e^(−λs) vs^(−|s|) · (λt^|t| / |t|!) e^(−λt) vt^(−|t|) (4), where vs and vt are the source and target vocabulary sizes. [sent-96, score-0.04]
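A minimal sketch of this base measure as reconstructed above (Poisson segment lengths with the rates λs = 4, λt = 1 of Section 3.3, spellings uniform over character inventories of sizes vs, vt; the inventory sizes below are illustrative assumptions), together with the standard CRP predictive probability of a TU in this multinomial DP:

```python
import math

def g0a(s, t, lam_s=4.0, lam_t=1.0, v_s=26, v_t=6000):
    """Joint spelling base measure G0a((s, t)) of Equation 4 (sketch).

    P(|s|) is Poisson(lam_s) and P(s | |s|) is uniform over the v_s**|s|
    possible spellings; the target side is handled symmetrically.
    v_s = 26 (Latin letters) and v_t = 6000 (Chinese characters) are
    placeholder inventory sizes, not values from the paper.
    """
    def side(x, lam, v):
        n = len(x)
        return (lam ** n / math.factorial(n)) * math.exp(-lam) * v ** (-n)
    return side(s, lam_s, v_s) * side(t, lam_t, v_t)

def crp_tu_prob(tu, counts, total, alpha_a):
    """CRP predictive probability of a TU: existing-table counts smoothed
    by the base measure, (count + alpha_a * G0a) / (total + alpha_a)."""
    return (counts.get(tu, 0) + alpha_a * g0a(*tu)) / (total + alpha_a)
```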
41 Under the CRP interpretation, a transliteration pair corresponds to a customer, and the dish served at each table corresponds to an origin of names. [sent-101, score-0.704]
42 We use zi ∈ {1, . . . , K} to indicate the cluster of each transliteration pair. [sent-108, score-0.096]
43 We use θ = (θ1, . . . , θK) to represent the parameters of the component associated with each cluster. [sent-114, score-0.066]
44 In our model, each mixture component is a multinomial DP model, and since θk contains the probabilities of all the TUs in cluster k, the number of parameters in each θk is uncertain and changes with the transliteration pairs that belong to the cluster. [sent-115, score-0.841]
45 For a new cluster (the (K + 1)-th cluster), we use Equation 4 to calculate the probability of each TU. [sent-116, score-0.096]
46 The cluster membership probability of a transliteration pair xi is calculated as follows:
P(zi = k | D, θ, z−i) ∝ (nk / (n − 1 + αc)) · P(xi | z, θk) (5)
P(zi = K + 1 | D, θ, z−i) ∝ (αc / (n − 1 + αc)) · P(xi | z, θK+1) (6)
where nk is the number of transliteration pairs in the existing cluster k ∈ {1, . [sent-117, score-1.427]
47 , K} (cluster K + 1 is a newly created cluster), zi is the cluster indicator for xi, and z−i is the sequence of observed clusters up to xi. [sent-120, score-0.301]
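A sketch of the sampling step implied by Equations 5 and 6; the helper likelihood(x, k), standing in for f(xi | θk) of Equation 7 (with k = NEW meaning the base-measure case), is hypothetical:

```python
import random

def sample_cluster(x, cluster_sizes, n, alpha_c, likelihood):
    """Draw z_i for pair x from the unnormalized weights of Eqs. (5)-(6).

    cluster_sizes: {k: n_k} over existing clusters (x itself excluded)
    n:             total number of transliteration pairs
    likelihood:    likelihood(x, k) computes f(x | theta_k)
    """
    NEW = max(cluster_sizes, default=0) + 1  # label for a brand-new cluster
    weights = {k: n_k / (n - 1 + alpha_c) * likelihood(x, k)
               for k, n_k in cluster_sizes.items()}
    weights[NEW] = alpha_c / (n - 1 + alpha_c) * likelihood(x, NEW)
    r, acc = random.uniform(0.0, sum(weights.values())), 0.0
    for k, w in weights.items():
        acc += w
        if r < acc:
            return k
    return NEW  # floating-point fallback
```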
48 As mentioned earlier, the basic sampling units are inconsistent between the clustering and alignment models; therefore, to couple the models, the BBAM generates transliteration pairs as sequences of TUs, and these pairs are then used directly in the DPMM. [sent-121, score-0.902]
49 Let γ = ⟨(s1, t1), . . . , (sl, tl)⟩ be a derivation of a transliteration pair xi. [sent-125, score-0.046]
50 To make the model integration process explicit, we use function f to calculate the probability P(xi | z, θk), where f is defined as follows:
f(xi | θk) = Σγ∈R Π(s,t)∈γ P(s, t | θk), if k ∈ {1, . . . , K}
f(xi | θk) = Σγ∈R Π(s,t)∈γ G0c((s, t)), if k = K + 1 (7)
where R denotes the set of all derivations of xi. [sent-126, score-0.038]
51 The cluster membership zi is sampled together with the derivation γ in a single step according to P(zi = k|D, θ, z−i) and f(xi |θk). [sent-128, score-0.289]
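Equation 7 sums over every monotonic derivation of a pair; with segment lengths capped (cf. max s and max t in Section 3.3) the sum has an efficient forward dynamic program. A sketch, with a hypothetical tu_prob(s, t) standing in for P(s, t | θk) or G0c, and with the simplification that every TU consumes at least one character on each side (the paper also permits NULL alignments):

```python
def f_likelihood(source, target, tu_prob, max_s=6, max_t=1):
    """Sum over all monotonic derivations of (source, target) of the
    product of their TU probabilities (cf. Eq. 7), via a forward DP.

    chart[i][j] = total probability of all partial derivations that
    cover source[:i] and target[:j].
    """
    S, T = len(source), len(target)
    chart = [[0.0] * (T + 1) for _ in range(S + 1)]
    chart[0][0] = 1.0
    for i in range(S + 1):
        for j in range(T + 1):
            if chart[i][j] == 0.0:
                continue
            for di in range(1, min(max_s, S - i) + 1):
                for dj in range(1, min(max_t, T - j) + 1):
                    w = tu_prob(source[i:i + di], target[j:j + dj])
                    chart[i + di][j + dj] += chart[i][j] * w
    return chart[S][T]
```

In the coupled model this quantity feeds directly into the membership weights of Equations 5 and 6.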
52 1 Corpora To empirically validate our approach, we investigate the effectiveness of our model by conducting English-Chinese name transliteration generation on three corpora containing name pairs of varying degrees of mixed origin. [sent-132, score-0.661]
53 The first corpus was constructed with names originating only from the English language (EO), and the second with names originating evenly from English, Chinese, and Japanese (ECJO). [sent-134, score-0.238]
54 The third corpus was created by extracting name pairs from LDC (Linguistic Data Consortium) Named Entity List, which contains names from all over the world (Multi-O). [sent-135, score-0.135]
55 2 Baselines We compare our alignment model with GIZA++ (Och et al. [sent-139, score-0.231]
56 We employ two decoding models: a phrase-based machine translation decoder (specifically Moses (Koehn et al. [sent-141, score-0.051]
57 For the Moses decoder, we applied the grow-diag-final-and heuristic algorithm to extract the phrase table, and tuned the parameters using the BLEU metric. [sent-145, score-0.027]
58 3 Parameter Setting In our model, there are several important parameters: 1) max s, the maximum length of the source sequences of the alignment tokens; 2) max t, the maximum length of the target sequences of the alignment tokens; and 3) nc, the initial number of classes for the training data. [sent-150, score-0.572]
59 We set max s = 6, max t = 1 and nc = 5 empirically based on a small pilot experiment. [sent-151, score-0.091]
60 The Moses decoder was used with default settings, except for the distortion limit, which was set to 0 to ensure monotonic decoding. [sent-152, score-0.108]
61 For the DirecTL decoder the following settings were used: cs = 4, ng = 9 and nBest = 5. [sent-153, score-0.051]
62 cs denotes the size of the context window for features, ng indicates the size of the n-gram features, and nBest is the size of the transliteration candidate list used for updating the model in each iteration. [sent-154, score-0.545]
63 The concentration parameters αc and αa of the clustering model and the BBAM were learned by sampling their values. [sent-155, score-0.211]
64 , 2009) we used a vague gamma prior Γ(10^−4, 10^4), and sampled new values from a log-normal distribution whose mean was the value of the parameter, and variance was 0. [sent-157, score-0.027]
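A sketch of one such hyperparameter update, read as a Metropolis-Hastings step: Gamma(10^−4, 10^4) prior, log-normal proposal centred on the current value, and a hypothetical log_likelihood(alpha) giving the model's log-likelihood as a function of the concentration parameter; the proposal spread sigma is a placeholder, since the exact variance value is truncated above:

```python
import math
import random

def resample_concentration(alpha, log_likelihood, shape=1e-4, scale=1e4,
                           sigma=0.3):
    """One Metropolis-Hastings update of a DP concentration parameter.

    Prior: Gamma(shape, scale), vague for shape=1e-4, scale=1e4.
    Proposal: exp(Normal(log(alpha), sigma)), i.e. log-normal centred
    on the current value; sigma here is a placeholder value.
    """
    def log_gamma_prior(a):
        return (shape - 1.0) * math.log(a) - a / scale

    proposal = math.exp(random.gauss(math.log(alpha), sigma))
    # Hastings correction for the asymmetric log-normal proposal:
    # q(alpha | proposal) / q(proposal | alpha) = proposal / alpha.
    log_accept = (log_likelihood(proposal) + log_gamma_prior(proposal)
                  - log_likelihood(alpha) - log_gamma_prior(alpha)
                  + math.log(proposal) - math.log(alpha))
    if random.random() < math.exp(min(0.0, log_accept)):
        return proposal
    return alpha
```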
65 The parameters λs and λt in Equation 4 were set to λs = 4 and λt = 1. [sent-160, score-0.027]
66 4 Experimental Results Table 3 shows some details of the alignment results. [sent-169, score-0.195]
67 The #(Clusters) represents the average number of clusters from the cDPMM. [sent-170, score-0.058]
68 It is averaged over the final 50 iterations, and the classes which contain fewer than 10 name pairs are excluded. [sent-171, score-0.102]
69 The #(Targets) represents the average number of English character sequences that are aligned to each Chinese sequence. [sent-172, score-0.086]
70 From the results we can see that, in terms of the number of alignment targets, GIZA++ > cDPMM > BBAM. [sent-173, score-0.195]
71 GIZA++ has considerably more targets than the other approaches, and this is likely to be a symptom of it overfitting the data. [sent-174, score-0.077]
72 cDPMM can alleviate the overfitting through its BBAM component, and at the same time effectively model the diversity in Chinese character sequences caused by multi-origin. [sent-175, score-0.149]
73 Table 1 shows some typical TUs from the alignments produced by BBAM and cDPMM on corpus Multi-O. [sent-176, score-0.037]
74 The information in brackets in Table 1 represents the ID of the class and origin of [sent-177, score-0.065]
75 the name pair; the symbol ‘ ’ indicates a “NULL” alignment. [sent-181, score-0.058]
76 We can see the Chinese characters “丁(ding) 一(yi) 东(dong)” have different alignments in different origins, and that the cDPMM has provided the correct alignments for them. [sent-182, score-0.074]
77 We used the sampled alignments from running the BBAM and cDPMM models for 100 iterations, and combined the alignment tables of each class. [sent-183, score-0.444]
78 The experiments are therefore investigating whether the alignment has been meaningfully improved by the clustering process. [sent-184, score-0.3]
79 We would expect further gains from exploiting the class information in the decoding process (as in (Li et al. [sent-185, score-0.038]
80 Our proposed model obtained the highest performance on all three datasets for all evaluation metrics by a considerable margin. [sent-189, score-0.036]
81 This shows that although names may have a monolingual origin, there are hidden factors which can allow our model to succeed, possibly related to gender or convention. [sent-191, score-0.113]
82 Other models based on supervised classification or clustering with fixed classes may fail to capture these characteristics. [sent-192, score-0.176]
83 4 Conclusion In this paper we propose an elegant unsupervised technique for monotonic sequence alignment based on a single generative model. [sent-197, score-0.385]
84 The benefits of our model are that it can handle data from multiple origins, and can model the data using many-to-many alignment without over-fitting. [sent-202, score-0.267]
85 The model operates by clustering the data into classes while simultaneously aligning it, and is able to discover an appropriate number of classes from the data. [sent-203, score-0.256]
86 Our results show that our alignment model can improve the performance of a transliteration generation system relative to two other state-of-the-art aligners. [sent-204, score-0.74]
87 Furthermore, the system produced gains even on data of monolingual origin, where no obvious clusters in the data were expected. [sent-205, score-0.058]
88 Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. [sent-220, score-0.047]
89 A Gibbs sampler for phrasal synchronous grammar induction. [sent-233, score-0.061]
90 A machine transliteration model based on correspondence between graphemes and phonemes. [sent-318, score-0.545]
wordName wordTfidf (topN-words)
[('transliteration', 0.509), ('cdpmm', 0.378), ('bbam', 0.349), ('alignment', 0.195), ('tus', 0.18), ('xi', 0.137), ('zi', 0.132), ('hagiwara', 0.127), ('origins', 0.127), ('directl', 0.116), ('clustering', 0.105), ('mixture', 0.1), ('dirichlet', 0.098), ('cluster', 0.096), ('jiampojamarn', 0.095), ('dpmm', 0.087), ('tu', 0.083), ('bayesian', 0.082), ('gc', 0.078), ('names', 0.077), ('mochihashi', 0.071), ('multinomial', 0.07), ('origin', 0.065), ('oh', 0.062), ('tj', 0.062), ('finch', 0.061), ('bilingual', 0.06), ('kana', 0.058), ('clusters', 0.058), ('name', 0.058), ('monotonic', 0.057), ('dp', 0.055), ('sj', 0.054), ('moses', 0.053), ('dish', 0.051), ('lit', 0.051), ('decoder', 0.051), ('targets', 0.05), ('infinite', 0.049), ('ga', 0.049), ('character', 0.048), ('obey', 0.047), ('sittichai', 0.047), ('li', 0.047), ('nonparametric', 0.047), ('haizhou', 0.047), ('pair', 0.046), ('masato', 0.045), ('classes', 0.044), ('originating', 0.042), ('efron', 0.042), ('elegant', 0.042), ('brill', 0.041), ('crp', 0.041), ('chinese', 0.04), ('zhang', 0.04), ('equation', 0.04), ('spelling', 0.04), ('tl', 0.039), ('component', 0.039), ('sequences', 0.038), ('nbest', 0.038), ('process', 0.038), ('min', 0.037), ('alignments', 0.037), ('concentration', 0.037), ('grzegorz', 0.037), ('named', 0.036), ('model', 0.036), ('news', 0.035), ('sampler', 0.034), ('satoshi', 0.034), ('membership', 0.034), ('sampling', 0.033), ('sequence', 0.033), ('served', 0.033), ('giza', 0.033), ('max', 0.031), ('unified', 0.031), ('blunsom', 0.031), ('generative', 0.031), ('sl', 0.031), ('huang', 0.03), ('cp', 0.029), ('jin', 0.029), ('nc', 0.029), ('eo', 0.028), ('coupled', 0.028), ('sm', 0.028), ('phonetic', 0.028), ('operates', 0.027), ('overfitting', 0.027), ('parameters', 0.027), ('models', 0.027), ('sampled', 0.027), ('unsupervised', 0.027), ('synchronous', 0.027), ('wenwen', 0.026), ('tively', 0.026), ('bhargava', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
Author: Tingting Li ; Tiejun Zhao ; Andrew Finch ; Chunyue Zhang
Abstract: Machine Transliteration is an essential task for many NLP applications. However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. The experimental results show that our method considerably outperforms conventional alignment models.
2 0.18847585 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection
Author: Masato Hagiwara ; Satoshi Sekine
Abstract: Transliterated compound nouns not separated by whitespaces pose difficulty on word segmentation (WS). Offline approaches have been proposed to split them using word statistics, but they rely on static lexicon, limiting their use. We propose an online approach, integrating source LM, and/or, back-transliteration and English LM. The experiments on Japanese and Chinese WS have shown that the proposed models achieve significant improvement over state-of-the-art, reducing 16% errors in Japanese.
Author: Phillippe Langlais
Abstract: Analogical learning over strings is a holistic model that has been investigated by a few authors as a means to map forms of a source language to forms of a target language. In this study, we revisit this learning paradigm and apply it to the transliteration task. We show that alone, it performs worse than a statistical phrase-based machine translation engine, but the combination of both approaches outperforms each one taken separately, demonstrating the usefulness of the information captured by a so-called formal analogy.
4 0.14658448 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
5 0.12548673 255 acl-2013-Name-aware Machine Translation
Author: Haibo Li ; Jing Zheng ; Heng Ji ; Qi Li ; Wen Wang
Abstract: We propose a Name-aware Machine Translation (MT) approach which can tightly integrate name processing into MT model, by jointly annotating parallel corpora, extracting name-aware translation grammar and rules, adding name phrase table and name translation driven decoding. Additionally, we also propose a new MT metric to appropriately evaluate the translation quality of informative words, by assigning different weights to different words according to their importance values in a document. Experiments on Chinese-English translation demonstrated the effectiveness of our approach on enhancing the quality of overall translation, name translation and word alignment over a high-quality MT baseline.
6 0.11878902 154 acl-2013-Extracting bilingual terminologies from comparable corpora
7 0.11788431 256 acl-2013-Named Entity Recognition using Cross-lingual Resources: Arabic as an Example
8 0.10852987 11 acl-2013-A Multi-Domain Translation Model Framework for Statistical Machine Translation
9 0.10414261 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
10 0.097656779 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
11 0.097164288 181 acl-2013-Hierarchical Phrase Table Combination for Machine Translation
12 0.096633837 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
13 0.096367024 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
14 0.096048422 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
15 0.093918785 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
16 0.089985624 388 acl-2013-Word Alignment Modeling with Context Dependent Deep Neural Network
17 0.086632788 138 acl-2013-Enriching Entity Translation Discovery using Selective Temporality
18 0.083471373 307 acl-2013-Scalable Decipherment for Machine Translation via Hash Sampling
19 0.081756286 345 acl-2013-The Haves and the Have-Nots: Leveraging Unlabelled Corpora for Sentiment Analysis
20 0.081038117 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
topicId topicWeight
[(0, 0.196), (1, -0.066), (2, 0.066), (3, 0.062), (4, 0.091), (5, -0.008), (6, -0.078), (7, 0.015), (8, -0.064), (9, -0.044), (10, 0.029), (11, -0.131), (12, 0.014), (13, -0.051), (14, -0.01), (15, -0.052), (16, 0.043), (17, 0.016), (18, -0.009), (19, -0.026), (20, -0.019), (21, -0.022), (22, 0.062), (23, 0.037), (24, 0.041), (25, 0.045), (26, -0.054), (27, 0.043), (28, 0.036), (29, 0.01), (30, 0.079), (31, -0.003), (32, -0.002), (33, -0.123), (34, -0.057), (35, 0.017), (36, -0.063), (37, 0.137), (38, -0.063), (39, 0.041), (40, -0.073), (41, 0.015), (42, 0.09), (43, 0.017), (44, 0.018), (45, -0.114), (46, -0.054), (47, -0.042), (48, 0.081), (49, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.92412621 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
Author: Tingting Li ; Tiejun Zhao ; Andrew Finch ; Chunyue Zhang
Abstract: Machine Transliteration is an essential task for many NLP applications. However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. The experimental results show that our method considerably outperforms conventional alignment models.
Author: Phillippe Langlais
Abstract: Analogical learning over strings is a holistic model that has been investigated by a few authors as a means to map forms of a source language to forms of a target language. In this study, we revisit this learning paradigm and apply it to the transliteration task. We show that alone, it performs worse than a statistical phrase-based machine translation engine, but the combination of both approaches outperforms each one taken separately, demonstrating the usefulness of the information captured by a so-called formal analogy.
3 0.6535818 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
Author: Xiaojun Quan ; Chunyu Kit ; Yan Song
Abstract: This paper studies the problem of nonmonotonic sentence alignment, motivated by the observation that coupled sentences in real bitexts do not necessarily occur monotonically, and proposes a semisupervised learning approach based on two assumptions: (1) sentences with high affinity in one language tend to have their counterparts with similar relatedness in the other; and (2) initial alignment is readily available with existing alignment techniques. They are incorporated as two constraints into a semisupervised learning framework for optimization to produce a globally optimal solution. The evaluation with real-world legal data from a comprehensive legislation corpus shows that while existing alignment algorithms suffer severely from non-monotonicity, this approach can work effectively on both monotonic and non-monotonic data.
4 0.63870472 47 acl-2013-An Information Theoretic Approach to Bilingual Word Clustering
Author: Manaal Faruqui ; Chris Dyer
Abstract: We present an information theoretic objective for bilingual word clustering that incorporates both monolingual distributional evidence as well as cross-lingual evidence from parallel corpora to learn high quality word clusters jointly in any number of languages. The monolingual component of our objective is the average mutual information of clusters of adjacent words in each language, while the bilingual component is the average mutual information of the aligned clusters. To evaluate our method, we use the word clusters in an NER system and demonstrate a statistically significant improvement in F1 score when using bilingual word clusters instead of monolingual clusters.
5 0.63696635 210 acl-2013-Joint Word Alignment and Bilingual Named Entity Recognition Using Dual Decomposition
Author: Mengqiu Wang ; Wanxiang Che ; Christopher D. Manning
Abstract: Translated bi-texts contain complementary language cues, and previous work on Named Entity Recognition (NER) has demonstrated improvements in performance over monolingual taggers by promoting agreement of tagging decisions between the two languages. However, most previous approaches to bilingual tagging assume word alignments are given as fixed input, which can cause cascading errors. We observe that NER label information can be used to correct alignment mistakes, and present a graphical model that performs bilingual NER tagging jointly with word alignment, by combining two monolingual tagging models with two unidirectional alignment models. We introduce additional cross-lingual edge factors that encourage agreements between tagging and alignment decisions. We design a dual decomposition inference algorithm to perform joint decoding over the combined alignment and NER output space. Experiments on the OntoNotes dataset demonstrate that our method yields significant improvements in both NER and word alignment over state-of-the-art monolingual baselines.
6 0.63027471 354 acl-2013-Training Nondeficient Variants of IBM-3 and IBM-4 for Word Alignment
7 0.62850249 15 acl-2013-A Novel Graph-based Compact Representation of Word Alignment
8 0.62311983 48 acl-2013-An Open Source Toolkit for Quantitative Historical Linguistics
9 0.60219473 34 acl-2013-Accurate Word Segmentation using Transliteration and Language Model Projection
10 0.56955576 9 acl-2013-A Lightweight and High Performance Monolingual Word Aligner
11 0.55198574 369 acl-2013-Unsupervised Consonant-Vowel Prediction over Hundreds of Languages
12 0.53357786 143 acl-2013-Exact Maximum Inference for the Fertility Hidden Markov Model
13 0.52630961 276 acl-2013-Part-of-Speech Induction in Dependency Trees for Statistical Machine Translation
14 0.52021599 89 acl-2013-Computerized Analysis of a Verbal Fluency Test
15 0.51383215 255 acl-2013-Name-aware Machine Translation
16 0.49077895 46 acl-2013-An Infinite Hierarchical Bayesian Model of Phrasal Translation
17 0.49031708 145 acl-2013-Exploiting Qualitative Information from Automatic Word Alignment for Cross-lingual NLP Tasks
18 0.47085455 281 acl-2013-Post-Retrieval Clustering Using Third-Order Similarity Measures
19 0.46827763 71 acl-2013-Bootstrapping Entity Translation on Weakly Comparable Corpora
20 0.46598172 154 acl-2013-Extracting bilingual terminologies from comparable corpora
topicId topicWeight
[(0, 0.063), (6, 0.047), (11, 0.074), (24, 0.057), (26, 0.04), (35, 0.072), (42, 0.061), (48, 0.045), (70, 0.057), (88, 0.022), (90, 0.05), (92, 0.197), (95, 0.135)]
simIndex simValue paperId paperTitle
1 0.9446435 42 acl-2013-Aid is Out There: Looking for Help from Tweets during a Large Scale Disaster
Author: Istvan Varga ; Motoki Sano ; Kentaro Torisawa ; Chikara Hashimoto ; Kiyonori Ohtake ; Takao Kawai ; Jong-Hoon Oh ; Stijn De Saeger
Abstract: The 2011 Great East Japan Earthquake caused a wide range of problems, and as countermeasures, many aid activities were carried out. Many of these problems and aid activities were reported via Twitter. However, most problem reports and corresponding aid messages were not successfully exchanged between victims and local governments or humanitarian organizations, overwhelmed by the vast amount of information. As a result, victims could not receive necessary aid and humanitarian organizations wasted resources on redundant efforts. In this paper, we propose a method for discovering matches between problem reports and aid messages. Our system contributes to problem-solving in a large scale disaster situation by facilitating communication between victims and humanitarian organizations.
2 0.85177743 213 acl-2013-Language Acquisition and Probabilistic Models: keeping it simple
Author: Aline Villavicencio ; Marco Idiart ; Robert Berwick ; Igor Malioutov
Abstract: Hierarchical Bayesian Models (HBMs) have been used with some success to capture empirically observed patterns of under- and overgeneralization in child language acquisition. However, as is well known, HBMs are “ideal” learning systems, assuming access to unlimited computational resources that may not be available to child language learners. Consequently, it remains crucial to carefully assess the use of HBMs along with alternative, possibly simpler, candidate models. This paper presents such an evaluation for a language acquisition domain where explicit HBMs have been proposed: the acquisition of English dative constructions. In particular, we present a detailed, empirically grounded model-selection comparison of HBMs vs. a simpler alternative based on clustering along with maximum likelihood estimation that we call linear competition learning (LCL). Our results demonstrate that LCL can match HBM model performance without incurring on the high computational costs associated with HBMs.
same-paper 3 0.84628785 25 acl-2013-A Tightly-coupled Unsupervised Clustering and Bilingual Alignment Model for Transliteration
Author: Tingting Li ; Tiejun Zhao ; Andrew Finch ; Chunyue Zhang
Abstract: Machine Transliteration is an essential task for many NLP applications. However, names and loan words typically originate from various languages, obey different transliteration rules, and therefore may benefit from being modeled independently. Recently, transliteration models based on Bayesian learning have overcome issues with over-fitting allowing for many-to-many alignment in the training of transliteration models. We propose a novel coupled Dirichlet process mixture model (cDPMM) that simultaneously clusters and bilingually aligns transliteration data within a single unified model. The unified model decomposes into two classes of non-parametric Bayesian component models: a Dirichlet process mixture model for clustering, and a set of multinomial Dirichlet process models that perform bilingual alignment independently for each cluster. The experimental results show that our method considerably outperforms conventional alignment models.
4 0.74909335 387 acl-2013-Why-Question Answering using Intra- and Inter-Sentential Causal Relations
Author: Jong-Hoon Oh ; Kentaro Torisawa ; Chikara Hashimoto ; Motoki Sano ; Stijn De Saeger ; Kiyonori Ohtake
Abstract: In this paper, we explore the utility of intra- and inter-sentential causal relations between terms or clauses as evidence for answering why-questions. To the best of our knowledge, this is the first work that uses both intra- and inter-sentential causal relations for why-QA. We also propose a method for assessing the appropriateness of causal relations as answers to a given question using the semantic orientation of excitation proposed by Hashimoto et al. (2012). By applying these ideas to Japanese why-QA, we improved precision by 4.4% against all the questions in our test set over the current state-of-the-art system for Japanese why-QA. In addition, unlike the state-of-the-art system, our system could achieve very high precision (83.2%) for 25% of all the questions in the test set by restricting its output to the confident answers only.
5 0.73574793 240 acl-2013-Microblogs as Parallel Corpora
Author: Wang Ling ; Guang Xiang ; Chris Dyer ; Alan Black ; Isabel Trancoso
Abstract: In the ever-expanding sea of microblog data, there is a surprising amount of naturally occurring parallel text: some users create post multilingual messages targeting international audiences while others “retweet” translations. We present an efficient method for detecting these messages and extracting parallel segments from them. We have been able to extract over 1M Chinese-English parallel segments from Sina Weibo (the Chinese counterpart of Twitter) using only their public APIs. As a supplement to existing parallel training data, our automatically extracted parallel data yields substantial translation quality improvements in translating microblog text and modest improvements in translating edited news commentary. The resources described in this paper are available at http://www.cs.cmu.edu/∼lingwang/utopia.
6 0.73439568 174 acl-2013-Graph Propagation for Paraphrasing Out-of-Vocabulary Words in Statistical Machine Translation
7 0.73229122 134 acl-2013-Embedding Semantic Similarity in Tree Kernels for Domain Adaptation of Relation Extraction
8 0.7288909 288 acl-2013-Punctuation Prediction with Transition-based Parsing
9 0.72883165 207 acl-2013-Joint Inference for Fine-grained Opinion Extraction
10 0.72853947 326 acl-2013-Social Text Normalization using Contextual Graph Random Walks
11 0.72760642 267 acl-2013-PARMA: A Predicate Argument Aligner
12 0.72610003 223 acl-2013-Learning a Phrase-based Translation Model from Monolingual Data with Application to Domain Adaptation
13 0.72538781 120 acl-2013-Dirt Cheap Web-Scale Parallel Text from the Common Crawl
14 0.72456729 361 acl-2013-Travatar: A Forest-to-String Machine Translation Engine based on Tree Transducers
15 0.72454619 5 acl-2013-A Decade of Automatic Content Evaluation of News Summaries: Reassessing the State of the Art
16 0.72325915 226 acl-2013-Learning to Prune: Context-Sensitive Pruning for Syntactic MT
17 0.72315001 155 acl-2013-Fast and Accurate Shift-Reduce Constituent Parsing
18 0.72289544 154 acl-2013-Extracting bilingual terminologies from comparable corpora
19 0.72287631 259 acl-2013-Non-Monotonic Sentence Alignment via Semisupervised Learning
20 0.71950746 255 acl-2013-Name-aware Machine Translation