emnlp emnlp2013 emnlp2013-169 knowledge-graph by maker-knowledge-mining

169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification


Source: pdf

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the log-likelihood of the documents from both language domains under a cross-lingual log-bilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual adaptation approach.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. [sent-2, score-0.61]

2 An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. [sent-3, score-0.25]

3 In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. [sent-4, score-0.586]

4 Specifically, we propose to maximize the log-likelihood of the documents from both language domains under a cross-lingual log-bilinear document model, while minimizing the prediction log-losses of labeled documents. [sent-5, score-0.571]

5 We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. [sent-6, score-0.41]

6 Recently, cross-lingual adaptation methods have been studied to exploit labeled information from an existing source language domain where labeled training data is abundant for use in a target language domain where annotated training data is scarce (Prettenhofer and Stein, 2010). [sent-11, score-1.024]

7 Previous work has shown that cross-lingual adaptation can greatly reduce labeling effort for a variety of cross language NLP tasks such as document categorization (Bel et al. [sent-12, score-0.39]

8 , 2009), genre classification (Petrenz and Webber, 2012), and sentiment classification (Shanahan et al. [sent-14, score-0.454]

9 The fundamental challenge of cross-lingual adaptation stems from a lack of overlap between the feature space of the source language data and that of the target language data. [sent-16, score-0.392]

10 They first translate all the text data from one language domain into the other and then apply techniques such as domain adaptation (Wan et al. [sent-18, score-0.454]

11 As an economical alternative, cross-lingual representation learning has recently been used in the literature to learn language-independent representations of the data for cross language text classification (Prettenhofer and Stein, 2010; Petrenz and Webber, 2012). [sent-24, score-0.554]

12 In this paper, we propose to tackle cross language text classification by inducing cross-lingual predictive data representations with both labeled and unlabeled documents from the two language domains. [sent-25, score-0.806]

13 Specifically, we propose a cross-lingual log-bilinear document model to learn distributed representations of words, which can capture both the semantic sim- [sent-26, score-0.319]

14 ilarities of words across languages and the predictive information with respect to the target classification task. [sent-28, score-0.378]

15 We conduct the representation learning by maximizing the log-likelihood of all documents from both language domains under the cross-lingual log-bilinear document model and minimizing the prediction log-losses of labeled documents. [sent-29, score-0.832]

16 To evaluate the effectiveness of the proposed approach, we conduct experiments on the task of cross language sentiment classification of Amazon product reviews. [sent-31, score-0.532]

17 Basically, they first employ machine translation tools to translate documents from one language domain to the other one and then induce low dimensional latent representations as interlingual representations (Littman et al. [sent-34, score-0.969]

18 (1998) proposed a cross-language latent semantic indexing method to induce interlingual representations by performing latent semantic indexing over a dual-language document-term matrix, where each dual-language document contains its original words and the corresponding translation text. [sent-40, score-0.671]

19 (2011) proposed a bi-view non-negative matrix tri-factorization method for cross-lingual sentiment classification on the parallel training and test data. [sent-46, score-0.432]

20 Guo and Xiao (2012a) developed a transductive subspace representation learning method for cross-lingual text classification based on non-negative matrix factorization. [sent-47, score-0.485]

21 Some other works exploited parallel data by using multilingual topic models to extract cross-language latent topics as interlingual representations (Mimno et al. [sent-48, score-0.485]

22 , 2011) and using neural probabilistic language models to learn word embeddings as cross-lingual distributed representations (Klementiev et al. [sent-52, score-0.275]

23 , 2000) to induce cross-lingual word distributed representations on a set of word-level aligned parallel sentences. [sent-63, score-0.279]

24 Another group of works proposes to use bilingual dictionaries to learn interlingual representations (Gliozzo, 2006; Prettenhofer and Stein, 2010). [sent-67, score-0.426]

25 Then they conducted latent semantic analysis (LSA) over the document-term matrix with concatenated vocabularies to obtain interlingual representations. [sent-69, score-0.294]

26 Some other bilingual resources, such as multilingual WordNet (Fellbaum, 1998) and universal part-of-speech (POS) tags (Petrov et al. [sent-74, score-0.241]

27 (2012), which transformed words from different languages to WordNet synset identifiers as interlingual sense-based representations. [sent-80, score-0.213]

28 Recently, Petrenz and Webber (2012) used language-specific POS taggers to tag each word and then mapped those language-specific POS tags to twelve universal POS tags as interlingual features for cross language fine-grained genre classification. [sent-82, score-0.28]

29 3 Semi-Supervised Representation Learning for Cross-Lingual Text Classification In this section, we introduce a semi-supervised cross-lingual representation learning method and then use it for cross language text classification. [sent-84, score-0.256]

30 Assume we have ℓs labeled and us unlabeled documents in the source language domain S, and ℓt labeled and ut unlabeled documents in the target language domain T. [sent-85, score-0.78]

31 We assume all the documents are independent and identically distributed in each language domain, and each document xi is represented as a bag of words, xi = {wi1, wi2, . [sent-86, score-0.425]

32 and its label, and consider exploiting the labeled documents in the source domain S for learning classifiers in the target domain T. [sent-91, score-0.753]

33 . . . between the two language domains, we first construct a set of critical bilingual word pairs M = {(wis, wjt)}, i = 1, . . . , m, where wis is a critical word in the source language domain, wjt is its translation in the target language domain, and m is the number of word pairs. [sent-93, score-0.492]

34 First we select a subset of words from the source language domain, which have the highest mutual information with the class labels in labeled source documents. [sent-96, score-0.382]

35 The mutual information is computed based on the empirical distributions of words and labels in the labeled source documents. [sent-97, score-0.277]

36 Then we translate the selected words into the target language using a translation tool to produce word pairs. [sent-98, score-0.266]

37 Finally we produce the M set by eliminating any candidate pair (ws, wt) if ws occurs less than a predefined threshold value φ in all source language documents or wt occurs less than φ in all target language documents. [sent-99, score-0.437]
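
A minimal sketch of this pivot-pair construction (Python), under the assumptions that documents are token lists, that mutual information is computed from empirical word-presence and label counts in the labeled source documents, and that translate() is a user-supplied dictionary or MT lookup; helper names such as select_pivot_pairs and the default num_candidates are illustrative, not the paper's.

```python
from collections import Counter
from math import log

def mutual_information(word, docs, labels):
    # Empirical MI between word presence and the class label over labeled source docs.
    n = float(len(docs))
    joint, p_w, p_y = Counter(), Counter(), Counter()
    for doc, y in zip(docs, labels):
        present = word in doc
        joint[(present, y)] += 1
        p_w[present] += 1
        p_y[y] += 1
    return sum((c / n) * log((c / n) / ((p_w[w] / n) * (p_y[y] / n)))
               for (w, y), c in joint.items())

def count_occurrences(word, docs):
    return sum(doc.count(word) for doc in docs)

def select_pivot_pairs(src_labeled, src_labels, src_all, tgt_all,
                       translate, num_candidates=500, phi=30):
    # 1) rank source words by mutual information with the class labels
    vocab = set(w for doc in src_labeled for w in doc)
    ranked = sorted(vocab,
                    key=lambda w: -mutual_information(w, src_labeled, src_labels))
    # 2) translate the top candidates, 3) keep only pairs whose source word and
    #    translation each occur at least phi times in their respective domains
    pairs = []
    for ws in ranked[:num_candidates]:
        wt = translate(ws)
        if (count_occurrences(ws, src_all) >= phi and
                count_occurrences(wt, tgt_all) >= phi):
            pairs.append((ws, wt))
    return pairs
```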

38 Given the constructed bilingual word pair set M, the words appearing in the source language documents but not in M can be put together to form a source specific vocabulary set Vs = {ws1, . [sent-100, score-0.58]

39 Similarly, the words appearing in the target language documents but not in M can be put together to form a target specific vocabulary set Vt = {wt1, . [sent-104, score-0.584]

40 The cross-lingual vocabulary set V = Vs ∪ Vt ∪ M covers all words appearing in both domains, while mapping each bilingual pair in M into the same entry. [sent-110, score-0.207]
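
A small sketch (Python) of how this cross-lingual vocabulary index could be materialized, assuming each bilingual pair in M shares exactly one entry of the embedding matrix while all remaining words get their own entries; the sketch does not keep Vs and Vt as separate named sets, and build_vocabulary is an illustrative name.

```python
def build_vocabulary(src_docs, tgt_docs, pairs):
    # pairs: the bilingual word pair set M; each pair maps to ONE shared entry.
    index, next_id = {}, 0
    for ws, wt in pairs:
        index[ws] = index[wt] = next_id
        next_id += 1
    for doc in src_docs + tgt_docs:      # remaining words get their own entries
        for w in doc:
            if w not in index:
                index[w] = next_id
                next_id += 1
    return index, next_id                # next_id == v, the number of rows of R
```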

41 To tackle cross language text classification, we then propose a cross-lingual log-bilinear document model to learn a predictive cross-lingual representation of words, which maps each entry in the vocabulary set V to one row vector in a word embedding matrix R ∈ Rv×k. [sent-111, score-0.677]

42 Moreover, we explicitly incorporate the label information into our proposed approach, rendering the induced word embeddings more discriminative to the target prediction task. [sent-115, score-0.367]

43 1 Cross-Lingual Word Embeddings As mentioned above, we assume a unified embedding matrix R which contains the distributed vector representations of words in the two language domains. [sent-117, score-0.422]

44 However, even in a unified representation space, the distribution of words in the two domains will be different. [sent-118, score-0.231]

45 To capture the distribution divergence of the two domains and facilitate cross-lingual learning, we split the word embedding matrix into three parts: source language specific part Rs ∈ Rv×ks, common part Rc ∈ Rv×kc and target language specific part Rt ∈ Rv×kt, such that k = ks + kc + kt. [sent-119, score-0.786]

46 Intuitively, we assume that source language words contain no target language specific representations and target language words contain no source language specific representations. [sent-120, score-0.697]

47 Thus for words in the two language domains, we retrieve their distributed vector representations from the embedding matrix R using two mapping functions, ΦS and ΦT, one for each language domain. [sent-121, score-0.422]

48 To encode more information into the common part of representation for better knowledge transfer from the source language domain to the target language domain, we assume kc ≥ ks and kc ≥ kt. [sent-124, score-0.846]
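
A sketch of this split and of the two mapping functions ΦS and ΦT (Python/NumPy), under the assumption that a word's full k-dimensional vector is the concatenation [source-specific | common | target-specific] with zeros filling the block that does not apply to the word's language; the concrete layout and the helper names are illustrative, not the paper's.

```python
import numpy as np

def init_embeddings(v, ks, kc, kt, seed=0):
    rng = np.random.default_rng(seed)
    Rs = rng.normal(scale=0.01, size=(v, ks))   # source-language-specific block
    Rc = rng.normal(scale=0.01, size=(v, kc))   # common block, with kc >= ks and kc >= kt
    Rt = rng.normal(scale=0.01, size=(v, kt))   # target-language-specific block
    return Rs, Rc, Rt

def phi_source(word_id, Rs, Rc, Rt):
    # Source words carry no target-language-specific representation.
    return np.concatenate([Rs[word_id], Rc[word_id], np.zeros(Rt.shape[1])])

def phi_target(word_id, Rs, Rc, Rt):
    # Target words carry no source-language-specific representation.
    return np.concatenate([np.zeros(Rs.shape[1]), Rc[word_id], Rt[word_id]])
```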

49 The form of three-part feature representations has been exploited in previous work on domain adaptation with heterogeneous feature spaces (Duan et al. [sent-125, score-0.411]

50 However, their approach simply duplicates the original features as language-specific representations, while we will automatically learn those three-part latent representations in our approach. [sent-127, score-0.215]

51 The first part of the objective function captures the likelihood of the documents being generated with the learned representation R. [sent-130, score-0.292]
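
The document likelihood itself is defined by equations that this extract does not reproduce; the sketch below (Python/NumPy) shows one common log-bilinear instantiation as an illustration only, assuming each word of a document is scored by the dot product between the document's average embedding and that word's embedding row plus a per-word bias, normalized with a softmax over the vocabulary.

```python
import numpy as np

def doc_log_likelihood(doc_ids, R, b):
    # doc_ids: vocabulary entry ids of the words in one document (bag of words)
    # R: (v, k) word embedding matrix; b: (v,) per-word bias terms
    psi = R[doc_ids].mean(axis=0)                        # document vector from its words
    scores = R @ psi + b                                 # log-bilinear score for every entry
    log_probs = scores - np.logaddexp.reduce(scores)     # log-softmax over the vocabulary
    # (a leave-one-out variant would exclude the predicted word from psi)
    return log_probs[doc_ids].sum()                      # sum of log p(w_ij | document)
```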

52 The second part of the objective function in (3) takes the label information into account and aims to render the latent word representations more task-predictive. [sent-133, score-0.215]

53 where w, q are model parameters of the logistic regression model, ΨL(xi) is the k-dimensional vector representation of the document xi in the language domain L. [sent-139, score-0.491]

54 The distributed vector representation of any given document can then be computed using Eq. [sent-152, score-0.342]
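
Since the referenced equation is not reproduced in this extract, the following sketch (Python/NumPy) assumes the simple choice of averaging a document's word vectors for ΨL(xi), and shows the per-document logistic-regression log-loss with parameters w and q minimized by the second objective term; embed(j) stands for whichever mapping (e.g. the phi_source/phi_target functions sketched above) returns the k-dimensional vector of entry j.

```python
import numpy as np

def doc_representation(doc_ids, embed):
    # Psi_L(x_i): here, the mean of the word vectors of the document
    # (the paper defines it by an equation not reproduced in this extract).
    return np.mean([embed(j) for j in doc_ids], axis=0)

def prediction_log_loss(doc_ids, y, w, q, embed):
    # Binary logistic log-loss for one labeled document, y in {0, 1}.
    psi = doc_representation(doc_ids, embed)
    p = 1.0 / (1.0 + np.exp(-(np.dot(w, psi) + q)))   # sigmoid(w^T Psi(x) + q)
    return -(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```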

55 4 Experiments We empirically evaluate the proposed approach using the cross language sentiment classification tasks of Amazon product reviews in four languages. [sent-157, score-0.672]

56 1 Dataset We used the multilingual sentiment classification dataset1 provided by Prettenhofer and Stein (2010), which contains Amazon product reviews in four different languages, English (E), French (F), German (G) and Japanese (J). [sent-160, score-0.553]

57 The English product reviews were sampled from previous cross-domain sentiment classification datasets (Blitzer et al. [sent-161, score-0.436]

58 Following the work (Prettenhofer and Stein, 2010), we used the original English reviews as the source language while treating the other three languages as target languages. [sent-166, score-0.417]

59 Thus, we construct nine cross language sentiment classification tasks (GB, GD, GM, FB, FD, FM, JB, JD, JM), one for each target language-category pair. [sent-167, score-0.815]

60 2 Approaches We compare our proposed semi-supervised cross-lingual representation learning (CL-RL) approach to the following approaches for cross-lingual document classification. [sent-170, score-0.401]

61 Table 1: Average classification accuracies and standard deviations for the 9 cross-lingual sentiment classification tasks. [sent-173, score-0.628]

62 • TB: This is a target baseline method, which trains a supervised monolingual classifier on the labeled training data from the target language domain without representation learning. [sent-176, score-0.783]

63 In all experiments, we used a linear support vector machine (SVM) for sentiment classification. [sent-181, score-0.199]

64 For the CL-SCL method, we used the same parameter setting as suggested in the paper (Prettenhofer and Stein, 2010): the number of pivot features is set as 450, the threshold value for selecting pivot features is 30, and the reduced dimensionality after singular value decomposition is 100. [sent-184, score-0.231]

65 For the CLD-LSA method, we set the dimensionality of latent representation as 1000. [sent-185, score-0.284]

66 The values of α, β, γ and η are selected using the first cross language classification task GB. [sent-188, score-0.276]

67 3 Classification Accuracy For each of the nine cross language sentiment classification tasks with different target language-category pairs, we used the training set in the source language domain (English) as labeled data while treating the test set in the source language domain as unlabeled. [sent-195, score-1.485]

68 For target language domain, we used the test set as test data while randomly selecting 100 documents from the training set as labeled data and treating the rest as unlabeled data. [sent-197, score-0.55]

69 Thus, for each task, we have 2000 labeled documents and 2000 unlabeled documents from the source language domain, and 100 labeled and 1900 unlabeled documents from the target language domain for training. [sent-198, score-1.174]

70 We have 2000 test documents from the target language domain as testing data. [sent-199, score-0.476]

71 We repeated each experiment 10 times with different random selections of 100 labeled training documents from the target language domain. [sent-201, score-0.561]
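
A compact sketch of this evaluation protocol (Python); train_and_evaluate is a hypothetical stand-in for the full training and testing pipeline, while the counts (2000 labeled and 2000 unlabeled source documents, 100 labeled and 1900 unlabeled target documents, 2000 target test documents, 10 random repetitions) follow the description above.

```python
import random

def run_task(src_train, src_test, tgt_train, tgt_test, train_and_evaluate,
             n_labeled_target=100, n_runs=10, seed=0):
    # src_train: 2000 labeled source docs; src_test: 2000 source docs used as unlabeled.
    # tgt_train: 2000 target docs (100 labeled per run); tgt_test: 2000 test docs.
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_runs):
        idx = list(range(len(tgt_train)))
        rng.shuffle(idx)
        labeled_tgt = [tgt_train[i] for i in idx[:n_labeled_target]]
        unlabeled_tgt = [tgt_train[i] for i in idx[n_labeled_target:]]
        accuracies.append(train_and_evaluate(
            labeled_src=src_train, unlabeled_src=src_test,
            labeled_tgt=labeled_tgt, unlabeled_tgt=unlabeled_tgt, test=tgt_test))
    mean = sum(accuracies) / len(accuracies)
    std = (sum((a - mean) ** 2 for a in accuracies) / len(accuracies)) ** 0.5
    return mean, std
```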

72 The average classification accuracies and standard deviations are reported in Table 1. [sent-202, score-0.323]

73 From Table 1, we can see that the proposed semi-supervised cross-lingual representation learning approach, CL-RL, clearly outperforms all other comparison methods on eight out of the nine tasks. [sent-203, score-0.338]

74 By exploiting the large amount of labeled training data from the source language domain, even the simple cross-lingual adaptation approach, CL-Dict, produces effective improvements over TB. [sent-206, score-0.395]

75 With a better designed representation learning, CLD-LSA outperforms CL-Dict on all the nine tasks, but the improvements are very small on some tasks (e. [sent-210, score-0.343]

76 4 Classification Accuracy vs the Number of Labeled Target Documents Next, we investigated the performance of the six approaches by varying the number of labeled training documents from the target language domain. [sent-223, score-0.504]

77 In each experiment, for a given value ℓt, we randomly selected ℓt documents from the training set of the target language domain as labeled data and used the rest as unlabeled data. [sent-225, score-0.694]

78 We still performed prediction on the same 2000 test documents in the target language domain. [sent-226, score-0.378]

79 We repeated each experiment 10 times based on different random selections of the labeled training data from the target language domain. [sent-227, score-0.398]

80 The average classification accuracies and standard deviations across different ℓt values for all comparison methods on all the nine tasks are plotted in Figure 1. [sent-228, score-0.537]

81 We can see when the number of labeled target documents is small, TB performs poorly, especially for the first six tasks (GB, GD, GM, FB, FD, FM). [sent-229, score-0.561]

82 By increasing the size of labeled target training data, TB can greatly increase its prediction accuracies and even outperform the CL-Dict method. [sent-230, score-0.479]

83 Its performance is better than TB when the labeled training data in the target language domain is very limited and is worse than TB when the labeled target data reaches 300 for the six tasks using German and French as target languages. [sent-232, score-1.052]

84 Moreover, when adapting a system from English to a much more distant target language (Japanese), CL-Dict produces much lower accuracies on all three tasks compared with TB. [sent-233, score-0.318]

85 These results show that CL-Dict has very limited capacity on transferring labeled information from a related source language domain. [sent-234, score-0.277]

86 By using more translation resources, the MT method outperforms TB, CL-Dict, CLD-LSA, CL-SCL in all the nine tasks across almost all scenarios. [sent-238, score-0.263]

87 Moreover, it is especially important to notice that CL-RL achieves high test accuracies even when the number of labeled target instances is small. [sent-240, score-0.476]

88 This is important for transferring knowledge from a source language to reduce the labeling effort in the target language. [sent-241, score-0.274]

89 5 Sensitivity Analysis We also investigated the sensitivity of the proposed approach over the dimensionality of the induced cross-lingual representations. [sent-243, score-0.197]

90 We repeated each experiment 10 times based on different random selections of labeled target training data and plotted the average prediction accuracies and standard deviations in Figure 2 for all the nine cross-lingual sentiment classification tasks. [sent-249, score-1.08]

91 This suggests the proposed approach is not very sensitive to the dimensionality of the cross-lingual embedding features within the considered range of values, and with a small dimensionality of 100, the induced representation can already perform very well. [sent-251, score-0.488]

92 Given an English word as a seed word, we find its five closest neighboring English words and German words according to the Euclidean distances calculated in the induced cross-lingual representation space. [sent-254, score-0.317]
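
A small sketch of this nearest-neighbour lookup (Python/NumPy), assuming a learned embedding matrix and a word-to-row index like the one sketched earlier; restricting candidates to one language's word list is how separate English and German neighbour columns (as in Table 2) would be produced, and the helper name nearest_words is illustrative.

```python
import numpy as np

def nearest_words(seed_word, index, embeddings, lang_words, k=5):
    # index: word -> row id; embeddings: (v, d) matrix of learned representations;
    # lang_words: the vocabulary of the language we want neighbours from.
    seed_vec = embeddings[index[seed_word]]
    candidates = [w for w in lang_words if w != seed_word and w in index]
    dists = sorted((np.linalg.norm(embeddings[index[w]] - seed_vec), w)
                   for w in candidates)
    return [w for _, w in dists[:k]]
```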

93 From Table 2, we can see that the retrieved words in both language domains are semantically close to the seed words, which indicates that our proposed method can capture semantic similarities of words not only in a monolingual setting but also in a multilingual setting. [sent-256, score-0.271]

94 5 Conclusion In this paper, we proposed a semi-supervised cross-lingual representation learning approach to address cross-lingual text classification. [sent-257, score-0.313]

95 The distributed word representation induced by the proposed approach can capture semantic similarities of words across languages while maintaining predictive information with respect to the target classification tasks. [sent-258, score-0.697]

96 To evaluate the proposed approach, we conducted experiments on nine cross language sentiment classification tasks constructed from the Amazon product reviews in four languages, comparing to a number of comparison methods. [sent-259, score-0.829]

97 Cross-lingual sentiment analysis for Indian languages using linked wordnets. [sent-273, score-0.216]

98 Table 2: Examples of source seed words together with five closest English words and five closest German words estimated using the Euclidean distance in the cross-lingual representation space on the task GB. [sent-279, score-0.234]

99 Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. [sent-298, score-0.274]

100 Cross lingual text classification by mining multilingual topics from Wikipedia. [sent-395, score-0.344]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('prettenhofer', 0.247), ('wij', 0.185), ('stein', 0.181), ('labeled', 0.172), ('target', 0.169), ('documents', 0.163), ('nine', 0.157), ('sentiment', 0.156), ('interlingual', 0.153), ('tb', 0.152), ('classification', 0.149), ('representations', 0.149), ('domain', 0.144), ('crosslingual', 0.132), ('representation', 0.129), ('cross', 0.127), ('bilingual', 0.124), ('adaptation', 0.118), ('multilingual', 0.117), ('kc', 0.115), ('german', 0.108), ('source', 0.105), ('petrenz', 0.104), ('domains', 0.102), ('accuracies', 0.092), ('gliozzo', 0.09), ('dimensionality', 0.089), ('document', 0.088), ('xi', 0.087), ('reviews', 0.083), ('deviations', 0.082), ('gb', 0.082), ('distributed', 0.082), ('amini', 0.078), ('jb', 0.078), ('lingual', 0.078), ('vinokourov', 0.078), ('platt', 0.076), ('rc', 0.076), ('matrix', 0.075), ('embedding', 0.073), ('pivot', 0.071), ('ks', 0.069), ('smet', 0.068), ('bel', 0.068), ('latent', 0.066), ('fd', 0.066), ('guo', 0.064), ('gm', 0.062), ('shanahan', 0.062), ('pan', 0.061), ('languages', 0.06), ('japanese', 0.059), ('gd', 0.057), ('plsa', 0.057), ('selections', 0.057), ('tasks', 0.057), ('induced', 0.056), ('books', 0.055), ('kt', 0.054), ('rv', 0.054), ('klementiev', 0.054), ('fb', 0.054), ('mt', 0.053), ('amazon', 0.052), ('proposed', 0.052), ('bwij', 0.052), ('diamantaras', 0.052), ('logpl', 0.052), ('rigutini', 0.052), ('mimno', 0.052), ('row', 0.05), ('ws', 0.05), ('littman', 0.049), ('translation', 0.049), ('translate', 0.048), ('induce', 0.048), ('english', 0.048), ('product', 0.048), ('rs', 0.048), ('webber', 0.047), ('xiao', 0.047), ('prediction', 0.046), ('pl', 0.046), ('unlabeled', 0.046), ('rt', 0.045), ('wjt', 0.045), ('maas', 0.045), ('pakdd', 0.045), ('embeddings', 0.044), ('correspondence', 0.043), ('instances', 0.043), ('yi', 0.043), ('vector', 0.043), ('nips', 0.043), ('vocabulary', 0.042), ('jd', 0.041), ('temple', 0.041), ('lel', 0.041), ('appearing', 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the log-likelihood of the documents from both language domains under a cross-lingual log-bilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual adaptation approach.

2 0.39339671 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo

Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model using a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.

3 0.15698604 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou

Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.

4 0.15463327 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

Author: Will Y. Zou ; Richard Socher ; Daniel Cer ; Christopher D. Manning

Abstract: We introduce bilingual word embeddings: semantic embeddings associated across two languages in the context of neural language models. We propose a method to learn bilingual embeddings from a large unlabeled corpus, while utilizing MT word alignments to constrain translational equivalence. The new embeddings significantly out-perform baselines in word semantic similarity. A single semantic similarity feature induced with bilingual embeddings adds near half a BLEU point to the results of NIST08 Chinese-English machine translation task.

5 0.14689292 143 emnlp-2013-Open Domain Targeted Sentiment

Author: Margaret Mitchell ; Jacqui Aguilar ; Theresa Wilson ; Benjamin Van Durme

Abstract: We propose a novel approach to sentiment analysis for a low resource setting. The intuition behind this work is that sentiment expressed towards an entity, targeted sentiment, may be viewed as a span of sentiment expressed across the entity. This representation allows us to model sentiment detection as a sequence tagging problem, jointly discovering people and organizations along with whether there is sentiment directed towards them. We compare performance in both Spanish and English on microblog data, using only a sentiment lexicon as an external resource. By leveraging linguistically-informed features within conditional random fields (CRFs) trained to minimize empirical risk, our best models in Spanish significantly outperform a strong baseline, and reach around 90% accuracy on the combined task of named entity recognition and sentiment prediction. Our models in English, trained on a much smaller dataset, are not yet statistically significant against their baselines.

6 0.14189649 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

7 0.13236761 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge

8 0.13194849 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.12968758 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

10 0.12749229 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

11 0.11762263 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning

12 0.11318479 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

13 0.11135665 109 emnlp-2013-Is Twitter A Better Corpus for Measuring Sentiment Similarity?

14 0.10686928 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models

15 0.10493808 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

16 0.10042401 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

17 0.098639324 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

18 0.094876781 193 emnlp-2013-Unsupervised Induction of Cross-Lingual Semantic Relations

19 0.09180674 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

20 0.084556274 127 emnlp-2013-Max-Margin Synchronous Grammar Induction for Machine Translation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.326), (1, -0.055), (2, -0.157), (3, -0.142), (4, 0.17), (5, 0.001), (6, 0.031), (7, 0.01), (8, -0.09), (9, -0.209), (10, 0.106), (11, -0.275), (12, 0.033), (13, -0.144), (14, 0.076), (15, -0.003), (16, -0.191), (17, 0.041), (18, -0.07), (19, -0.045), (20, 0.086), (21, -0.16), (22, 0.123), (23, -0.151), (24, 0.057), (25, 0.045), (26, 0.059), (27, -0.024), (28, -0.019), (29, 0.035), (30, -0.01), (31, 0.084), (32, 0.04), (33, 0.042), (34, 0.008), (35, -0.01), (36, -0.021), (37, 0.079), (38, 0.002), (39, 0.09), (40, -0.024), (41, -0.004), (42, 0.006), (43, -0.005), (44, -0.033), (45, 0.072), (46, 0.01), (47, 0.035), (48, 0.032), (49, -0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9763267 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the log-likelihood of the documents from both language domains under a cross-lingual log-bilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual adaptation approach.

2 0.94026339 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo

Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model using a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.

3 0.75043613 29 emnlp-2013-Automatic Domain Partitioning for Multi-Domain Learning

Author: Di Wang ; Chenyan Xiong ; William Yang Wang

Abstract: Multi-Domain learning (MDL) assumes that the domain labels in the dataset are known. However, when there are multiple metadata attributes available, it is not always straightforward to select a single best attribute for domain partition, and it is possible that combining more than one metadata attribute (including continuous attributes) can lead to better MDL performance. In this work, we propose an automatic domain partitioning approach that aims at providing better domain identities for MDL. We use a supervised clustering approach that learns the domain distance between data instances, and then clusters the data into better domains for MDL. Our experiment on real multi-domain datasets shows that using our automatically generated domain partition improves over popular MDL methods.

4 0.68439108 136 emnlp-2013-Multi-Domain Adaptation for SMT Using Multi-Task Learning

Author: Lei Cui ; Xilun Chen ; Dongdong Zhang ; Shujie Liu ; Mu Li ; Ming Zhou

Abstract: Domain adaptation for SMT usually adapts models to an individual specific domain. However, it often lacks some correlation among different domains where common knowledge could be shared to improve the overall translation quality. In this paper, we propose a novel multi-domain adaptation approach for SMT using Multi-Task Learning (MTL), with in-domain models tailored for each specific domain and a general-domain model shared by different domains. The parameters of these models are tuned jointly via MTL so that they can learn general knowledge more accurately and exploit domain knowledge better. Our experiments on a largescale English-to-Chinese translation task validate that the MTL-based adaptation approach significantly and consistently improves the translation quality compared to a non-adapted baseline. Furthermore, it also outperforms the individual adaptation of each specific domain.

5 0.57326323 42 emnlp-2013-Building Specialized Bilingual Lexicons Using Large Scale Background Knowledge

Author: Dhouha Bouamor ; Adrian Popescu ; Nasredine Semmar ; Pierre Zweigenbaum

Abstract: Bilingual lexicons are central components of machine translation and cross-lingual information retrieval systems. Their manual construction requires strong expertise in both languages involved and is a costly process. Several automatic methods were proposed as an alternative but they often rely on resources available in a limited number of languages and their performances are still far behind the quality of manual translations. We introduce a novel approach to the creation of specific domain bilingual lexicon that relies on Wikipedia. This massively multilingual encyclopedia makes it possible to create lexicons for a large number of language pairs. Wikipedia is used to extract domains in each language, to link domains between languages and to create generic translation dictionaries. The approach is tested on four specialized domains and is compared to three state-of-the-art approaches using two language pairs: French-English and Romanian-English. The newly introduced method compares favorably to existing methods in all configurations tested.

6 0.54683477 196 emnlp-2013-Using Crowdsourcing to get Representations based on Regular Expressions

7 0.53705561 135 emnlp-2013-Monolingual Marginal Matching for Translation Model Adaptation

8 0.50634986 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.50254053 138 emnlp-2013-Naive Bayes Word Sense Induction

10 0.48090297 148 emnlp-2013-Orthonormal Explicit Topic Analysis for Cross-Lingual Document Matching

11 0.4574877 13 emnlp-2013-A Study on Bootstrapping Bilingual Vector Spaces from Non-Parallel Data (and Nothing Else)

12 0.45007661 38 emnlp-2013-Bilingual Word Embeddings for Phrase-Based Machine Translation

13 0.4332808 158 emnlp-2013-Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank

14 0.40086246 143 emnlp-2013-Open Domain Targeted Sentiment

15 0.39290097 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

16 0.39157712 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

17 0.39023325 103 emnlp-2013-Improving Pivot-Based Statistical Machine Translation Using Random Walk

18 0.38776717 77 emnlp-2013-Exploiting Domain Knowledge in Aspect Extraction

19 0.38570559 156 emnlp-2013-Recurrent Continuous Translation Models

20 0.38475493 104 emnlp-2013-Improving Statistical Machine Translation with Word Class Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.018), (18, 0.477), (22, 0.054), (30, 0.069), (35, 0.01), (50, 0.013), (51, 0.16), (66, 0.032), (71, 0.029), (75, 0.018), (77, 0.035), (90, 0.014), (96, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92666304 49 emnlp-2013-Combining Generative and Discriminative Model Scores for Distant Supervision

Author: Benjamin Roth ; Dietrich Klakow

Abstract: Distant supervision is a scheme to generate noisy training data for relation extraction by aligning entities of a knowledge base with text. In this work we combine the output of a discriminative at-least-one learner with that of a generative hierarchical topic model to reduce the noise in distant supervision data. The combination significantly increases the ranking quality of extracted facts and achieves state-of-the-art extraction performance in an end-to-end setting. A simple linear interpolation of the model scores performs better than a parameter-free scheme based on nondominated sorting.

same-paper 2 0.91531634 169 emnlp-2013-Semi-Supervised Representation Learning for Cross-Lingual Text Classification

Author: Min Xiao ; Yuhong Guo

Abstract: Cross-lingual adaptation aims to learn a prediction model in a label-scarce target language by exploiting labeled data from a label-rich source language. An effective cross-lingual adaptation system can substantially reduce the manual annotation effort required in many natural language processing tasks. In this paper, we propose a new cross-lingual adaptation approach for document classification based on learning cross-lingual discriminative distributed representations of words. Specifically, we propose to maximize the log-likelihood of the documents from both language domains under a cross-lingual log-bilinear document model, while minimizing the prediction log-losses of labeled documents. We conduct extensive experiments on cross-lingual sentiment classification tasks of Amazon product reviews. Our experimental results demonstrate the efficacy of the proposed cross-lingual adaptation approach.

3 0.90347356 71 emnlp-2013-Efficient Left-to-Right Hierarchical Phrase-Based Translation with Improved Reordering

Author: Maryam Siahbani ; Baskaran Sankaran ; Anoop Sarkar

Abstract: Left-to-right (LR) decoding (Watanabe et al., 2006b) is a promising decoding algorithm for hierarchical phrase-based translation (Hiero). It generates the target sentence by extending the hypotheses only on the right edge. LR decoding has complexity O(n^2 b) for input of n words and beam size b, compared to O(n^3) for the CKY algorithm. It requires a single language model (LM) history for each target hypothesis rather than two LM histories per hypothesis as in CKY. In this paper we present an augmented LR decoding algorithm that builds on the original algorithm in (Watanabe et al., 2006b). Unlike that algorithm, using experiments over multiple language pairs we show two new results: our LR decoding algorithm provides demonstrably more efficient decoding than CKY Hiero, four times faster; and by introducing new distortion and reordering features for LR decoding, it maintains the same translation quality (as in BLEU scores) obtained by phrase-based and CKY Hiero with the same translation model.

4 0.74049199 120 emnlp-2013-Learning Latent Word Representations for Domain Adaptation using Supervised Word Clustering

Author: Min Xiao ; Feipeng Zhao ; Yuhong Guo

Abstract: Domain adaptation has been popularly studied on exploiting labeled information from a source domain to learn a prediction model in a target domain. In this paper, we develop a novel representation learning approach to address domain adaptation for text classification with automatically induced discriminative latent features, which are generalizable across domains while informative to the prediction task. Specifically, we propose a hierarchical multinomial Naive Bayes model with latent variables to conduct supervised word clustering on labeled documents from both source and target domains, and then use the produced cluster distribution of each word as its latent feature representation for domain adaptation. We train this latent graphical model using a simple expectation-maximization (EM) algorithm. We empirically evaluate the proposed method with both cross-domain document categorization tasks on Reuters-21578 dataset and cross-domain sentiment classification tasks on Amazon product review dataset. The experimental results demonstrate that our proposed approach achieves superior performance compared with alternative methods.

5 0.66692966 164 emnlp-2013-Scaling Semantic Parsers with On-the-Fly Ontology Matching

Author: Tom Kwiatkowski ; Eunsol Choi ; Yoav Artzi ; Luke Zettlemoyer

Abstract: We consider the challenge of learning semantic parsers that scale to large, open-domain problems, such as question answering with Freebase. In such settings, the sentences cover a wide variety of topics and include many phrases whose meaning is difficult to represent in a fixed target ontology. For example, even simple phrases such as ‘daughter’ and ‘number of people living in’ cannot be directly represented in Freebase, whose ontology instead encodes facts about gender, parenthood, and population. In this paper, we introduce a new semantic parsing approach that learns to resolve such ontological mismatches. The parser is learned from question-answer pairs, uses a probabilistic CCG to build linguistically motivated logicalform meaning representations, and includes an ontology matching model that adapts the output logical forms for each target ontology. Experiments demonstrate state-of-the-art performance on two benchmark semantic parsing datasets, including a nine point accuracy improvement on a recent Freebase QA corpus.

6 0.57175672 51 emnlp-2013-Connecting Language and Knowledge Bases with Embedding Models for Relation Extraction

7 0.56188142 134 emnlp-2013-Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks

8 0.55863351 64 emnlp-2013-Discriminative Improvements to Distributional Sentence Similarity

9 0.53509271 139 emnlp-2013-Noise-Aware Character Alignment for Bootstrapping Statistical Machine Transliteration from Bilingual Corpora

10 0.53445977 189 emnlp-2013-Two-Stage Method for Large-Scale Acquisition of Contradiction Pattern Pairs using Entailment

11 0.53411978 172 emnlp-2013-Simple Customization of Recursive Neural Networks for Semantic Relation Classification

12 0.53337574 157 emnlp-2013-Recursive Autoencoders for ITG-Based Translation

13 0.53054631 88 emnlp-2013-Flexible and Efficient Hypergraph Interactions for Joint Hierarchical and Forest-to-String Decoding

14 0.52906042 204 emnlp-2013-Word Level Language Identification in Online Multilingual Communication

15 0.52035862 187 emnlp-2013-Translation with Source Constituency and Dependency Trees

16 0.51804662 14 emnlp-2013-A Synchronous Context Free Grammar for Time Normalization

17 0.51697564 52 emnlp-2013-Converting Continuous-Space Language Models into N-Gram Language Models for Statistical Machine Translation

18 0.51626015 81 emnlp-2013-Exploring Demographic Language Variations to Improve Multilingual Sentiment Analysis in Social Media

19 0.51565033 53 emnlp-2013-Cross-Lingual Discriminative Learning of Sequence Models with Posterior Regularization

20 0.51435685 82 emnlp-2013-Exploring Representations from Unlabeled Data with Co-training for Chinese Word Segmentation